ollama

Author	SHA1	Message	Date
Daniel Hiltgen	d4c10df2b0	Add Radeon gfx940-942 GPU support	2024-03-15 15:34:58 -07:00
Blake Mizerany	6ce37e4d96	llm,readline: use errors.Is instead of simple == check (#3161) This fixes some brittle, simple equality checks to use errors.Is. Since go1.13, errors.Is is the idiomatic way to check for errors. Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>	2024-03-15 07:14:12 -07:00
Michael Yang	291c663865	fix: clip memory leak	2024-03-14 13:12:42 -07:00
Jeffrey Morgan	e72c567cfd	restore locale patch (#3091)	2024-03-12 22:08:13 -07:00
Bruce MacDonald	3e22611200	token repeat limit for prediction requests (#3080)	2024-03-12 22:08:25 -04:00
Bruce MacDonald	2f804068bd	warn when json format is expected but not mentioned in prompt (#3081)	2024-03-12 19:07:11 -04:00
racerole	53c107e20e	chore: fix typo (#3073) Signed-off-by: racerole <jiangyifeng@outlook.com>	2024-03-12 14:09:22 -04:00
Bruce MacDonald	b80661e8c7	relay load model errors to the client (#3065)	2024-03-11 16:48:27 -04:00
Jeffrey Morgan	369eda65f5	update llama.cpp submodule to `ceca1ae` (#3064)	2024-03-11 12:57:48 -07:00
Daniel Hiltgen	bc13da2bfe	Avoid rocm runner and dependency clash Putting the rocm symlink next to the runners is risky. This moves the payloads into a subdir to avoid potential clashes.	2024-03-11 09:33:22 -07:00
Jeffrey Morgan	41b00b9856	fix `03-locale.diff`	2024-03-10 16:21:05 -07:00
Daniel Hiltgen	3dc1bb6a35	Harden for deps file being empty (or short)	2024-03-10 14:45:38 -07:00
Jeffrey Morgan	908005d90b	patch: use default locale in wpm tokenizer (#3034)	2024-03-09 21:12:12 -08:00
Jeffrey Morgan	e11668aa07	add `bundle_metal` and `cleanup_metal` functions to `gen_darwin.sh`	2024-03-09 16:04:57 -08:00
Jeffrey Morgan	1ffb1e2874	update llama.cpp submodule to `77d1ac7` (#3030)	2024-03-09 15:55:34 -08:00
Jeffrey Morgan	f9cd55c70b	disable gpu for certain model architectures and fix divide-by-zero on memory estimation	2024-03-09 12:51:38 -08:00
Daniel Hiltgen	4a5c9b8035	Finish unwinding idempotent payload logic The recent ROCm change partially removed idempotent payloads, but the ggml-metal.metal file for mac was still idempotent. This finishes switching to always extract the payloads, and now that idempotency is gone, the version directory is no longer useful.	2024-03-09 08:34:39 -08:00
Jeffrey Morgan	efe5617b64	update llama.cpp submodule to `c2101a2` (#3020)	2024-03-09 00:44:50 -08:00
Michael Yang	76bdebbadf	decode ggla	2024-03-08 15:46:25 -08:00
Jeffrey Morgan	0e4669b04f	update llama.cpp submodule to `6cdabe6` (#2999)	2024-03-08 00:26:20 -08:00
Daniel Hiltgen	6c5ccb11f9	Revamp ROCm support This refines where we extract the LLM libraries to by adding a new OLLAMA_HOME env var that defaults to `~/.ollama`. The logic was already idempotent, so this should speed up startups after the first time a new release is deployed. It also cleans up after itself. We now build only a single ROCm version (latest major) on both windows and linux. Given the large size of ROCm's tensor files, we split the dependency out. It's bundled into the installer on windows, and a separate download on linux. The linux install script is now smart and detects the presence of AMD GPUs and looks to see if rocm v6 is already present, and if not, then downloads our dependency tar file. For Linux discovery, we now use sysfs and check each GPU against what ROCm supports so we can degrade to CPU gracefully instead of having llama.cpp+rocm assert/crash on us. For Windows, we now use Go's windows dynamic library loading logic to access the amdhip64.dll APIs to query the GPU information.	2024-03-07 10:36:50 -08:00
John	23ebe8fe11	fix some typos (#2973) Signed-off-by: hishope <csqiye@126.com>	2024-03-06 22:50:11 -08:00
Patrick Devine	2c017ca441	Convert Safetensors to an Ollama model (#2824)	2024-03-06 21:01:51 -08:00
Jeffrey Morgan	21347e1ed6	update llama.cpp submodule to `c29af7e` (#2868)	2024-03-01 15:26:04 -08:00
Daniel Hiltgen	bd1d8b0d14	Merge pull request #2836 from bmwiedemann/gzip Omit build date from gzip headers	2024-02-29 15:46:46 -08:00
Jeffrey Morgan	cbf4970e0f	bump submodule to `87c91c07663b707e831c59ec373b5e665ff9d64a` (#2828)	2024-02-29 09:42:08 -08:00
Bernhard M. Wiedemann	76e5d9ec88	Omit build date from gzip headers See https://reproducible-builds.org/ for why this is good. This patch was done while working on reproducible builds for openSUSE.	2024-02-29 16:48:19 +01:00
Daniel Hiltgen	061e8f6abc	Bump llama.cpp to b2276	2024-02-26 16:49:24 -08:00
Jeffrey Morgan	11bfff8ee1	update llama.cpp submodule to `96633eeca1265ed03e57230de54032041c58f9cd`	2024-02-22 16:44:26 -05:00
Jeffrey Morgan	efe040f8c0	reset with `init_vars` ahead of each cpu build in `gen_windows.ps1` (#2654)	2024-02-21 16:35:34 -05:00
Jeffrey Morgan	2a7553ce09	update llama.cpp submodule to `c14f72d`	2024-02-21 09:03:14 -05:00
Jeffrey Morgan	b3eac61cac	update llama.cpp submodule to `f0d1fafc029a056cd765bdae58dcaa12312e9879`	2024-02-20 22:56:51 -05:00
Michael Yang	949d7b1c48	add gguf file types (#2532)	2024-02-20 19:06:29 -05:00
Jeffrey Morgan	4613a080e7	update llama.cpp submodule to `66c1968f7` (#2618)	2024-02-20 17:42:31 -05:00
Taras Tsugrii	01ff2e14db	[nit] Remove unused msg local var. (#2511)	2024-02-20 14:02:34 -05:00
Daniel Hiltgen	4fcbf1cde6	Merge pull request #2599 from dhiltgen/fix_avx Explicitly disable AVX2 on GPU builds	2024-02-19 13:13:05 -08:00
Daniel Hiltgen	9220b4fa91	Merge pull request #2585 from dhiltgen/cuda_leaks Fix cuda leaks	2024-02-19 12:48:00 -08:00
Daniel Hiltgen	fc39a6cd7a	Fix cuda leaks This should resolve the problem where we don't fully unload from the GPU when we go idle.	2024-02-18 18:37:20 -08:00
Daniel Hiltgen	df6dc4fd96	Fix duplicate menus on update and exit on signals Also fixes a few fit-and-finish items for better developer experience	2024-02-16 15:33:16 -08:00
Daniel Hiltgen	db2a9ad1fe	Explicitly disable AVX2 on GPU builds Even though we weren't setting it to on, somewhere in the cmake config it was getting toggled on. By explicitly setting it to off, we get `/arch:AVX` as intended.	2024-02-15 14:50:11 -08:00
Daniel Hiltgen	29e90cc13b	Implement new Go-based Desktop app This focuses on Windows first, but could be used for Mac and possibly linux in the future.	2024-02-15 05:56:45 +00:00
Jeffrey Morgan	9241a29336	Revert "Revert "bump submodule to `6c00a06` (#2479)"" (#2485) This reverts commit `6920964b87`.	2024-02-13 18:18:41 -08:00
Jeffrey Morgan	f7231ad9ad	set `shutting_down` to `false` once shutdown is complete (#2484)	2024-02-13 17:48:41 -08:00
Jeffrey Morgan	6920964b87	Revert "bump submodule to `6c00a06` (#2479)" This reverts commit `2f9ed52bbd`.	2024-02-13 17:23:05 -08:00
Jeffrey Morgan	2f9ed52bbd	bump submodule to `6c00a06` (#2479)	2024-02-13 17:12:42 -08:00
Daniel Hiltgen	939c60473f	Merge pull request #2422 from dhiltgen/better_kill More robust shutdown	2024-02-12 14:05:06 -08:00
Jeffrey Morgan	f76ca04f9e	update submodule to `099afc6` (#2468)	2024-02-12 14:01:16 -08:00
Daniel Hiltgen	76b8728f0c	Merge pull request #2465 from dhiltgen/block_rocm_pre_9 Detect AMD GPU info via sysfs and block old cards	2024-02-12 12:41:43 -08:00
Daniel Hiltgen	6d84f07505	Detect AMD GPU info via sysfs and block old cards This wires up some new logic to start using sysfs to discover AMD GPU information and detects old cards we can't yet support so we can fallback to CPU mode.	2024-02-12 08:19:41 -08:00
Jeffrey Morgan	26b13fc33c	patch: always add token to cache_tokens (#2459)	2024-02-12 08:10:16 -08:00

311 commits