ollama

Author	SHA1	Message	Date
Michael Yang	76bdebbadf	decode ggla	2024-03-08 15:46:25 -08:00
Jeffrey Morgan	0e4669b04f	update llama.cpp submodule to `6cdabe6` (#2999)	2024-03-08 00:26:20 -08:00
Daniel Hiltgen	6c5ccb11f9	Revamp ROCm support This refines where we extract the LLM libraries to by adding a new OLLAMA_HOME env var, that defaults to `~/.ollama` The logic was already idempotent, so this should speed up startups after the first time a new release is deployed. It also cleans up after itself. We now build only a single ROCm version (latest major) on both windows and linux. Given the large size of ROCm's tensor files, we split the dependency out. It's bundled into the installer on windows, and a separate download on linux. The linux install script is now smart and detects the presence of AMD GPUs and looks to see if rocm v6 is already present, and if not, then downloads our dependency tar file. For Linux discovery, we now use sysfs and check each GPU against what ROCm supports so we can degrade to CPU gracefully instead of having llama.cpp+rocm assert/crash on us. For Windows, we now use go's windows dynamic library loading logic to access the amdhip64.dll APIs to query the GPU information.	2024-03-07 10:36:50 -08:00
John	23ebe8fe11	fix some typos (#2973) Signed-off-by: hishope <csqiye@126.com>	2024-03-06 22:50:11 -08:00
Patrick Devine	2c017ca441	Convert Safetensors to an Ollama model (#2824)	2024-03-06 21:01:51 -08:00
Jeffrey Morgan	21347e1ed6	update llama.cpp submodule to `c29af7e` (#2868)	2024-03-01 15:26:04 -08:00
Daniel Hiltgen	bd1d8b0d14	Merge pull request #2836 from bmwiedemann/gzip Omit build date from gzip headers	2024-02-29 15:46:46 -08:00
Jeffrey Morgan	cbf4970e0f	bump submodule to `87c91c07663b707e831c59ec373b5e665ff9d64a` (#2828)	2024-02-29 09:42:08 -08:00
Bernhard M. Wiedemann	76e5d9ec88	Omit build date from gzip headers See https://reproducible-builds.org/ for why this is good. This patch was done while working on reproducible builds for openSUSE.	2024-02-29 16:48:19 +01:00
Daniel Hiltgen	061e8f6abc	Bump llama.cpp to b2276	2024-02-26 16:49:24 -08:00
Jeffrey Morgan	11bfff8ee1	update llama.cpp submodule to `96633eeca1265ed03e57230de54032041c58f9cd`	2024-02-22 16:44:26 -05:00
Jeffrey Morgan	efe040f8c0	reset with `init_vars` ahead of each cpu build in `gen_windows.ps1` (#2654)	2024-02-21 16:35:34 -05:00
Jeffrey Morgan	2a7553ce09	update llama.cpp submodule to `c14f72d`	2024-02-21 09:03:14 -05:00
Jeffrey Morgan	b3eac61cac	update llama.cpp submodule to `f0d1fafc029a056cd765bdae58dcaa12312e9879`	2024-02-20 22:56:51 -05:00
Michael Yang	949d7b1c48	add gguf file types (#2532)	2024-02-20 19:06:29 -05:00
Jeffrey Morgan	4613a080e7	update llama.cpp submodule to `66c1968f7` (#2618)	2024-02-20 17:42:31 -05:00
Taras Tsugrii	01ff2e14db	[nit] Remove unused msg local var. (#2511)	2024-02-20 14:02:34 -05:00
Daniel Hiltgen	4fcbf1cde6	Merge pull request #2599 from dhiltgen/fix_avx Explicitly disable AVX2 on GPU builds	2024-02-19 13:13:05 -08:00
Daniel Hiltgen	9220b4fa91	Merge pull request #2585 from dhiltgen/cuda_leaks Fix cuda leaks	2024-02-19 12:48:00 -08:00
Daniel Hiltgen	fc39a6cd7a	Fix cuda leaks This should resolve the problem where we don't fully unload from the GPU when we go idle.	2024-02-18 18:37:20 -08:00
Daniel Hiltgen	df6dc4fd96	Fix duplicate menus on update and exit on signals Also fixes a few fit-and-finish items for better developer experience	2024-02-16 15:33:16 -08:00
Daniel Hiltgen	db2a9ad1fe	Explicitly disable AVX2 on GPU builds Even though we weren't setting it to on, somewhere in the cmake config it was getting toggled on. By explicitly setting it to off, we get `/arch:AVX` as intended.	2024-02-15 14:50:11 -08:00
Daniel Hiltgen	29e90cc13b	Implement new Go based Desktop app This focuses on Windows first, but could be used for Mac and possibly linux in the future.	2024-02-15 05:56:45 +00:00
Jeffrey Morgan	9241a29336	Revert "Revert "bump submodule to `6c00a06` (#2479)"" (#2485) This reverts commit `6920964b87`.	2024-02-13 18:18:41 -08:00
Jeffrey Morgan	f7231ad9ad	set `shutting_down` to `false` once shutdown is complete (#2484)	2024-02-13 17:48:41 -08:00
Jeffrey Morgan	6920964b87	Revert "bump submodule to `6c00a06` (#2479)" This reverts commit `2f9ed52bbd`.	2024-02-13 17:23:05 -08:00
Jeffrey Morgan	2f9ed52bbd	bump submodule to `6c00a06` (#2479)	2024-02-13 17:12:42 -08:00
Daniel Hiltgen	939c60473f	Merge pull request #2422 from dhiltgen/better_kill More robust shutdown	2024-02-12 14:05:06 -08:00
Jeffrey Morgan	f76ca04f9e	update submodule to `099afc6` (#2468)	2024-02-12 14:01:16 -08:00
Daniel Hiltgen	76b8728f0c	Merge pull request #2465 from dhiltgen/block_rocm_pre_9 Detect AMD GPU info via sysfs and block old cards	2024-02-12 12:41:43 -08:00
Daniel Hiltgen	6d84f07505	Detect AMD GPU info via sysfs and block old cards This wires up some new logic to start using sysfs to discover AMD GPU information and detects old cards we can't yet support so we can fallback to CPU mode.	2024-02-12 08:19:41 -08:00
Jeffrey Morgan	26b13fc33c	patch: always add token to cache_tokens (#2459)	2024-02-12 08:10:16 -08:00
Daniel Hiltgen	6680761596	Shutdown faster Make sure that when a shutdown signal comes, we shutdown quickly instead of waiting for a potentially long exchange to wrap up.	2024-02-08 22:22:50 -08:00
Daniel Hiltgen	a1dfab43b9	Ensure the libraries are present When we store our libraries in a temp dir, a reaper might clean them when we are idle, so make sure to check for them before we reload.	2024-02-07 17:27:49 -08:00
Daniel Hiltgen	de76b95dd4	Bump llama.cpp to b2081	2024-02-06 12:06:43 -08:00
Daniel Hiltgen	27aa2d4a19	Merge pull request #1849 from mraiser/main Accommodate split cuda lib dir	2024-02-05 16:01:16 -08:00
Daniel Hiltgen	e1f50377f4	Harden generate patching model Only apply patches if we have any, and make sure to cleanup every file we patched at the end to leave the tree clean	2024-02-01 19:34:36 -08:00
Jeffrey Morgan	f11bf0740b	use `llm.ImageData`	2024-01-31 19:13:48 -08:00
Michael Yang	8450bf66e6	trim images	2024-01-31 19:13:47 -08:00
Daniel Hiltgen	72b12c3be7	Bump llama.cpp to b1999 This requires an upstream change to support graceful termination, carried as a patch.	2024-01-30 16:52:12 -08:00
Jeffrey Morgan	2e06ed01d5	remove unknown `CPPFLAGS` option	2024-01-28 17:51:23 -08:00
mraiser	4c4c730a0a	Merge branch 'ollama:main' into main	2024-01-27 21:56:11 -05:00
Daniel Hiltgen	e02ecfb6c8	Merge pull request #2116 from dhiltgen/cc_50_80 Add support for CUDA 5.0 cards	2024-01-27 10:28:38 -08:00
Jeffrey Morgan	3ebd6a83fc	update submodule to `cd4fddb29f81d6a1f6d51a0c016bc6b486d68def`	2024-01-25 13:54:11 -08:00
Jeffrey Morgan	a64570dcae	Fix clearing kv cache between requests with the same prompt (#2186) * Fix clearing kv cache between requests with the same prompt * fix powershell script	2024-01-25 13:46:20 -08:00
mraiser	a4564232a4	Update gen_linux.sh to find libcudart in separate directory	2024-01-25 09:49:35 -05:00
Michael Yang	cd22855ef8	refactor tensor read	2024-01-24 10:48:31 -08:00
Jeffrey Morgan	4458efb73a	Load all layers on `arm64` macOS if model is small enough (#2149)	2024-01-22 17:40:06 -08:00
Daniel Hiltgen	0f5b843319	Refine Accelerate usage on mac For old macs, accelerate seems to cause crashes, but for AVX2 capable macs, it does not.	2024-01-22 16:25:56 -08:00
Jeffrey Morgan	ffaf52e1e9	update submodule to `011e8ec577fd135cbc02993d3ea9840c516d6a1c`	2024-01-22 15:16:54 -08:00

293 commits