ollama

Author	SHA1	Message	Date
Blake Mizerany	acfa2b9422	llm: prevent race appending to slice (#3320)	2024-03-24 11:35:54 -07:00
Daniel Hiltgen	74788b487c	Better tmpdir cleanup. If expanding the runners fails, don't leave a corrupt/incomplete payloads dir. We now write a pid file out to the tmpdir, which allows us to scan for stale tmpdirs and remove them as long as there isn't still a process running.	2024-03-20 16:03:19 +01:00
Blake Mizerany	6ce37e4d96	llm,readline: use errors.Is instead of simple == check (#3161). This fixes some brittle, simple equality checks to use errors.Is. Since go1.13, errors.Is is the idiomatic way to check for errors. Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>	2024-03-15 07:14:12 -07:00
Jeffrey Morgan	1ffb1e2874	update llama.cpp submodule to `77d1ac7` (#3030)	2024-03-09 15:55:34 -08:00
Daniel Hiltgen	4a5c9b8035	Finish unwinding idempotent payload logic. The recent ROCm change partially removed idempotent payloads, but the ggml-metal.metal file for mac was still idempotent. This finishes switching to always extract the payloads, and now that idempotency is gone, the version directory is no longer useful.	2024-03-09 08:34:39 -08:00
Daniel Hiltgen	6c5ccb11f9	Revamp ROCm support. This refines where we extract the LLM libraries to by adding a new OLLAMA_HOME env var, that defaults to `~/.ollama`. The logic was already idempotent, so this should speed up startups after the first time a new release is deployed. It also cleans up after itself. We now build only a single ROCm version (latest major) on both windows and linux. Given the large size of ROCm's tensor files, we split the dependency out. It's bundled into the installer on windows, and a separate download on linux. The linux install script is now smart and detects the presence of AMD GPUs and looks to see if rocm v6 is already present, and if not, then downloads our dependency tar file. For Linux discovery, we now use sysfs and check each GPU against what ROCm supports so we can degrade to CPU gracefully instead of having llama.cpp+rocm assert/crash on us. For Windows, we now use go's windows dynamic library loading logic to access the amdhip64.dll APIs to query the GPU information.	2024-03-07 10:36:50 -08:00
Daniel Hiltgen	6d84f07505	Detect AMD GPU info via sysfs and block old cards. This wires up some new logic to start using sysfs to discover AMD GPU information and detects old cards we can't yet support so we can fall back to CPU mode.	2024-02-12 08:19:41 -08:00
Jeffrey Morgan	dc88cc3981	use `gzip` for runner embedding (#2067)	2024-01-19 13:23:03 -05:00
Daniel Hiltgen	fedd705aea	Mechanical switch from log to slog. A few obvious levels were adjusted, but generally everything mapped to "info" level.	2024-01-18 14:12:57 -08:00
Daniel Hiltgen	1b249748ab	Add multiple CPU variants for Intel Mac. This also refines the build process for the ext_server build.	2024-01-17 15:08:54 -08:00
Daniel Hiltgen	3773fb6465	Merge pull request #1935 from dhiltgen/cpu_fallback: Fix up the CPU fallback selection	2024-01-11 15:52:32 -08:00
Daniel Hiltgen	7427fa1387	Fix up the CPU fallback selection. The memory changes and multi-variant change had some merge glitches I missed. This fixes them so we actually get the cpu llm lib and best variant for the given system.	2024-01-11 15:27:06 -08:00
Michael Yang	d2be6387c9	fix typo	2024-01-11 14:25:21 -08:00
Michael Yang	defc1dbd6e	use x/exp/slices	2024-01-11 14:20:13 -08:00
Daniel Hiltgen	39928a42e8	Always dynamically load the llm server library. This switches darwin to dynamic loading, and refactors the code now that no static linking of the library is used on any platform.	2024-01-11 08:42:47 -08:00

15 commits