ollama

Author	SHA1	Message	Date
Bruce MacDonald	2f804068bd	warn when json format is expected but not mentioned in prompt (#3081)	2024-03-12 19:07:11 -04:00
Bruce MacDonald	b80661e8c7	relay load model errors to the client (#3065)	2024-03-11 16:48:27 -04:00
Daniel Hiltgen	6c5ccb11f9	Revamp ROCm support. This refines where we extract the LLM libraries by adding a new OLLAMA_HOME env var, which defaults to `~/.ollama`. The logic was already idempotent, so this should speed up startups after the first time a new release is deployed. It also cleans up after itself. We now build only a single ROCm version (latest major) on both Windows and Linux. Given the large size of ROCm's tensor files, we split the dependency out: it's bundled into the installer on Windows, and is a separate download on Linux. The Linux install script now detects AMD GPUs, checks whether ROCm v6 is already present, and if not, downloads our dependency tar file. For Linux discovery, we now use sysfs and check each GPU against what ROCm supports, so we can degrade gracefully to CPU instead of having llama.cpp+ROCm assert/crash on us. For Windows, we now use Go's Windows dynamic library loading logic to access the amdhip64.dll APIs to query the GPU information.	2024-03-07 10:36:50 -08:00
Jeffrey Morgan	4613a080e7	update llama.cpp submodule to `66c1968f7` (#2618)	2024-02-20 17:42:31 -05:00
Daniel Hiltgen	6680761596	Shutdown faster. Make sure that when a shutdown signal comes, we shut down quickly instead of waiting for a potentially long exchange to wrap up.	2024-02-08 22:22:50 -08:00
Jeffrey Morgan	f11bf0740b	use `llm.ImageData`	2024-01-31 19:13:48 -08:00
Jeffrey Morgan	2e06ed01d5	remove unknown `CPPFLAGS` option	2024-01-28 17:51:23 -08:00
Jeffrey Morgan	a64570dcae	Fix clearing KV cache between requests with the same prompt (#2186); also fixes the PowerShell script	2024-01-25 13:46:20 -08:00
Daniel Hiltgen	3bc28736cd	Merge pull request #2143 from dhiltgen/llm_verbosity: Refine debug logging for llm	2024-01-22 13:19:16 -08:00
Daniel Hiltgen	730dcfcc7a	Refine debug logging for llm. This wires up logging in llama.cpp to always go to stderr, and also turns up logging if OLLAMA_DEBUG is set.	2024-01-22 12:26:49 -08:00
Daniel Hiltgen	27a2d5af54	Debug logging on init failure	2024-01-22 12:08:22 -08:00
Jeffrey Morgan	89c4aee29e	Unlock mutex when failing to load model (#2117)	2024-01-20 20:54:46 -05:00
Daniel Hiltgen	fedd705aea	Mechanical switch from log to slog. A few obvious levels were adjusted, but generally everything mapped to "info" level.	2024-01-18 14:12:57 -08:00
Daniel Hiltgen	1b249748ab	Add multiple CPU variants for Intel Mac. This also refines the build process for the ext_server build.	2024-01-17 15:08:54 -08:00
Bruce MacDonald	a897e833b8	do not cache prompt (#2018): prompt cache causes inference to hang after some time	2024-01-16 13:48:05 -05:00
Daniel Hiltgen	2ecb247276	Fix Intel Mac build. Make sure we're building an x86 ext_server lib when cross-compiling.	2024-01-13 14:46:34 -08:00
Daniel Hiltgen	39928a42e8	Always dynamically load the llm server library. This switches darwin to dynamic loading, and refactors the code now that no static linking of the library is used on any platform.	2024-01-11 08:42:47 -08:00

17 commits