ollama

Author	SHA1	Message	Date
Daniel Hiltgen	9929751cc8	Disable concurrency for AMD + Windows Until ROCm v6.2 ships, we wont be able to get accurate free memory reporting on windows, which makes automatic concurrency too risky. Users can still opt-in but will need to pay attention to model sizes otherwise they may thrash/page VRAM or cause OOM crashes. All other platforms and GPUs have accurate VRAM reporting wired up now, so we can turn on concurrency by default.	2024-06-21 15:45:05 -07:00
Daniel Hiltgen	17b7186cd7	Enable concurrency by default This adjusts our default settings to enable multiple models and parallel requests to a single model. Users can still override these by the same env var settings as before. Parallel has a direct impact on num_ctx, which in turn can have a significant impact on small VRAM GPUs so this change also refines the algorithm so that when parallel is not explicitly set by the user, we try to find a reasonable default that fits the model on their GPU(s). As before, multiple models will only load concurrently if they fully fit in VRAM.	2024-06-21 15:45:05 -07:00
Daniel Hiltgen	d34d88e417	Revert "Revert "gpu: add env var for detecting Intel oneapi gpus (#5076 )"" This reverts commit `755b4e4fc2`.	2024-06-19 08:57:41 -07:00
Wang,Zhe	755b4e4fc2	Revert "gpu: add env var for detecting Intel oneapi gpus (#5076 )" This reverts commit `163cd3e77c`.	2024-06-19 08:59:58 +08:00
Jeffrey Morgan	163cd3e77c	gpu: add env var for detecting Intel oneapi gpus (#5076 ) * gpu: add env var for detecting intel oneapi gpus * fix build error	2024-06-16 20:09:05 -04:00
Daniel Hiltgen	6be309e1bd	Centralize GPU configuration vars This should aid in troubleshooting by capturing and reporting the GPU settings at startup in the logs along with all the other server settings.	2024-06-14 15:59:10 -07:00
Daniel Hiltgen	5e8ff556cb	Support forced spreading for multi GPU Our default behavior today is to try to fit into a single GPU if possible. Some users would prefer the old behavior of always spreading across multiple GPUs even if the model can fit into one. This exposes that tunable behavior.	2024-06-14 14:51:40 -07:00
Patrick Devine	94618b2365	add OLLAMA_MODELS to envconfig (#5029 )	2024-06-13 12:52:03 -07:00
Patrick Devine	c69bc19e46	move OLLAMA_HOST to envconfig (#5009 )	2024-06-12 18:48:16 -04:00
royjhan	1a29e9a879	API app/browser access (#4879 ) * API app/browser access * Add tauri (resolves #2291, #4791, #3799, #4388)	2024-06-06 15:19:03 -07:00
Michael Yang	c895a7d13f	some gocritic	2024-06-04 11:13:30 -07:00
Michael Yang	dad7a987ae	nosprintfhostport	2024-06-04 11:13:30 -07:00
Lei Jitang	a03be18189	Fix OLLAMA_LLM_LIBRARY with wrong map name and add more env vars to help message (#4663 ) * envconfig/config.go: Fix wrong description of OLLAMA_LLM_LIBRARY Signed-off-by: Lei Jitang <leijitang@outlook.com> * serve: Add more env to help message of ollama serve Add more enviroment variables to `ollama serve --help` to let users know what can be configurated. Signed-off-by: Lei Jitang <leijitang@outlook.com> --------- Signed-off-by: Lei Jitang <leijitang@outlook.com>	2024-05-30 09:36:51 -07:00
Patrick Devine	4cc3be3035	Move envconfig and consolidate env vars (#4608 )	2024-05-24 14:57:15 -07:00

14 commits