ollama

Author	SHA1	Message	Date
Michael Yang	142cbb722d	Merge pull request #6482 from ollama/mxyng/client-path passthrough OLLAMA_HOST path to client	2024-08-30 09:40:34 -07:00
Daniel Hiltgen	93ea9240ae	Move ollama executable out of bin dir (#6535)	2024-08-27 16:19:00 -07:00
Michael Yang	386af6c1a0	passthrough OLLAMA_HOST path to client	2024-08-23 13:23:28 -07:00
Daniel Hiltgen	88bb9e3328	Adjust layout to bin+lib/ollama	2024-08-19 09:38:53 -07:00
Daniel Hiltgen	74d45f0102	Refactor linux packaging. This adjusts linux to follow a similar model to windows with a discrete archive (zip/tgz) to carry the primary executable and dependent libraries. Runners are still carried as payloads inside the main binary. Darwin retains the payload model where the go binary is fully self-contained.	2024-08-19 09:38:53 -07:00
Michael Yang	85d9d73a72	comments	2024-07-22 11:49:03 -07:00
Michael Yang	78140a712c	cleanup tests	2024-07-22 11:49:03 -07:00
Michael Yang	0f1910129f	int	2024-07-22 11:30:07 -07:00
Michael Yang	e2c3f6b3e2	string	2024-07-22 11:27:52 -07:00
Michael Yang	8570c1c0ef	keepalive	2024-07-22 11:27:22 -07:00
Michael Yang	55cd3ddcca	bool	2024-07-22 11:27:21 -07:00
Michael Yang	66fe77f084	models	2024-07-22 11:26:12 -07:00
Michael Yang	d1a5227cad	origins	2024-07-22 11:25:30 -07:00
Michael Yang	4f1afd575d	host	2024-07-22 11:25:30 -07:00
Michael Yang	35b89b2eab	rfc: dynamic environ lookup	2024-07-22 11:25:30 -07:00
Daniel Hiltgen	cc269ba094	Remove no longer supported max vram var. The OLLAMA_MAX_VRAM env var was a temporary workaround for OOM scenarios. With Concurrency this was no longer wired up, and the simplistic value doesn't map to multi-GPU setups. Users can still set `num_gpu` to limit memory usage to avoid OOM if we get our predictions wrong.	2024-07-22 09:08:11 -07:00
Anatoli Babenia	0d16eb310e	fix: use `envconfig.ModelsDir` directly (#4821) Co-authored-by: Anatoli Babenia <anatoli@rainforce.org> Co-authored-by: Maas Lalani <maas@lalani.dev>	2024-07-03 15:36:11 -07:00
Daniel Hiltgen	955f2a4e03	Only set default keep_alive on initial model load. This change fixes the handling of keep_alive so that if a client request omits the setting, we only set this on initial load. Once the model is loaded, if new requests leave this unset, we'll keep whatever keep_alive was there.	2024-07-03 15:29:56 -07:00
Daniel Hiltgen	173b550438	Remove default auto from help message. This may confuse users into thinking "auto" is an acceptable string - it must be numeric	2024-07-01 09:48:05 -07:00
Daniel Hiltgen	9929751cc8	Disable concurrency for AMD + Windows. Until ROCm v6.2 ships, we won't be able to get accurate free memory reporting on Windows, which makes automatic concurrency too risky. Users can still opt in but will need to pay attention to model sizes, otherwise they may thrash/page VRAM or cause OOM crashes. All other platforms and GPUs have accurate VRAM reporting wired up now, so we can turn on concurrency by default.	2024-06-21 15:45:05 -07:00
Daniel Hiltgen	17b7186cd7	Enable concurrency by default. This adjusts our default settings to enable multiple models and parallel requests to a single model. Users can still override these by the same env var settings as before. Parallel has a direct impact on num_ctx, which in turn can have a significant impact on small VRAM GPUs, so this change also refines the algorithm so that when parallel is not explicitly set by the user, we try to find a reasonable default that fits the model on their GPU(s). As before, multiple models will only load concurrently if they fully fit in VRAM.	2024-06-21 15:45:05 -07:00
Daniel Hiltgen	d34d88e417	Revert "Revert "gpu: add env var for detecting Intel oneapi gpus (#5076)"" This reverts commit `755b4e4fc2`.	2024-06-19 08:57:41 -07:00
Wang,Zhe	755b4e4fc2	Revert "gpu: add env var for detecting Intel oneapi gpus (#5076)" This reverts commit `163cd3e77c`.	2024-06-19 08:59:58 +08:00
Jeffrey Morgan	163cd3e77c	gpu: add env var for detecting Intel oneapi gpus (#5076) * gpu: add env var for detecting intel oneapi gpus * fix build error	2024-06-16 20:09:05 -04:00
Daniel Hiltgen	6be309e1bd	Centralize GPU configuration vars. This should aid in troubleshooting by capturing and reporting the GPU settings at startup in the logs along with all the other server settings.	2024-06-14 15:59:10 -07:00
Daniel Hiltgen	5e8ff556cb	Support forced spreading for multi GPU. Our default behavior today is to try to fit into a single GPU if possible. Some users would prefer the old behavior of always spreading across multiple GPUs even if the model can fit into one. This exposes that tunable behavior.	2024-06-14 14:51:40 -07:00
Patrick Devine	94618b2365	add OLLAMA_MODELS to envconfig (#5029)	2024-06-13 12:52:03 -07:00
Patrick Devine	c69bc19e46	move OLLAMA_HOST to envconfig (#5009)	2024-06-12 18:48:16 -04:00
royjhan	1a29e9a879	API app/browser access (#4879) * API app/browser access * Add tauri (resolves #2291, #4791, #3799, #4388)	2024-06-06 15:19:03 -07:00
Michael Yang	c895a7d13f	some gocritic	2024-06-04 11:13:30 -07:00
Michael Yang	dad7a987ae	nosprintfhostport	2024-06-04 11:13:30 -07:00
Lei Jitang	a03be18189	Fix OLLAMA_LLM_LIBRARY with wrong map name and add more env vars to help message (#4663) * envconfig/config.go: Fix wrong description of OLLAMA_LLM_LIBRARY * serve: Add more env to help message of ollama serve. Add more environment variables to `ollama serve --help` to let users know what can be configured. Signed-off-by: Lei Jitang <leijitang@outlook.com>	2024-05-30 09:36:51 -07:00
Patrick Devine	4cc3be3035	Move envconfig and consolidate env vars (#4608)	2024-05-24 14:57:15 -07:00

33 commits