ollama

Author	SHA1	Message	Date
Daniel Hiltgen	17ce203a26	Merge pull request #4875 from dhiltgen/rocm_gfx900_workaround. Rocm gfx900 workaround	2024-06-15 07:38:58 -07:00
Daniel Hiltgen	d76555ffb5	Merge pull request #4874 from dhiltgen/rocm_v6_bump. Rocm v6 bump	2024-06-15 07:38:32 -07:00
Daniel Hiltgen	2786dff5d3	Merge pull request #4264 from dhiltgen/show_gpu_visible_settings. Centralize GPU configuration vars	2024-06-15 07:33:52 -07:00
Lei Jitang	225f0d1219	gpu: Fix build warning. Signed-off-by: Lei Jitang <leijitang@outlook.com>	2024-06-15 14:26:23 +08:00
Daniel Hiltgen	532db58311	Merge pull request #4972 from jayson-cloude/main. fix: "Skip searching for network devices"	2024-06-14 17:04:40 -07:00
Daniel Hiltgen	6be309e1bd	Centralize GPU configuration vars. This should aid in troubleshooting by capturing and reporting the GPU settings at startup in the logs along with all the other server settings.	2024-06-14 15:59:10 -07:00
Daniel Hiltgen	da3bf23354	Workaround gfx900 SDMA bugs. Implement support for GPU env var workarounds, and leverage this for the Vega RX 56, which needs HSA_ENABLE_SDMA=0 set to work properly.	2024-06-14 15:38:13 -07:00
Daniel Hiltgen	26ab67732b	Bump ROCm linux to 6.1.1	2024-06-14 15:37:54 -07:00
Daniel Hiltgen	45cacbaf05	Merge pull request #4517 from dhiltgen/gpu_incremental. Enhanced GPU discovery and multi-gpu support with concurrency	2024-06-14 15:35:00 -07:00
Daniel Hiltgen	17df6520c8	Remove mmap related output calc logic	2024-06-14 14:55:50 -07:00
Daniel Hiltgen	6f351bf586	review comments and coverage	2024-06-14 14:55:50 -07:00
Daniel Hiltgen	ff4f0cbd1d	Prevent multiple concurrent loads on the same gpus. While models are loading, the VRAM metrics are dynamic, so try to load on a GPU that doesn't have a model actively loading, or wait to avoid races that lead to OOMs.	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	fc37c192ae	Refine CPU load behavior with system memory visibility	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	434dfe30c5	Reintroduce nvidia nvml library for windows. This library will give us the most reliable free VRAM reporting on Windows to enable concurrent model scheduling.	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	4e2b7e181d	Refactor intel gpu discovery	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	48702dd149	Harden unload for empty runners	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	68dfc6236a	refined test timing. Adjust timing on some tests so they don't time out on small/slow GPUs.	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	5e8ff556cb	Support forced spreading for multi GPU. Our default behavior today is to try to fit into a single GPU if possible. Some users would prefer the old behavior of always spreading across multiple GPUs even if the model can fit into one. This exposes that tunable behavior.	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	6fd04ca922	Improve multi-gpu handling at the limit. Still not complete; needs some refinement to our prediction to understand the discrete GPUs' available space so we can see how many layers fit in each one. Since we can't split one layer across multiple GPUs, we can't treat free space as one logical block.	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	206797bda4	Fix concurrency integration test to work locally. This worked remotely but wound up trying to spawn multiple servers locally, which doesn't work.	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	43ed358f9a	Refine GPU discovery to bootstrap once. Now that we call the GPU discovery routines many times to update memory, this splits initial discovery from free memory updating.	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	b32ebb4f29	Use DRM driver for VRAM info for amd. The amdgpu driver's free VRAM reporting omits some other apps, so leverage the upstream DRM driver, which keeps better tabs on things.	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	fb9cdfa723	Fix server.cpp for the new cuda build macros	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	efac488675	Revert "Limit GPU lib search for now (#4777)". This reverts commit `476fb8e892`.	2024-06-14 14:51:40 -07:00
Jeffrey Morgan	6b800aa7b7	openai: do not set temperature to 0 when setting seed (#5045)	2024-06-14 13:43:56 -07:00
Jeffrey Morgan	dd7c9ebeaf	server: longer timeout in `TestRequests` (#5046)	2024-06-14 09:48:25 -07:00
Patrick Devine	4dc7fb9525	update 40xx gpu compat matrix (#5036)	2024-06-13 17:10:33 -07:00
Daniel Hiltgen	c39761c552	Merge pull request #5032 from dhiltgen/actually_skip. Actually skip PhysX on windows	2024-06-13 13:26:09 -07:00
Daniel Hiltgen	aac367636d	Actually skip PhysX on windows	2024-06-13 13:17:19 -07:00
Michael Yang	15a687ae4b	Merge pull request #5031 from ollama/mxyng/fix-multibyte-utf16. fix: multibyte utf16	2024-06-13 13:14:55 -07:00
Michael Yang	d528e1af75	fix utf16 for multibyte runes	2024-06-13 13:07:42 -07:00
Michael Yang	cd234ce22c	parser: add test for multibyte runes	2024-06-13 13:07:42 -07:00
Patrick Devine	94618b2365	add OLLAMA_MODELS to envconfig (#5029)	2024-06-13 12:52:03 -07:00
Jeffrey Morgan	1fd236d177	server: remove jwt decoding error (#5027)	2024-06-13 11:21:15 -07:00
Michael Yang	e87fc7200d	Merge pull request #5025 from ollama/mxyng/revert-parser-scan. Revert "proper utf16 support"	2024-06-13 10:31:25 -07:00
Michael Yang	20b9f8e6f4	Revert "proper utf16 support". This reverts commit `66ab48772f`. This change broke UTF-8 scanning of multi-byte runes.	2024-06-13 10:22:16 -07:00
Patrick Devine	c69bc19e46	move OLLAMA_HOST to envconfig (#5009)	2024-06-12 18:48:16 -04:00
Michael Yang	bba5d177aa	Merge pull request #5004 from ollama/mxyng/fix-templates. fix: multiple templates when creating from model	2024-06-12 14:39:29 -07:00
Michael Yang	c16f8af911	fix: multiple templates when creating from model. Multiple templates may appear in a model if a model is created from another model that 1) has an autodetected template and 2) defines a custom template.	2024-06-12 13:35:49 -07:00
Michael Yang	217f60c3d9	Merge pull request #4987 from ollama/mxyng/revert-byte-order. Revert "Merge pull request #4938 from ollama/mxyng/fix-byte-order"	2024-06-11 16:04:20 -07:00
Michael Yang	7bdcd1da94	Revert "Merge pull request #4938 from ollama/mxyng/fix-byte-order". This reverts commit `f5f245cc15`, reversing changes made to `94d37fdcae`. This change broke GGUF v2, which is incorrectly detected as big endian.	2024-06-11 15:56:17 -07:00
Jeffrey Morgan	ead259d877	llm: fix seed value not being applied to requests (#4986)	2024-06-11 14:24:41 -07:00
James Montgomery	2ff45d571d	Add Ollama-hpp to Community Libraries in README. (#4983)	2024-06-11 11:15:05 -07:00
jayson-cloude	157f09acdf	fix: "Skip searching for network devices". On an Ubuntu 24.04 computer with VMware installed, the `sudo lshw` command will get stuck, with "Network interfaces" always displayed.	2024-06-11 16:11:35 +08:00
Michael Yang	0f3cf1d42e	Merge pull request #4715 from ollama/mxyng/utf16-parser. proper utf16 support	2024-06-10 11:41:29 -07:00
Michael Yang	5bc029c529	Merge pull request #4921 from ollama/mxyng/import-md. update import.md	2024-06-10 11:41:09 -07:00
Michael Yang	e9a9c6a8e8	Merge pull request #4965 from ollama/mxyng/skip-layer-remove. fix: skip removing layers that no longer exist	2024-06-10 11:40:03 -07:00
Michael Yang	515f497e6d	fix: skip removing layers that no longer exist	2024-06-10 11:32:19 -07:00
Michael Yang	b27268aaef	add test	2024-06-10 11:32:15 -07:00
Michael Yang	f5f245cc15	Merge pull request #4938 from ollama/mxyng/fix-byte-order. fix parsing big endian gguf	2024-06-10 09:38:12 -07:00

3183 commits