ollama

Author	SHA1	Message	Date
Blake Mizerany	c8af3c2d96	server: reuse original download URL for images (#5962 ) This changes the registry client to reuse the original download URL it gets on the first redirect response for all subsequent requests, preventing thundering herd issues when hot new LLMs are released.	2024-07-25 15:58:30 -07:00
Jeffrey Morgan	455e61170d	Update openai.md	2024-07-25 18:34:47 -04:00
royjhan	4de1370a9d	openai tools doc (#5617 )	2024-07-25 18:34:06 -04:00
Jeffrey Morgan	bbf8f102ee	Revert "llm(llama): pass rope factors (#5924 )" (#5963 ) This reverts commit `bb46bbcf5e`.	2024-07-25 18:24:55 -04:00
Daniel Hiltgen	ce3c93b08f	Report better error on cuda unsupported os/arch If we detect an NVIDIA GPU, but nvidia doesn't support the os/arch, this will report a better error for the user and point them to docs to self-install the drivers if possible.	2024-07-24 17:09:20 -07:00
Daniel Hiltgen	6c2129d5d0	Explain font problems on windows 10	2024-07-24 15:22:00 -07:00
Daniel Hiltgen	7c2a157ca4	Ensure amd gpu nodes are numerically sorted For systems that enumerate over 10 CPUs the default lexicographical sort order interleaves CPUs and GPUs.	2024-07-24 13:43:26 -07:00
Michael Yang	bb46bbcf5e	llm(llama): pass rope factors (#5924 )	2024-07-24 16:05:59 -04:00
royjhan	ac33aa7d37	Fix Embed Test Flakes (#5893 ) * float cmp * increase tolerance	2024-07-24 11:15:46 -07:00
Daniel Hiltgen	830fdd2715	Better explain multi-gpu behavior	2024-07-23 15:16:38 -07:00
Ajay Chintala	a6cd8f6169	Update README.md to add LLMStack integration (#5799 )	2024-07-23 14:40:23 -04:00
Daniel Hiltgen	c78089263a	Merge pull request #5864 from dhiltgen/bump_go Bump Go patch version	2024-07-22 16:34:18 -07:00
Daniel Hiltgen	3e5ea035d5	Merge pull request #5757 from lreed-mdsol/lreed/bump-go-version-fix-vulnerabilities bump go version to 1.22.5 to fix security vulnerabilities in docker	2024-07-22 16:32:43 -07:00
Daniel Hiltgen	5d604eec5b	Bump Go patch version	2024-07-22 16:16:28 -07:00
Josh	db0968f30c	fix dupe err message (#5857 )	2024-07-22 15:48:15 -07:00
Daniel Hiltgen	e12fff8810	Enable windows error dialog for subprocess startup Make sure if something goes wrong spawning the process, the user gets enough info to be able to try to self correct, or at least file a bug with details so we can fix it. Once the process starts, we immediately change back to the recommended setting to prevent the blocking dialog. This ensures if the model fails to load (OOM, unsupported model type, etc.) the process will exit quickly and we can scan the stdout/stderr of the subprocess for the reason to report via API.	2024-07-22 14:07:27 -07:00
Michael Yang	9b60a038e5	update api.md	2024-07-22 13:49:51 -07:00
Michael Yang	83a0cb8d88	docs	2024-07-22 13:38:09 -07:00
royjhan	c0648233f2	api embed docs (#5282 )	2024-07-22 13:37:08 -07:00
Jeffrey Morgan	d835368eb8	convert: capture `head_dim` for mistral (#5818 )	2024-07-22 16:16:22 -04:00
Michael Yang	85d9d73a72	comments	2024-07-22 11:49:03 -07:00
Michael Yang	78140a712c	cleanup tests	2024-07-22 11:49:03 -07:00
Michael Yang	1954ec5917	uint64	2024-07-22 11:49:02 -07:00
Michael Yang	0f1910129f	int	2024-07-22 11:30:07 -07:00
Michael Yang	e2c3f6b3e2	string	2024-07-22 11:27:52 -07:00
Michael Yang	8570c1c0ef	keepalive	2024-07-22 11:27:22 -07:00
Michael Yang	55cd3ddcca	bool	2024-07-22 11:27:21 -07:00
Michael Yang	66fe77f084	models	2024-07-22 11:26:12 -07:00
Michael Yang	d1a5227cad	origins	2024-07-22 11:25:30 -07:00
Michael Yang	4f1afd575d	host	2024-07-22 11:25:30 -07:00
Michael Yang	35b89b2eab	rfc: dynamic environ lookup	2024-07-22 11:25:30 -07:00
Daniel Hiltgen	5784c05397	Merge pull request #5854 from dhiltgen/win_exit_status Refine error reporting for subprocess crash	2024-07-22 10:40:22 -07:00
Daniel Hiltgen	f14aa5435d	Merge pull request #5855 from dhiltgen/remove_max_vram Remove no longer supported max vram var	2024-07-22 10:35:29 -07:00
Jeffrey Morgan	f8fedbda20	Update llama.cpp submodule commit to `d94c6e0c` (#5805 )	2024-07-22 12:42:00 -04:00
Jeffrey Morgan	b3e5491e41	server: collect nested tool call objects when parsing (#5824 )	2024-07-22 12:38:03 -04:00
Daniel Hiltgen	cc269ba094	Remove no longer supported max vram var The OLLAMA_MAX_VRAM env var was a temporary workaround for OOM scenarios. With Concurrency this was no longer wired up, and the simplistic value doesn't map to multi-GPU setups. Users can still set `num_gpu` to limit memory usage to avoid OOM if we get our predictions wrong.	2024-07-22 09:08:11 -07:00
Daniel Hiltgen	a3c20e3f18	Refine error reporting for subprocess crash On windows, the exit status winds up being the search term many users search for and end up piling in on issues that are unrelated. This refines the reporting so that if we have a more detailed message we'll suppress the exit status portion of the message.	2024-07-22 08:52:16 -07:00
Jeffrey Morgan	80ee9b5e47	Remove out of space test temporarily (#5825 )	2024-07-21 00:22:11 -04:00
Jeffrey Morgan	5534f2cc6a	llm: consider `head_dim` in llama arch (#5817 )	2024-07-20 21:48:12 -04:00
Daniel Hiltgen	d321297d8a	Merge pull request #5815 from dhiltgen/win_rocm_gfx_features Adjust windows ROCm discovery	2024-07-20 16:02:55 -07:00
Daniel Hiltgen	06e5d74e34	Merge pull request #5506 from dhiltgen/sched_tests Refine scheduler unit tests for reliability	2024-07-20 15:48:39 -07:00
Daniel Hiltgen	5d707e6fd5	Merge pull request #5583 from dhiltgen/integration_improvements Fix context exhaustion integration test for small gpus	2024-07-20 15:48:21 -07:00
Daniel Hiltgen	283948c83b	Adjust windows ROCm discovery The v5 hip library returns unsupported GPUs which wont enumerate at inference time in the runner so this makes sure we align discovery. The gfx906 cards are no longer supported so we shouldn't compile with that GPU type as it wont enumerate at runtime.	2024-07-20 15:17:50 -07:00
Jeffrey Morgan	1475eab95f	add patch for tekken (#5807 )	2024-07-20 13:41:21 -04:00
Jeffrey Morgan	20090f3172	preserve last assistant message (#5802 )	2024-07-19 20:19:26 -07:00
Jeffrey Morgan	69a2d4ccff	Fix generate test flakyness (#5804 )	2024-07-19 19:11:25 -07:00
Josh	e8b954c646	server: validate template (#5734 ) add template validation to modelfile	2024-07-19 15:24:29 -07:00
royjhan	c57317cbf0	OpenAI: Function Based Testing (#5752 ) * distinguish error forwarding * more coverage * rm comment	2024-07-19 11:37:12 -07:00
royjhan	51b2fd299c	adjust openai chat msg processing (#5729 )	2024-07-19 11:19:20 -07:00
Michael Yang	d0634b1596	Merge pull request #5780 from ollama/mxyng/tools fix parsing tool calls: break on unexpected eofs	2024-07-18 12:14:10 -07:00

... 3 4 5 6 7 ...

3412 commits