Commit graph

3130 commits

Daniel Hiltgen
69c04eecc4 Add windows radeon concurrency note 2024-07-02 12:46:14 -07:00
royjhan
996bb1b85e
OpenAI: /v1/models and /v1/models/{model} compatibility (#5007)
* OpenAI v1 models

* Refactor Writers

* Add Test

Co-Authored-By: Attila Kerekes

* Credit Co-Author

Co-Authored-By: Attila Kerekes <439392+keriati@users.noreply.github.com>

* Empty List Testing

* Use Namespace for Ownedby

* Update Test

* Add back envconfig

* v1/models docs

* Use ModelName Parser

* Test Names

* Remove Docs

* Clean Up

* Test name

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

* Add Middleware for Chat and List

* Testing Cleanup

* Test with Fatal

* Add functionality to chat test

* OpenAI: /v1/models/{model} compatibility (#5028)

* Retrieve Model

* OpenAI Delete Model

* Retrieve Middleware

* Remove Delete from Branch

* Update Test

* Middleware Test File

* Function name

* Cleanup

* Test Update

* Test Update

---------

Co-authored-by: Attila Kerekes <439392+keriati@users.noreply.github.com>
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-07-02 11:50:56 -07:00
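
For illustration, the new OpenAI-compatible model listing endpoint can be exercised from Go roughly like this (a minimal sketch; the default localhost:11434 address is an assumption, not stated in the commit):

    package main

    import (
        "fmt"
        "io"
        "net/http"
    )

    func main() {
        // List models through the OpenAI-compatible /v1/models endpoint.
        resp, err := http.Get("http://localhost:11434/v1/models") // default port is an assumption
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        body, err := io.ReadAll(resp.Body)
        if err != nil {
            panic(err)
        }
        fmt.Println(string(body)) // e.g. {"object":"list","data":[...]}
    }
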
Daniel Hiltgen
422dcc3856
Merge pull request #5439 from dhiltgen/fix_centos_7_build
Switch ARM64 container image base to rocky 8
2024-07-02 11:01:15 -07:00
Daniel Hiltgen
020bd60ab2 Switch arm container image base to rocky 8
The CentOS 7 ARM mirrors have disappeared following the EOL two days
ago, and the vault sed workaround that works for x86 doesn't work for ARM.
2024-07-02 10:34:47 -07:00
Daniel Hiltgen
8e277b72bb
Merge pull request #5438 from dhiltgen/fix_centos_7_build
Centos 7 EOL broke mirrors
2024-07-02 09:28:00 -07:00
Daniel Hiltgen
4f67b39d26 Centos 7 EOL broke mirrors
As of July 1st 2024: Could not resolve host: mirrorlist.centos.org
This is expected due to EOL dates.
2024-07-02 09:22:17 -07:00
Josh
2425281317
Merge pull request #5336 from ollama/jyan/from-errors
fix: trim spaces for FROM argument, don't trim inside of quotes
2024-07-01 16:32:46 -07:00
Josh
0403e9860e
Merge pull request #5421 from ollama/jyan/ver
fix: add unsupported architecture message for linux/windows
2024-07-01 16:32:14 -07:00
Josh Yan
33a65e3ba3 error 2024-07-01 16:04:13 -07:00
Michael Yang
88bcd79bb9 err on insecure path 2024-07-01 15:55:59 -07:00
Josh Yan
7e571f95f0 trimspace test case 2024-07-01 11:07:48 -07:00
Michael Yang
da8e2a0447 use kvs to detect embedding models 2024-07-01 10:47:43 -07:00
Michael Yang
a30915bde1 add capabilities 2024-07-01 10:47:43 -07:00
Michael Yang
58e3fff311 rename templates to template 2024-07-01 10:40:54 -07:00
Michael Yang
3f0b309ad4 remove ManifestV2 2024-07-01 10:40:54 -07:00
Daniel Hiltgen
e70610ef06
Merge pull request #5410 from dhiltgen/ctx_cleanup
Fix case for NumCtx
2024-07-01 09:54:20 -07:00
Daniel Hiltgen
dfded7e075
Merge pull request #5364 from dhiltgen/concurrency_docs
Document concurrent behavior and settings
2024-07-01 09:49:48 -07:00
Daniel Hiltgen
173b550438 Remove default auto from help message
This may confuse users into thinking "auto" is an acceptable string; the value must be numeric
2024-07-01 09:48:05 -07:00
Daniel Hiltgen
cff3f44f4a Fix case for NumCtx 2024-07-01 09:43:59 -07:00
Josh Yan
26e4e66faf updated parsefile test 2024-07-01 09:43:49 -07:00
Daniel Hiltgen
97c9e11768 Switch use_mmap to a pointer type
This uses nil as undefined for a cleaner implementation.
2024-07-01 08:44:59 -07:00
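
The pointer switch is the usual Go tristate pattern: a *bool whose nil value means the option was never set. A minimal sketch of the idea, with hypothetical names rather than the actual Ollama fields:

    package sketch

    // Tristate option: nil means "not set"; otherwise an explicit true/false.
    type options struct {
        UseMMap *bool
    }

    func resolveUseMMap(opts options, platformDefault bool) bool {
        if opts.UseMMap == nil {
            return platformDefault // undefined: fall back to the default heuristic
        }
        return *opts.UseMMap // the user set it explicitly
    }
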
Daniel Hiltgen
3518aaef33
Merge pull request #4218 from dhiltgen/auto_parallel
Enable concurrency by default
2024-07-01 08:32:29 -07:00
RAPID ARCHITECT
1963c00201
Update README.md (#5214)
* Update README.md

Added Mesop example to web & desktop

* Update README.md

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-06-30 22:00:57 -04:00
Eduard
27402cb7a2
Update gpu.md (#5382)
Runs fine on an NVIDIA GeForce GTX 1050 Ti
2024-06-30 21:48:51 -04:00
Jeffrey Morgan
c1218199cf
Update api.md 2024-06-29 16:22:49 -07:00
Jeffrey Morgan
717f7229eb
Do not shift context for sliding window models (#5368)
* Do not shift context for sliding window models

* truncate prompt > 2/3 tokens

* only target gemma2
2024-06-28 19:39:31 -07:00
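
A rough sketch of the truncation rule described above, with hypothetical names and assuming the most recent tokens are the ones kept (the actual implementation may differ):

    package sketch

    // For sliding-window models, avoid shifting the context by truncating the
    // prompt when it exceeds roughly 2/3 of the context window.
    func truncateForSlidingWindow(tokens []int, numCtx int, slidingWindow bool) []int {
        if !slidingWindow {
            return tokens
        }
        limit := 2 * numCtx / 3
        if len(tokens) > limit {
            return tokens[len(tokens)-limit:] // keep the most recent tokens (assumption)
        }
        return tokens
    }
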
Daniel Hiltgen
aae56abb7c Document concurrent behavior and settings 2024-06-28 13:15:57 -07:00
royjhan
5f034f5b63
Include Show Info in Interactive (#5342) 2024-06-28 13:15:52 -07:00
royjhan
b910fa9010
Ollama Show: Check for Projector Type (#5307)
* Check exists projtype

* Maintain Ordering
2024-06-28 11:30:16 -07:00
royjhan
6d4219083c
Update docs (#5312) 2024-06-28 09:58:14 -07:00
Michael Yang
1ed4f521c4
Merge pull request #5340 from ollama/mxyng/mem
gemma2 graph
2024-06-27 14:26:49 -07:00
Michael Yang
de2163dafd gemma2 graph 2024-06-27 13:34:52 -07:00
Josh Yan
9bd00041fa trim all params 2024-06-27 11:18:38 -07:00
Josh Yan
4e986a823c unquote, trim space 2024-06-27 10:59:15 -07:00
Michael
2cc7d05012
update readme for gemma 2 (#5333)
* update readme for gemma 2
2024-06-27 12:45:16 -04:00
Michael Yang
123a722a6f
zip: prevent extracting files into parent dirs (#5314) 2024-06-26 21:38:21 -07:00
Jeffrey Morgan
4d311eb731
llm: architecture patch (#5316) 2024-06-26 21:38:12 -07:00
Blake Mizerany
cb42e607c5
llm: speed up gguf decoding by a lot (#5246)
Previously, several costly operations made loading GGUF files and their
metadata and tensor information very slow:

  * Too many allocations when decoding strings
  * Hitting disk for every read of each key and value, resulting in an
    excessive number of syscalls and disk I/O.

The show API is now down to 33ms from 800ms+ for llama3 on a MacBook Pro
M3.

This commit also makes it possible to skip collecting large arrays of values
when decoding GGUFs. When such keys are skipped, their values are null and
are encoded as such in JSON.

Also, this fixes a broken test that was not encoding valid GGUF.
2024-06-24 21:47:52 -07:00
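
Much of the syscall reduction comes down to reading through a buffer instead of hitting the file for every key and value. A minimal sketch of that idea, not the actual decoder (the buffer size is an arbitrary assumption):

    package sketch

    import (
        "bufio"
        "os"
    )

    // Open a GGUF file behind a large buffered reader so that decoding many
    // small keys and values does not translate into one syscall per read.
    func openBuffered(path string) (*bufio.Reader, *os.File, error) {
        f, err := os.Open(path)
        if err != nil {
            return nil, nil, err
        }
        return bufio.NewReaderSize(f, 1<<20), f, nil // 1 MiB buffer (assumption)
    }
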
Blake Mizerany
2aa91a937b
cmd: defer stating model info until necessary (#5248)
This commit changes the 'ollama run' command to defer fetching model
information until it actually needs it, that is, when running in interactive mode.

It also removes a case where the model information was fetched in
duplicate: just before calling generateInteractive, and then again, first
thing, inside generateInteractive.

This positively impacts the performance of the command:

    ; time ./before run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./before run llama3 'hi'  0.02s user 0.01s system 2% cpu 1.168 total
    ; time ./before run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./before run llama3 'hi'  0.02s user 0.01s system 2% cpu 1.220 total
    ; time ./before run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./before run llama3 'hi'  0.02s user 0.01s system 2% cpu 1.217 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./after run llama3 'hi'  0.02s user 0.01s system 4% cpu 0.652 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./after run llama3 'hi'  0.01s user 0.01s system 5% cpu 0.498 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with or would you like to chat?

    ./after run llama3 'hi'  0.01s user 0.01s system 3% cpu 0.479 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./after run llama3 'hi'  0.02s user 0.01s system 5% cpu 0.507 total
    ; time ./after run llama3 'hi'
    Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?

    ./after run llama3 'hi'  0.02s user 0.01s system 5% cpu 0.507 total
2024-06-24 20:14:03 -07:00
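
The change boils down to lazy evaluation: fetch the model details only on the code path that displays them. A minimal sketch of the pattern, with hypothetical names rather than the actual cmd code:

    package sketch

    // Defer the expensive model-info lookup until interactive mode needs it.
    type infoFetcher func() (string, error)

    func run(interactive bool, fetchInfo infoFetcher) error {
        if !interactive {
            return nil // non-interactive runs never pay for the fetch
        }
        info, err := fetchInfo() // fetched once, and only when needed
        if err != nil {
            return err
        }
        _ = info // populate the interactive session with it
        return nil
    }
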
Daniel Hiltgen
ccef9431c8
Merge pull request #5205 from dhiltgen/modelfile_use_mmap
Fix use_mmap parsing for modelfiles
2024-06-21 16:30:36 -07:00
Daniel Hiltgen
642cee1342 Sort the ps output
Provide consistent ordering for the ps command, with the longest duration listed first
2024-06-21 15:59:41 -07:00
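
The ordering is a single comparison-based sort; a sketch assuming each row carries a remaining-duration field (field names are hypothetical):

    package sketch

    import (
        "sort"
        "time"
    )

    type psRow struct {
        Name      string
        Remaining time.Duration // time until the model is unloaded (hypothetical field)
    }

    // Longest remaining duration first, so repeated ps calls list models in a
    // consistent order.
    func sortPS(rows []psRow) {
        sort.Slice(rows, func(i, j int) bool { return rows[i].Remaining > rows[j].Remaining })
    }
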
royjhan
9a9e7d83c4
Docs (#5149) 2024-06-21 15:52:09 -07:00
Daniel Hiltgen
9929751cc8 Disable concurrency for AMD + Windows
Until ROCm v6.2 ships, we won't be able to get accurate free memory
reporting on Windows, which makes automatic concurrency too risky.
Users can still opt in, but they will need to pay attention to model sizes; otherwise they may thrash/page VRAM or cause OOM crashes.
All other platforms and GPUs now have accurate VRAM reporting wired
up, so we can turn on concurrency by default.
2024-06-21 15:45:05 -07:00
Daniel Hiltgen
17b7186cd7 Enable concurrency by default
This adjusts our default settings to enable multiple loaded models and parallel
requests to a single model. Users can still override these with the same
env var settings as before. The parallel setting has a direct impact on
num_ctx, which in turn can have a significant impact on GPUs with small VRAM,
so this change also refines the algorithm: when parallel is not
explicitly set by the user, we try to find a reasonable default that fits
the model on their GPU(s). As before, multiple models will only load
concurrently if they fully fit in VRAM.
2024-06-21 15:45:05 -07:00
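
The "reasonable default" amounts to shrinking the parallel count until the model plus its per-slot KV cache fits in free VRAM. A rough sketch of that heuristic, with assumed names and sizing, not the actual scheduler code:

    package sketch

    // Shrink the parallel slot count until the model plus its per-slot KV
    // cache fits in free VRAM. Purely illustrative; the real scheduler's
    // sizing and names differ.
    func pickNumParallel(freeVRAM, modelSize, perSlotKV uint64, maxParallel int) int {
        for n := maxParallel; n > 1; n-- {
            if modelSize+uint64(n)*perSlotKV <= freeVRAM {
                return n
            }
        }
        return 1
    }
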
Michael Yang
189a43caa2
Merge pull request #5206 from ollama/mxyng/quantize
fix: quantization with template
2024-06-21 13:44:34 -07:00
Michael Yang
e835ef1836 fix: quantization with template 2024-06-21 13:39:25 -07:00
Daniel Hiltgen
7e7749224c Fix use_mmap parsing for modelfiles
Add the new tristate parsing logic to the modelfile code path, along with
a unit test.
2024-06-21 12:27:19 -07:00
Daniel Hiltgen
c7c2f3bc22
Merge pull request #5194 from dhiltgen/linux_mmap_auto
Refine mmap default logic on linux
2024-06-20 11:44:08 -07:00
Daniel Hiltgen
54a79d6a8a
Merge pull request #5125 from dhiltgen/fedora39
Bump latest fedora cuda repo to 39
2024-06-20 11:27:24 -07:00
Daniel Hiltgen
5bf5aeec01 Refine mmap default logic on linux
If we try to use mmap when the model is larger than the free system memory, loading is slower than the no-mmap approach.
2024-06-20 11:07:04 -07:00
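
A minimal sketch of the default rule described here, with assumed inputs for model size and free system memory:

    package sketch

    // Default mmap on only when the model fits in free system memory;
    // otherwise faulting the whole file in through mmap is slower than
    // reading it outright. Illustrative only.
    func defaultUseMMap(modelSizeBytes, freeSystemMemoryBytes uint64) bool {
        return modelSizeBytes <= freeSystemMemoryBytes
    }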