ollama

Author	SHA1	Message	Date
Daniel Hiltgen	17b7186cd7	Enable concurrency by default This adjusts our default settings to enable multiple models and parallel requests to a single model. Users can still override these by the same env var settings as before. Parallel has a direct impact on num_ctx, which in turn can have a significant impact on small VRAM GPUs so this change also refines the algorithm so that when parallel is not explicitly set by the user, we try to find a reasonable default that fits the model on their GPU(s). As before, multiple models will only load concurrently if they fully fit in VRAM.	2024-06-21 15:45:05 -07:00
Michael Yang	189a43caa2	Merge pull request #5206 from ollama/mxyng/quantize fix: quantization with template	2024-06-21 13:44:34 -07:00
Michael Yang	e835ef1836	fix: quantization with template	2024-06-21 13:39:25 -07:00
Daniel Hiltgen	7e7749224c	Fix use_mmap parsing for modelfiles Add the new tristate parsing logic for the code path for modelfiles, as well as a unit test.	2024-06-21 12:27:19 -07:00
Daniel Hiltgen	c7c2f3bc22	Merge pull request #5194 from dhiltgen/linux_mmap_auto Refine mmap default logic on linux	2024-06-20 11:44:08 -07:00
Daniel Hiltgen	54a79d6a8a	Merge pull request #5125 from dhiltgen/fedora39 Bump latest fedora cuda repo to 39	2024-06-20 11:27:24 -07:00
Daniel Hiltgen	5bf5aeec01	Refine mmap default logic on linux If we try to use mmap when the model is larger than the system free space, loading is slower than the no-mmap approach.	2024-06-20 11:07:04 -07:00
Michael Yang	e01e535cbb	Merge pull request #5192 from ollama/mxyng/kv handle asymmetric embedding KVs	2024-06-20 10:46:24 -07:00
Josh	0195d6a2f8	Merge pull request #5188 from ollama/jyan/tmpdir2 fix: skip os.removeAll() if PID does not exist	2024-06-20 10:40:59 -07:00
Michael Yang	8e0641a9bf	handle asymmetric embedding KVs	2024-06-20 09:57:27 -07:00
Josh Yan	662568d453	err!=nil check	2024-06-20 09:30:59 -07:00
Josh Yan	4ebb66c662	reformat error check	2024-06-20 09:23:43 -07:00
Josh Yan	23e899f32d	skip os.removeAll() if PID does not exist	2024-06-20 08:51:35 -07:00
royjhan	fedf71635e	Extend api/show and ollama show to return more model info (#4881) * API Show Extended * Initial Draft of Information Co-Authored-By: Patrick Devine <pdevine@sonic.net> * Clean Up * Descriptive arg error messages and other fixes * Second Draft of Show with Projectors Included * Remove Chat Template * Touches * Prevent wrapping from files * Verbose functionality * Docs * Address Feedback * Lint * Resolve Conflicts * Function Name * Tests for api/show model info * Show Test File * Add Projector Test * Clean routes * Projector Check * Move Show Test * Touches * Doc update --------- Co-authored-by: Patrick Devine <pdevine@sonic.net>	2024-06-19 14:19:02 -07:00
Daniel Hiltgen	97c59be653	Merge pull request #5074 from dhiltgen/app_log_rotation Implement log rotation for tray app	2024-06-19 13:02:24 -07:00
Daniel Hiltgen	9d8a4988e8	Implement log rotation for tray app	2024-06-19 12:53:34 -07:00
Michael Yang	1ae0750a21	Merge pull request #5147 from ollama/mxyng/cleanup remove confusing log message	2024-06-19 12:50:31 -07:00
Michael Yang	9d91e5e587	remove confusing log message	2024-06-19 11:14:11 -07:00
Daniel Hiltgen	96624aa412	Merge pull request #5072 from dhiltgen/windows_path Move libraries out of users path	2024-06-19 09:13:39 -07:00
Daniel Hiltgen	10f33b8537	Merge pull request #5146 from dhiltgen/backout Put back temporary intel GPU env var	2024-06-19 09:12:45 -07:00
Daniel Hiltgen	4a633cc295	Merge pull request #5145 from dhiltgen/bad_loads Fix bad symbol load detection	2024-06-19 09:12:33 -07:00
Daniel Hiltgen	d34d88e417	Revert "Revert "gpu: add env var for detecting Intel oneapi gpus (#5076)"" This reverts commit `755b4e4fc2`.	2024-06-19 08:57:41 -07:00
Daniel Hiltgen	52ce350b7a	Fix bad symbol load detection pointer deref's weren't correct on a few libraries, which explains some crashes on older systems or miswired symlinks for discovery libraries.	2024-06-19 08:39:07 -07:00
Daniel Hiltgen	2abebb2cbe	Merge pull request #5128 from zhewang1-intc/fix_levelzero_empty_symbol_detect Fix levelzero empty symbol detect	2024-06-19 08:33:16 -07:00
Blake Mizerany	380e06e5be	types/model: remove Digest The Digest type in its current form is awkward to work with and presents challenges with regard to how it serializes via String using the '-' prefix. We currently only use this in ollama.com, so we'll move our specific needs around digest parsing and validation there.	2024-06-18 20:28:11 -07:00
Wang,Zhe	badf975e45	get real func ptr.	2024-06-19 09:00:51 +08:00
Wang,Zhe	755b4e4fc2	Revert "gpu: add env var for detecting Intel oneapi gpus (#5076)" This reverts commit `163cd3e77c`.	2024-06-19 08:59:58 +08:00
Daniel Hiltgen	1a1c99e334	Bump latest fedora cuda repo to 39	2024-06-18 17:13:54 -07:00
Michael Yang	21adf8b6d2	Merge pull request #5121 from ollama/mxyng/deepseekv2 deepseek v2 graph	2024-06-18 16:30:58 -07:00
Daniel Hiltgen	784bf88b0d	Wire up windows AMD driver reporting This seems to be ROCm version, not actually driver version, but it may be useful for toggling logic for VRAM reporting in the future	2024-06-18 16:22:47 -07:00
Michael Yang	e873841cbb	deepseek v2 graph	2024-06-18 15:35:12 -07:00
Daniel Hiltgen	26d0bf9236	Merge pull request #5117 from dhiltgen/fix_prediction Handle models with divergent layer sizes	2024-06-18 11:36:51 -07:00
Daniel Hiltgen	359b15a597	Handle models with divergent layer sizes The recent refactoring of the memory prediction assumed all layers are the same size, but for some models (like deepseek-coder-v2) this is not the case, so our predictions were significantly off.	2024-06-18 11:05:34 -07:00
Daniel Hiltgen	b55958a587	Merge pull request #5106 from dhiltgen/clean_logs Tighten up memory prediction logging	2024-06-18 09:24:38 -07:00
Daniel Hiltgen	7784ca33ce	Tighten up memory prediction logging Prior to this change, we logged the memory prediction multiple times as the scheduler iterates to find a suitable configuration, which can be confusing since only the last log before the server starts is actually valid. This now logs once just before starting the server on the final configuration. It also reports what library instead of always saying "offloading to gpu" when using CPU.	2024-06-18 09:15:35 -07:00
Daniel Hiltgen	c9c8c98bf6	Merge pull request #5105 from dhiltgen/cuda_mmap Adjust mmap logic for cuda windows for faster model load	2024-06-17 17:07:30 -07:00
Daniel Hiltgen	171796791f	Adjust mmap logic for cuda windows for faster model load On Windows, recent llama.cpp changes make mmap slower in most cases, so default to off. This also implements a tri-state for use_mmap so we can detect the difference between a user provided value of true/false, or unspecified.	2024-06-17 16:54:30 -07:00
Jeffrey Morgan	176d0f7075	Update import.md	2024-06-17 19:44:14 -04:00
Daniel Hiltgen	8ed51cac37	Merge pull request #5103 from dhiltgen/faster_win_build Revert powershell jobs, but keep nvcc and cmake parallelism	2024-06-17 14:23:18 -07:00
Daniel Hiltgen	c9e6f0542d	Merge pull request #5069 from dhiltgen/ci_release Implement custom github release action	2024-06-17 13:59:37 -07:00
Daniel Hiltgen	b0930626c5	Add back lower level parallel flags nvcc supports parallelism (threads) and cmake + make can use -j, while msbuild requires /p:CL_MPcount=8	2024-06-17 13:44:46 -07:00
Daniel Hiltgen	e890be4814	Revert "More parallelism on windows generate" This reverts commit `0577af98f4`.	2024-06-17 13:32:46 -07:00
Daniel Hiltgen	b2799f111b	Move libraries out of users path We update the PATH on windows to get the CLI mapped, but this has an unintended side effect of causing other apps that may use our bundled DLLs to get terminated when we upgrade.	2024-06-17 13:12:18 -07:00
Jeffrey Morgan	152fc202f5	llm: update llama.cpp commit to `7c26775` (#4896) * llm: update llama.cpp submodule to `7c26775` * disable `LLAMA_BLAS` for now * `-DLLAMA_OPENMP=off`	2024-06-17 15:56:16 -04:00
Lei Jitang	4ad0d4d6d3	Fix a build warning (#5096) Signed-off-by: Lei Jitang <leijitang@outlook.com>	2024-06-17 14:47:48 -04:00
Jeffrey Morgan	163cd3e77c	gpu: add env var for detecting Intel oneapi gpus (#5076) * gpu: add env var for detecting intel oneapi gpus * fix build error	2024-06-16 20:09:05 -04:00
Daniel Hiltgen	4c2c8f93dd	Merge pull request #5080 from dhiltgen/debug_intel_crash Add some more debugging logs for intel discovery	2024-06-16 14:42:41 -07:00
Daniel Hiltgen	fd1e6e0590	Add some more debugging logs for intel discovery Also removes an unused overall count variable	2024-06-16 07:42:52 -07:00
royjhan	89c79bec8c	Add ModifiedAt Field to /api/show (#5033) * Add Mod Time to Show * Error Handling	2024-06-15 20:53:56 -07:00
Jeffrey Morgan	c7b77004e3	docs: add missing powershell package to windows development instructions (#5075) * docs: add missing instruction for powershell build The powershell script for building Ollama on Windows now requires the `ThreadJob` module. Add this to the instructions and dependency list. * Update development.md	2024-06-15 23:08:09 -04:00

3187 commits