ollama

Author	SHA1	Message	Date
Daniel Hiltgen	d34d88e417	Revert "Revert "gpu: add env var for detecting Intel oneapi gpus (#5076)"" This reverts commit `755b4e4fc2`.	2024-06-19 08:57:41 -07:00
Daniel Hiltgen	2abebb2cbe	Merge pull request #5128 from zhewang1-intc/fix_levelzero_empty_symbol_detect Fix levelzero empty symbol detect	2024-06-19 08:33:16 -07:00
Blake Mizerany	380e06e5be	types/model: remove Digest The Digest type in its current form is awkward to work with and presents challenges with regard to how it serializes via String using the '-' prefix. We currently only use this in ollama.com, so we'll move our specific needs around digest parsing and validation there.	2024-06-18 20:28:11 -07:00
Wang,Zhe	badf975e45	get real func ptr.	2024-06-19 09:00:51 +08:00
Wang,Zhe	755b4e4fc2	Revert "gpu: add env var for detecting Intel oneapi gpus (#5076)" This reverts commit `163cd3e77c`.	2024-06-19 08:59:58 +08:00
Michael Yang	21adf8b6d2	Merge pull request #5121 from ollama/mxyng/deepseekv2 deepseek v2 graph	2024-06-18 16:30:58 -07:00
Michael Yang	e873841cbb	deepseek v2 graph	2024-06-18 15:35:12 -07:00
Daniel Hiltgen	26d0bf9236	Merge pull request #5117 from dhiltgen/fix_prediction Handle models with divergent layer sizes	2024-06-18 11:36:51 -07:00
Daniel Hiltgen	359b15a597	Handle models with divergent layer sizes The recent refactoring of the memory prediction assumed all layers are the same size, but for some models (like deepseek-coder-v2) this is not the case, so our predictions were significantly off.	2024-06-18 11:05:34 -07:00
Daniel Hiltgen	b55958a587	Merge pull request #5106 from dhiltgen/clean_logs Tighten up memory prediction logging	2024-06-18 09:24:38 -07:00
Daniel Hiltgen	7784ca33ce	Tighten up memory prediction logging Prior to this change, we logged the memory prediction multiple times as the scheduler iterates to find a suitable configuration, which can be confusing since only the last log before the server starts is actually valid. This now logs once just before starting the server on the final configuration. It also reports what library instead of always saying "offloading to gpu" when using CPU.	2024-06-18 09:15:35 -07:00
Daniel Hiltgen	c9c8c98bf6	Merge pull request #5105 from dhiltgen/cuda_mmap Adjust mmap logic for cuda windows for faster model load	2024-06-17 17:07:30 -07:00
Daniel Hiltgen	171796791f	Adjust mmap logic for cuda windows for faster model load On Windows, recent llama.cpp changes make mmap slower in most cases, so default to off. This also implements a tri-state for use_mmap so we can detect the difference between a user provided value of true/false, or unspecified.	2024-06-17 16:54:30 -07:00
Jeffrey Morgan	176d0f7075	Update import.md	2024-06-17 19:44:14 -04:00
Daniel Hiltgen	8ed51cac37	Merge pull request #5103 from dhiltgen/faster_win_build Revert powershell jobs, but keep nvcc and cmake parallelism	2024-06-17 14:23:18 -07:00
Daniel Hiltgen	c9e6f0542d	Merge pull request #5069 from dhiltgen/ci_release Implement custom github release action	2024-06-17 13:59:37 -07:00
Daniel Hiltgen	b0930626c5	Add back lower level parallel flags nvcc supports parallelism (threads) and cmake + make can use -j, while msbuild requires /p:CL_MPcount=8	2024-06-17 13:44:46 -07:00
Daniel Hiltgen	e890be4814	Revert "More parallelism on windows generate" This reverts commit `0577af98f4`.	2024-06-17 13:32:46 -07:00
Jeffrey Morgan	152fc202f5	llm: update llama.cpp commit to `7c26775` (#4896) * llm: update llama.cpp submodule to `7c26775` * disable `LLAMA_BLAS` for now * `-DLLAMA_OPENMP=off`	2024-06-17 15:56:16 -04:00
Lei Jitang	4ad0d4d6d3	Fix a build warning (#5096) Signed-off-by: Lei Jitang <leijitang@outlook.com>	2024-06-17 14:47:48 -04:00
Jeffrey Morgan	163cd3e77c	gpu: add env var for detecting Intel oneapi gpus (#5076) * gpu: add env var for detecting intel oneapi gpus * fix build error	2024-06-16 20:09:05 -04:00
Daniel Hiltgen	4c2c8f93dd	Merge pull request #5080 from dhiltgen/debug_intel_crash Add some more debugging logs for intel discovery	2024-06-16 14:42:41 -07:00
Daniel Hiltgen	fd1e6e0590	Add some more debugging logs for intel discovery Also removes an unused overall count variable	2024-06-16 07:42:52 -07:00
royjhan	89c79bec8c	Add ModifiedAt Field to /api/show (#5033) * Add Mod Time to Show * Error Handling	2024-06-15 20:53:56 -07:00
Jeffrey Morgan	c7b77004e3	docs: add missing powershell package to windows development instructions (#5075) * docs: add missing instruction for powershell build The powershell script for building Ollama on Windows now requires the `ThreadJob` module. Add this to the instructions and dependency list. * Update development.md	2024-06-15 23:08:09 -04:00
Daniel Hiltgen	07d143f412	Merge pull request #5058 from coolljt0725/fix_build_warning gpu: Fix build warning	2024-06-15 11:52:36 -07:00
Daniel Hiltgen	a12283e2ff	Implement custom github release action This implements the release logic we want via gh cli to support updating releases with rc tags in place and retain release notes and other community reactions.	2024-06-15 11:36:56 -07:00
Daniel Hiltgen	4b0050cf0e	Merge pull request #5037 from dhiltgen/faster_win_build More parallelism on windows generate	2024-06-15 08:03:05 -07:00
Daniel Hiltgen	0577af98f4	More parallelism on windows generate Make the build faster	2024-06-15 07:44:55 -07:00
Daniel Hiltgen	17ce203a26	Merge pull request #4875 from dhiltgen/rocm_gfx900_workaround Rocm gfx900 workaround	2024-06-15 07:38:58 -07:00
Daniel Hiltgen	d76555ffb5	Merge pull request #4874 from dhiltgen/rocm_v6_bump Rocm v6 bump	2024-06-15 07:38:32 -07:00
Daniel Hiltgen	2786dff5d3	Merge pull request #4264 from dhiltgen/show_gpu_visible_settings Centralize GPU configuration vars	2024-06-15 07:33:52 -07:00
Lei Jitang	225f0d1219	gpu: Fix build warning Signed-off-by: Lei Jitang <leijitang@outlook.com>	2024-06-15 14:26:23 +08:00
Daniel Hiltgen	532db58311	Merge pull request #4972 from jayson-cloude/main fix: "Skip searching for network devices"	2024-06-14 17:04:40 -07:00
Daniel Hiltgen	6be309e1bd	Centralize GPU configuration vars This should aid in troubleshooting by capturing and reporting the GPU settings at startup in the logs along with all the other server settings.	2024-06-14 15:59:10 -07:00
Daniel Hiltgen	da3bf23354	Workaround gfx900 SDMA bugs Implement support for GPU env var workarounds, and leverage this for the Vega RX 56 which needs HSA_ENABLE_SDMA=0 set to work properly	2024-06-14 15:38:13 -07:00
Daniel Hiltgen	26ab67732b	Bump ROCm linux to 6.1.1	2024-06-14 15:37:54 -07:00
Daniel Hiltgen	45cacbaf05	Merge pull request #4517 from dhiltgen/gpu_incremental Enhanced GPU discovery and multi-gpu support with concurrency	2024-06-14 15:35:00 -07:00
Daniel Hiltgen	17df6520c8	Remove mmap related output calc logic	2024-06-14 14:55:50 -07:00
Daniel Hiltgen	6f351bf586	review comments and coverage	2024-06-14 14:55:50 -07:00
Daniel Hiltgen	ff4f0cbd1d	Prevent multiple concurrent loads on the same gpus While models are loading, the VRAM metrics are dynamic, so try to load on a GPU that doesn't have a model actively loading, or wait to avoid races that lead to OOMs	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	fc37c192ae	Refine CPU load behavior with system memory visibility	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	434dfe30c5	Reintroduce nvidia nvml library for windows This library will give us the most reliable free VRAM reporting on windows to enable concurrent model scheduling.	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	4e2b7e181d	Refactor intel gpu discovery	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	48702dd149	Harden unload for empty runners	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	68dfc6236a	refined test timing adjust timing on some tests so they don't timeout on small/slow GPUs	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	5e8ff556cb	Support forced spreading for multi GPU Our default behavior today is to try to fit into a single GPU if possible. Some users would prefer the old behavior of always spreading across multiple GPUs even if the model can fit into one. This exposes that tunable behavior.	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	6fd04ca922	Improve multi-gpu handling at the limit Still not complete, needs some refinement to our prediction to understand the discrete GPUs available space so we can see how many layers fit in each one since we can't split one layer across multiple GPUs we can't treat free space as one logical block	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	206797bda4	Fix concurrency integration test to work locally This worked remotely but wound up trying to spawn multiple servers locally which doesn't work	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	43ed358f9a	Refine GPU discovery to bootstrap once Now that we call the GPU discovery routines many times to update memory, this splits initial discovery from free memory updating.	2024-06-14 14:51:40 -07:00
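
Commits `359b15a597` and `6fd04ca922` above note that layers can differ in size and that a layer cannot be split across GPUs, so free VRAM cannot be treated as one logical block. The following is a minimal sketch of that constraint only, not the code from those commits: the function name `fitLayers`, the greedy in-order placement, and the sizes used are all assumptions made for illustration.

```go
package main

import "fmt"

// fitLayers is a hypothetical helper, not part of ollama. It places layers
// in order onto GPUs without ever splitting a layer, and returns how many
// layers landed on each GPU. Sizes are in arbitrary units.
func fitLayers(layerSizes []uint64, gpuFree []uint64) []int {
	// Work on a copy so the caller's free-space slice is not modified.
	free := append([]uint64(nil), gpuFree...)
	counts := make([]int, len(free))

	g := 0
	for _, size := range layerSizes {
		// Move on to the next GPU until one can hold the whole layer.
		for g < len(free) && free[g] < size {
			g++
		}
		if g == len(free) {
			break // no remaining GPU can hold this layer
		}
		free[g] -= size
		counts[g]++
	}
	return counts
}

func main() {
	// Three layers of size 4 and two GPUs with 6 free each: pooled free
	// space (12) would hold all three, but each GPU only fits one layer.
	fmt.Println(fitLayers([]uint64{4, 4, 4}, []uint64{6, 6})) // prints "[1 1]"
}
```

In this sketch the third layer is left unplaced even though the summed free space would cover it, which is the gap between pooled and per-GPU accounting that the commit message describes.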

2962 commits