ollama

Author	SHA1	Message	Date
Daniel Hiltgen	93ea9240ae	Move ollama executable out of bin dir (#6535)	2024-08-27 16:19:00 -07:00
Daniel Hiltgen	7a1e1c1caf	gpu: Ensure driver version set before variant (#6480) During rebasing, the ordering was inverted, causing the cuda version selection logic to break, with driver version being evaluated as zero, incorrectly causing a downgrade to v11.	2024-08-23 11:21:12 -07:00
Daniel Hiltgen	f9e31da946	Review comments	2024-08-19 10:36:15 -07:00
Daniel Hiltgen	88bb9e3328	Adjust layout to bin+lib/ollama	2024-08-19 09:38:53 -07:00
Daniel Hiltgen	4fe3a556fa	Add cuda v12 variant and selection logic Based on compute capability and driver version, pick v12 or v11 cuda variants.	2024-08-19 09:38:53 -07:00
Daniel Hiltgen	d470ebe78b	Add Jetson cuda variants for arm This adds new variants for arm64 specific to Jetson platforms	2024-08-19 09:38:53 -07:00
Daniel Hiltgen	74d45f0102	Refactor linux packaging This adjusts linux to follow a similar model to windows with a discrete archive (zip/tgz) to carry the primary executable and dependent libraries. Runners are still carried as payloads inside the main binary. Darwin retains the payload model where the go binary is fully self contained.	2024-08-19 09:38:53 -07:00
Daniel Hiltgen	5bca2e60a7	Harden intel bootstrap for nil pointers	2024-08-09 11:31:38 -07:00
Michael Yang	b732beba6a	lint	2024-08-01 17:06:06 -07:00
Michael Yang	e2c3f6b3e2	string	2024-07-22 11:27:52 -07:00
Michael Yang	55cd3ddcca	bool	2024-07-22 11:27:21 -07:00
Michael Yang	35b89b2eab	rfc: dynamic environ lookup	2024-07-22 11:25:30 -07:00
Jeffrey Morgan	c4cf8ad559	llm: avoid loading model if system memory is too small (#5637) * llm: avoid loading model if system memory is too small * update log * Instrument swap free space On linux and windows, expose how much swap space is available so we can take that into consideration when scheduling models * use `systemSwapFreeMemory` in check --------- Co-authored-by: Daniel Hiltgen <daniel@ollama.com>	2024-07-11 16:42:57 -07:00
Daniel Hiltgen	f6f759fc5f	Detect CUDA OS Overhead This adds logic to detect skew between the driver and management library which can be attributed to OS overhead and records that so we can adjust subsequent management library free VRAM updates and avoid OOM scenarios.	2024-07-09 12:21:50 -07:00
Daniel Hiltgen	ef757da2c9	Better nvidia GPU discovery logging Refine the way we log GPU discovery to improve the non-debug output, and report more actionable log messages when possible to help users troubleshoot on their own.	2024-07-03 10:50:40 -07:00
Daniel Hiltgen	b2799f111b	Move libraries out of users path We update the PATH on windows to get the CLI mapped, but this has an unintended side effect of causing other apps that may use our bundled DLLs to get terminated when we upgrade.	2024-06-17 13:12:18 -07:00
Jeffrey Morgan	163cd3e77c	gpu: add env var for detecting Intel oneapi gpus (#5076) * gpu: add env var for detecting intel oneapi gpus * fix build error	2024-06-16 20:09:05 -04:00
Daniel Hiltgen	6f351bf586	review comments and coverage	2024-06-14 14:55:50 -07:00
Daniel Hiltgen	fc37c192ae	Refine CPU load behavior with system memory visibility	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	434dfe30c5	Reintroduce nvidia nvml library for windows This library will give us the most reliable free VRAM reporting on windows to enable concurrent model scheduling.	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	4e2b7e181d	Refactor intel gpu discovery	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	6fd04ca922	Improve multi-gpu handling at the limit Still not complete, needs some refinement to our prediction to understand the discrete GPUs available space so we can see how many layers fit in each one since we can't split one layer across multiple GPUs we can't treat free space as one logical block	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	43ed358f9a	Refine GPU discovery to bootstrap once Now that we call the GPU discovery routines many times to update memory, this splits initial discovery from free memory updating.	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	efac488675	Revert "Limit GPU lib search for now (#4777)" This reverts commit `476fb8e892`.	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	aac367636d	Actually skip PhysX on windows	2024-06-13 13:17:19 -07:00
Michael Yang	bf7edb0d5d	lint linux	2024-06-04 11:13:30 -07:00
Jeffrey Morgan	476fb8e892	Limit GPU lib search for now (#4777) * fix oneapi errors on windows 10	2024-06-01 19:24:33 -07:00
Daniel Hiltgen	646371f56d	Merge pull request #3278 from zhewang1-intc/rebase_ollama_main Enabling ollama to run on Intel GPUs with SYCL backend	2024-05-28 16:30:50 -07:00
Patrick Devine	4cc3be3035	Move envconfig and consolidate env vars (#4608)	2024-05-24 14:57:15 -07:00
Wang,Zhe	fd5971be0b	support ollama run on Intel GPUs	2024-05-24 11:18:27 +08:00
Daniel Hiltgen	30a7d7096c	Bump VRAM buffer back up Under stress scenarios we're seeing OOMs so this should help stabilize the allocations under heavy concurrency stress.	2024-05-10 09:15:28 -07:00
Daniel Hiltgen	8727a9c140	Record more GPU information This cleans up the logging for GPU discovery a bit, and can serve as a foundation to report GPU information in a future UX.	2024-05-09 14:18:14 -07:00
Michael Yang	4736391bfb	llm: add minimum based on layer size	2024-05-06 17:04:19 -07:00
Daniel Hiltgen	380378cc80	Use our libraries first Trying to live off the land for cuda libraries was not the right strategy. We need to use the version we compiled against to ensure things work properly	2024-05-06 14:23:29 -07:00
Daniel Hiltgen	af9eb36f9f	Merge pull request #4135 from dhiltgen/no_physx Skip PhysX cudart library	2024-05-06 13:34:00 -07:00
Daniel Hiltgen	06093fd396	Merge pull request #4067 from dhiltgen/cudart Add CUDA Driver API for GPU discovery	2024-05-06 13:30:27 -07:00
Daniel Hiltgen	f56aa20014	Centralize server config handling This moves all the env var reading into one central module and logs the loaded config once at startup which should help in troubleshooting user server logs	2024-05-05 16:49:50 -07:00
Daniel Hiltgen	b1ad3a43cb	Skip PhysX cudart library For some reason this library gives incorrect GPU information, so skip it	2024-05-03 11:55:32 -07:00
Daniel Hiltgen	089daaeabc	Add CUDA Driver API for GPU discovery We're seeing some corner cases with cudart which might be resolved by switching to the driver API which comes bundled with the driver package	2024-04-30 18:00:45 -07:00
Daniel Hiltgen	34b9db5afc	Request and model concurrency This change adds support for multiple concurrent requests, as well as loading multiple models by spawning multiple runners. The default settings are currently set at 1 concurrent request per model and only 1 loaded model at a time, but these can be adjusted by setting OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.	2024-04-22 19:29:12 -07:00
Michael Yang	7e33a017c0	partial offloading	2024-04-10 11:37:20 -07:00
Daniel Hiltgen	1f11b52511	Refined min memory from testing	2024-04-01 16:48:33 -07:00
Daniel Hiltgen	526d4eb204	Release gpu discovery library after use Leaving the cudart library loaded kept ~30m of memory pinned in the GPU in the main process. This change ensures we don't hold GPU resources when idle.	2024-04-01 16:48:33 -07:00
Michael Yang	91b3e4d282	update memory calculations count each layer independently when deciding gpu offloading	2024-04-01 13:16:32 -07:00
Jeremy	dfc6721b20	add support for libcudart.so for CUDA devices (adds Jetson support)	2024-03-25 11:07:44 -04:00
Daniel Hiltgen	6c5ccb11f9	Revamp ROCm support This refines where we extract the LLM libraries to by adding a new OLLAMA_HOME env var, that defaults to `~/.ollama` The logic was already idempotent, so this should speed up startups after the first time a new release is deployed. It also cleans up after itself. We now build only a single ROCm version (latest major) on both windows and linux. Given the large size of ROCm's tensor files, we split the dependency out. It's bundled into the installer on windows, and a separate download on windows. The linux install script is now smart and detects the presence of AMD GPUs and looks to see if rocm v6 is already present, and if not, then downloads our dependency tar file. For Linux discovery, we now use sysfs and check each GPU against what ROCm supports so we can degrade to CPU gracefully instead of having llama.cpp+rocm assert/crash on us. For Windows, we now use go's windows dynamic library loading logic to access the amdhip64.dll APIs to query the GPU information.	2024-03-07 10:36:50 -08:00
Daniel Hiltgen	be330174dd	Allow setting max vram for workarounds Until we get all the memory calculations correct, this can provide an escape valve for users to work around out-of-memory crashes.	2024-03-06 17:15:06 -08:00
Daniel Hiltgen	9754c6d9d8	Harden AMD driver lookup logic It looks like the version file doesn't exist on older(?) drivers	2024-02-16 16:20:16 -08:00
Daniel Hiltgen	6d84f07505	Detect AMD GPU info via sysfs and block old cards This wires up some new logic to start using sysfs to discover AMD GPU information and detects old cards we can't yet support so we can fallback to CPU mode.	2024-02-12 08:19:41 -08:00
Daniel Hiltgen	4072b5879b	Merge pull request #2246 from dhiltgen/reject_cuda_without_avx Don't disable GPUs on arm without AVX	2024-01-28 16:26:55 -08:00


86 commits