ollama

Author	SHA1	Message	Date
Daniel Hiltgen	ab8c929e20	Add ability to skip oneapi generate This follows the same pattern for cuda and rocm to allow disabling the build even when we detect the dependent libraries	2024-06-07 08:32:49 -07:00
Daniel Hiltgen	646371f56d	Merge pull request #3278 from zhewang1-intc/rebase_ollama_main Enabling ollama to run on Intel GPUs with SYCL backend	2024-05-28 16:30:50 -07:00
Wang,Zhe	fd5971be0b	support ollama run on Intel GPUs	2024-05-24 11:18:27 +08:00
Daniel Hiltgen	c48c1d7c46	Port cuda/rocm skip build vars to linux Windows already implements these, carry over to linux.	2024-05-15 15:56:43 -07:00
Roy Yang	5f73c08729	Remove trailing spaces (#3889 )	2024-04-25 14:32:26 -04:00
Daniel Hiltgen	cc5a71e0e3	Merge pull request #3709 from remy415/custom-gpu-defs Adds support for customizing GPU build flags in llama.cpp	2024-04-23 09:28:34 -07:00
Jeremy	440b7190ed	Update gen_linux.sh Added OLLAMA_CUSTOM_CUDA_DEFS and OLLAMA_CUSTOM_ROCM_DEFS instead of OLLAMA_CUSTOM_GPU_DEFS	2024-04-18 19:18:10 -04:00
Jeremy	52f5370c48	add support for custom gpu build flags for llama.cpp	2024-04-17 16:00:48 -04:00
Jeremy	7c000ec3ed	adds support for OLLAMA_CUSTOM_GPU_DEFS to customize GPU build flags	2024-04-17 15:21:05 -04:00
Jeremy	8aec92fa6d	rearranged conditional logic for static build, dockerfile updated	2024-04-17 14:43:28 -04:00
Jeremy	70261b9bb6	move static build to its own flag	2024-04-17 13:04:28 -04:00
Blake Mizerany	1524f323a3	Revert "build.go: introduce a friendlier way to build Ollama (#3548 )" (#3564 )	2024-04-09 15:57:45 -07:00
Blake Mizerany	fccf3eecaa	build.go: introduce a friendlier way to build Ollama (#3548 ) This commit introduces a more friendly way to build Ollama dependencies and the binary without abusing `go generate` and removing the unnecessary extra steps it brings with it. This script also provides nicer feedback to the user about what is happening during the build process. At the end, it prints a helpful message to the user about what to do next (e.g. run the new local Ollama).	2024-04-09 14:18:47 -07:00
Jeffrey Morgan	63efa075a0	update generate scripts with new `LLAMA_CUDA` variable, set `HIP_PLATFORM` to avoid compiler errors (#3528 )	2024-04-07 19:29:51 -04:00
Daniel Hiltgen	58d95cc9bd	Switch back to subprocessing for llama.cpp This should resolve a number of memory leak and stability defects by allowing us to isolate llama.cpp in a separate process and shutdown when idle, and gracefully restart if it has problems. This also serves as a first step to be able to run multiple copies to support multiple models concurrently.	2024-04-01 16:48:18 -07:00
Jeremy	dfc6721b20	add support for libcudart.so for CUDA devices (adds Jetson support)	2024-03-25 11:07:44 -04:00
Daniel Hiltgen	d4c10df2b0	Add Radeon gfx940-942 GPU support	2024-03-15 15:34:58 -07:00
Daniel Hiltgen	bc13da2bfe	Avoid rocm runner and dependency clash Putting the rocm symlink next to the runners is risky. This moves the payloads into a subdir to avoid potential clashes.	2024-03-11 09:33:22 -07:00
Daniel Hiltgen	3dc1bb6a35	Harden for deps file being empty (or short)	2024-03-10 14:45:38 -07:00
Daniel Hiltgen	6c5ccb11f9	Revamp ROCm support This refines where we extract the LLM libraries to by adding a new OLLAMA_HOME env var, that defaults to `~/.ollama` The logic was already idempotenent, so this should speed up startups after the first time a new release is deployed. It also cleans up after itself. We now build only a single ROCm version (latest major) on both windows and linux. Given the large size of ROCms tensor files, we split the dependency out. It's bundled into the installer on windows, and a separate download on windows. The linux install script is now smart and detects the presence of AMD GPUs and looks to see if rocm v6 is already present, and if not, then downloads our dependency tar file. For Linux discovery, we now use sysfs and check each GPU against what ROCm supports so we can degrade to CPU gracefully instead of having llama.cpp+rocm assert/crash on us. For Windows, we now use go's windows dynamic library loading logic to access the amdhip64.dll APIs to query the GPU information.	2024-03-07 10:36:50 -08:00
Daniel Hiltgen	6d84f07505	Detect AMD GPU info via sysfs and block old cards This wires up some new logic to start using sysfs to discover AMD GPU information and detects old cards we can't yet support so we can fallback to CPU mode.	2024-02-12 08:19:41 -08:00
mraiser	4c4c730a0a	Merge branch 'ollama:main' into main	2024-01-27 21:56:11 -05:00
Daniel Hiltgen	e02ecfb6c8	Merge pull request #2116 from dhiltgen/cc_50_80 Add support for CUDA 5.0 cards	2024-01-27 10:28:38 -08:00
mraiser	a4564232a4	Update gen_linux.sh to find libcudart in separate directory	2024-01-25 09:49:35 -05:00
Daniel Hiltgen	df54c723ae	Make CPU builds parallel and customizable AMD GPUs The linux build now support parallel CPU builds to speed things up. This also exposes AMD GPU targets as an optional setting for advaced users who want to alter our default set.	2024-01-21 15:12:21 -08:00
Daniel Hiltgen	a447a083f2	Add compute capability 5.0, 7.5, and 8.0	2024-01-20 14:24:05 -08:00
Daniel Hiltgen	681a914990	Add support for CUDA 5.2 cards	2024-01-20 10:48:43 -08:00
Daniel Hiltgen	fccdf4c635	Merge pull request #1987 from xyproto/archlinux Let gpu.go and gen_linux.sh also find CUDA on Arch Linux	2024-01-18 13:32:10 -08:00
Daniel Hiltgen	1b249748ab	Add multiple CPU variants for Intel Mac This also refines the build process for the ext_server build.	2024-01-17 15:08:54 -08:00
Alexander F. Rødseth	cbe2adc78a	Merge branch 'main' into archlinux	2024-01-17 12:50:11 +01:00
Daniel Hiltgen	795674dd90	Bump llama.cpp to b1842 and add new cuda lib dep Upstream llama.cpp has added a new dependency with the NVIDIA CUDA Driver Libraries (libcuda.so) which is part of the driver distribution, not the general cuda libraries, and is not available as an archive, so we can not statically link it. This may introduce some additional compatibility challenges which we'll need to keep an eye on.	2024-01-16 12:53:52 -08:00
Alexander F. Rødseth	f4bf1d514f	Let gpu.go and gen_linux.sh also find CUDA on Arch Linux	2024-01-14 13:40:36 +01:00
Fabian Preiss	905862e17b	improve cuda detection (rel. issue #1704 )	2024-01-12 21:59:19 +01:00
Daniel Hiltgen	39928a42e8	Always dynamically load the llm server library This switches darwin to dynamic loading, and refactors the code now that no static linking of the library is used on any platform	2024-01-11 08:42:47 -08:00
Daniel Hiltgen	d88c527be3	Build multiple CPU variants and pick the best This reduces the built-in linux version to not use any vector extensions which enables the resulting builds to run under Rosetta on MacOS in Docker. Then at runtime it checks for the actual CPU vector extensions and loads the best CPU library available	2024-01-11 08:42:47 -08:00
Daniel Hiltgen	8da7bef05f	Support multiple variants for a given llm lib type In some cases we may want multiple variants for a given GPU type or CPU. This adds logic to have an optional Variant which we can use to select an optimal library, but also allows us to try multiple variants in case some fail to load. This can be useful for scenarios such as ROCm v5 vs v6 incompatibility or potentially CPU features.	2024-01-10 17:27:51 -08:00
Daniel Hiltgen	77d96da94b	Code shuffle to clean up the llm dir	2024-01-04 12:12:05 -08:00

37 commits