ollama

Author	SHA1	Message	Date
Daniel Hiltgen	4072b5879b	Merge pull request #2246 from dhiltgen/reject_cuda_without_avx Don't disable GPUs on arm without AVX	2024-01-28 16:26:55 -08:00
Daniel Hiltgen	15562e887d	Don't disable GPUs on arm without AVX AVX is an x86 feature, so ARM should be excluded from the check.	2024-01-28 15:22:38 -08:00
Jeffrey Morgan	f2245c7c77	print prompt with `OLLAMA_DEBUG=1` (#2245)	2024-01-28 15:22:35 -08:00
Jeffrey Morgan	e4b9b72f2a	Do not repeat system prompt for chat templating (#2241)	2024-01-28 14:15:56 -08:00
Daniel Hiltgen	311f8e0c3f	Merge pull request #2243 from dhiltgen/harden_zero_gpus Harden for zero detected GPUs	2024-01-28 13:30:44 -08:00
Daniel Hiltgen	f07f8b7a9e	Harden for zero detected GPUs At least with the ROCm libraries, it's possible to have the library present with zero GPUs. This fix avoids a divide by zero bug in llm.go when we try to calculate GPU memory with zero GPUs.	2024-01-28 13:13:10 -08:00
mraiser	4c4c730a0a	Merge branch 'ollama:main' into main	2024-01-27 21:56:11 -05:00
Daniel Hiltgen	e02ecfb6c8	Merge pull request #2116 from dhiltgen/cc_50_80 Add support for CUDA 5.0 cards	2024-01-27 10:28:38 -08:00
Daniel Hiltgen	c8059b4dcf	Merge pull request #2224 from jaglinux/fix_rocm_get_version_message ROCm: Correct the response string in rocm_get_version function	2024-01-27 07:29:32 -08:00
Jagadish Krishnamoorthy	59d87127f5	Update gpu_info_rocm.c	2024-01-26 22:08:27 -08:00
Patrick Devine	b5cf31b460	add keep_alive to generate/chat/embedding api endpoints (#2146)	2024-01-26 14:28:02 -08:00
Daniel Hiltgen	cc4915e262	Merge pull request #2214 from dhiltgen/reject_cuda_without_avx Detect lack of AVX and fallback to CPU mode	2024-01-26 12:06:44 -08:00
Daniel Hiltgen	667a2ba18a	Detect lack of AVX and fallback to CPU mode We build the GPU libraries with AVX enabled to ensure that if not all layers fit on the GPU we get better performance in a mixed mode. If the user is using a virtualization/emulation system that lacks AVX this used to result in an illegal instruction error and crash before this fix. Now we will report a warning in the server log, and just use CPU mode to ensure we don't crash.	2024-01-26 11:36:03 -08:00
Michael Yang	e054ebe059	Merge pull request #2212 from ollama/mxyng/fix-build fix build	2024-01-26 11:19:08 -08:00
Michael Yang	9d3dcfd0ec	fix logging	2024-01-26 11:04:27 -08:00
Michael Yang	6e0ea5ecc8	Merge pull request #1916 from ollama/mxyng/inactivity-monitor download: add inactivity monitor	2024-01-26 10:56:00 -08:00
Daniel Hiltgen	a47d8b2557	Merge pull request #2197 from dhiltgen/remove_rocm_image Add back ROCm container support	2024-01-26 09:34:23 -08:00
Daniel Hiltgen	30c43c285c	Merge pull request #2195 from dhiltgen/rocm_real_gpus Ignore AMD integrated GPUs	2024-01-26 09:30:24 -08:00
Daniel Hiltgen	23a7ea593b	Merge pull request #2209 from dhiltgen/harden_mgmt Fix crash on cuda ml init failure	2024-01-26 09:30:13 -08:00
Daniel Hiltgen	75c44aa319	Add back ROCm container support This adds ROCm support back as a discrete image.	2024-01-26 09:24:29 -08:00
Daniel Hiltgen	9d7b5d6c91	Ignore AMD integrated GPUs Detect and ignore integrated GPUs reported by rocm.	2024-01-26 09:21:35 -08:00
Daniel Hiltgen	5d9c4a5f5a	Fix crash on cuda ml init failure The new driver lookup code was triggering after init failure due to a missing return	2024-01-26 09:18:33 -08:00
Daniel Hiltgen	197e420a97	Merge pull request #2196 from dhiltgen/remove_rocm_image Switch back to ubuntu base	2024-01-25 16:50:32 -08:00
Daniel Hiltgen	a34e1ad3cf	Switch back to ubuntu base The size increase for rocm support in the standard image is problematic. We'll revisit multiple tags for rocm support in a follow up PR.	2024-01-25 16:46:01 -08:00
Michael Yang	2ae0556292	Merge pull request #1679 from ollama/mxyng/build-gpus build cuda and rocm	2024-01-25 16:38:14 -08:00
Jeffrey Morgan	5be9bdd444	Update modelfile.md	2024-01-25 16:29:48 -08:00
Jeffrey Morgan	b706794905	Update modelfile.md to include `MESSAGE`	2024-01-25 16:29:32 -08:00
Michael Yang	a8c5413d06	only generate gpu libs	2024-01-25 15:41:56 -08:00
Michael Yang	5580de4571	archive ollama binaries	2024-01-25 15:40:16 -08:00
Michael Yang	946431d5b0	build cuda and rocm	2024-01-25 15:40:15 -08:00
Michael Yang	0610126049	remove env setting	2024-01-25 15:39:43 -08:00
Jeffrey Morgan	3ebd6a83fc	update submodule to `cd4fddb29f81d6a1f6d51a0c016bc6b486d68def`	2024-01-25 13:54:11 -08:00
Jeffrey Morgan	a64570dcae	Fix clearing kv cache between requests with the same prompt (#2186) * Fix clearing kv cache between requests with the same prompt * fix powershell script	2024-01-25 13:46:20 -08:00
Patrick Devine	7c40a67841	Save and load sessions (#2063)	2024-01-25 12:12:36 -08:00
Michael Yang	e64b5b07a2	Merge pull request #2181 from ollama/mxyng/stub-lint stub generate outputs for lint	2024-01-25 11:55:15 -08:00
Michael Yang	9e1e295cdc	Merge pull request #2175 from ollama/mxyng/refactor-tensor-read refactor tensor read	2024-01-25 09:22:42 -08:00
Marc Raiser	6eb3cddcb6	To build on NixOS: nix-shell --run 'go generate ./... && go build .'	2024-01-25 10:17:22 -05:00
mraiser	a4564232a4	Update gen_linux.sh to find libcudart in separate directory	2024-01-25 09:49:35 -05:00
Jeffrey Morgan	a643823f86	Update README.md	2024-01-24 21:36:56 -08:00
Michael Yang	8e5d359a03	stub generate outputs for lint	2024-01-24 17:36:10 -08:00
Daniel Hiltgen	a170888dd4	Merge pull request #2174 from dhiltgen/rocm_real_gpus More logging for gpu management	2024-01-24 11:09:17 -08:00
Michael Yang	cd22855ef8	refactor tensor read	2024-01-24 10:48:31 -08:00
Daniel Hiltgen	013fd07139	More logging for gpu management Fix an ordering glitch of dlerr/dlclose and add more logging to help root cause some crashes users are hitting. This also refines the function pointer names to use the underlying function names instead of simplified names for readability.	2024-01-24 10:32:36 -08:00
Daniel Hiltgen	f63dc2db5c	Merge pull request #2162 from dhiltgen/rocm_real_gpus Report more information about GPUs in verbose mode	2024-01-23 17:45:40 -08:00
Jeffrey Morgan	eaa5a396d9	Update README.md	2024-01-23 16:08:15 -08:00
Jeffrey Morgan	8ed22f5d72	Update README.md	2024-01-23 14:38:01 -08:00
Daniel Hiltgen	987c16b2f7	Report more information about GPUs in verbose mode This adds additional calls to both CUDA and ROCm management libraries to discover additional attributes about the GPU(s) detected in the system, and wires up runtime verbosity selection. When users hit problems with GPUs we can ask them to run with `OLLAMA_DEBUG=1 ollama serve` and share the results.	2024-01-23 11:37:02 -08:00
Jeffrey Morgan	950f636d64	Update README.md	2024-01-23 10:29:10 -08:00
Jeffrey Morgan	4458efb73a	Load all layers on `arm64` macOS if model is small enough (#2149)	2024-01-22 17:40:06 -08:00
Daniel Hiltgen	ceea599494	Merge pull request #2150 from dhiltgen/default_version Set a default version using git describe	2024-01-22 17:38:27 -08:00

2164 commits