ollama

Author	SHA1	Message	Date
Michael Yang	2ae0556292	Merge pull request #1679 from ollama/mxyng/build-gpus build cuda and rocm	2024-01-25 16:38:14 -08:00
Jeffrey Morgan	5be9bdd444	Update modelfile.md	2024-01-25 16:29:48 -08:00
Jeffrey Morgan	b706794905	Update modelfile.md to include `MESSAGE`	2024-01-25 16:29:32 -08:00
Michael Yang	a8c5413d06	only generate gpu libs	2024-01-25 15:41:56 -08:00
Michael Yang	5580de4571	archive ollama binaries	2024-01-25 15:40:16 -08:00
Michael Yang	946431d5b0	build cuda and rocm	2024-01-25 15:40:15 -08:00
Michael Yang	0610126049	remove env setting	2024-01-25 15:39:43 -08:00
Jeffrey Morgan	3ebd6a83fc	update submodule to `cd4fddb29f81d6a1f6d51a0c016bc6b486d68def`	2024-01-25 13:54:11 -08:00
Jeffrey Morgan	a64570dcae	Fix clearing kv cache between requests with the same prompt (#2186) * Fix clearing kv cache between requests with the same prompt * fix powershell script	2024-01-25 13:46:20 -08:00
Patrick Devine	7c40a67841	Save and load sessions (#2063)	2024-01-25 12:12:36 -08:00
Michael Yang	e64b5b07a2	Merge pull request #2181 from ollama/mxyng/stub-lint stub generate outputs for lint	2024-01-25 11:55:15 -08:00
Michael Yang	9e1e295cdc	Merge pull request #2175 from ollama/mxyng/refactor-tensor-read refactor tensor read	2024-01-25 09:22:42 -08:00
Marc Raiser	6eb3cddcb6	To build on NixOS: nix-shell --run 'go generate ./... && go build .'	2024-01-25 10:17:22 -05:00
mraiser	a4564232a4	Update gen_linux.sh to find libcudart in separate directory	2024-01-25 09:49:35 -05:00
Jeffrey Morgan	a643823f86	Update README.md	2024-01-24 21:36:56 -08:00
Michael Yang	8e5d359a03	stub generate outputs for lint	2024-01-24 17:36:10 -08:00
Daniel Hiltgen	a170888dd4	Merge pull request #2174 from dhiltgen/rocm_real_gpus More logging for gpu management	2024-01-24 11:09:17 -08:00
Michael Yang	cd22855ef8	refactor tensor read	2024-01-24 10:48:31 -08:00
Daniel Hiltgen	013fd07139	More logging for gpu management Fix an ordering glitch of dlerr/dlclose and add more logging to help root cause some crashes users are hitting. This also refines the function pointer names to use the underlying function names instead of simplified names for readability.	2024-01-24 10:32:36 -08:00
Daniel Hiltgen	f63dc2db5c	Merge pull request #2162 from dhiltgen/rocm_real_gpus Report more information about GPUs in verbose mode	2024-01-23 17:45:40 -08:00
Jeffrey Morgan	eaa5a396d9	Update README.md	2024-01-23 16:08:15 -08:00
Jeffrey Morgan	8ed22f5d72	Update README.md	2024-01-23 14:38:01 -08:00
Daniel Hiltgen	987c16b2f7	Report more information about GPUs in verbose mode This adds additional calls to both CUDA and ROCm management libraries to discover additional attributes about the GPU(s) detected in the system, and wires up runtime verbosity selection. When users hit problems with GPUs we can ask them to run with `OLLAMA_DEBUG=1 ollama serve` and share the results.	2024-01-23 11:37:02 -08:00
Jeffrey Morgan	950f636d64	Update README.md	2024-01-23 10:29:10 -08:00
Jeffrey Morgan	4458efb73a	Load all layers on `arm64` macOS if model is small enough (#2149)	2024-01-22 17:40:06 -08:00
Daniel Hiltgen	ceea599494	Merge pull request #2150 from dhiltgen/default_version Set a default version using git describe	2024-01-22 17:38:27 -08:00
Daniel Hiltgen	3005ec74b3	Set a default version using git describe If a VERSION is not specified, this will generate a version string that represents the state of the repo. For example `0.1.21-12-gffaf52e-dirty` representing 12 commits away from 0.1.21 tag, on commit gffaf52e and the tree is dirty.	2024-01-22 17:12:20 -08:00
Daniel Hiltgen	0759d8996e	Merge pull request #2148 from dhiltgen/intel_mac Refine Accelerate usage on mac	2024-01-22 16:56:58 -08:00
Daniel Hiltgen	0f5b843319	Refine Accelerate usage on mac For old macs, accelerate seems to cause crashes, but for AVX2 capable macs, it does not.	2024-01-22 16:25:56 -08:00
Jeffrey Morgan	ffaf52e1e9	update submodule to `011e8ec577fd135cbc02993d3ea9840c516d6a1c`	2024-01-22 15:16:54 -08:00
Michael Yang	940b10b036	Merge pull request #2144 from jmorganca/mxyng/update-faq faq: update to use launchctl setenv	2024-01-22 13:46:57 -08:00
Daniel Hiltgen	3bc28736cd	Merge pull request #2143 from dhiltgen/llm_verbosity Refine debug logging for llm	2024-01-22 13:19:16 -08:00
Michael Yang	93a756266c	faq: update to use launchctl setenv	2024-01-22 13:10:13 -08:00
Daniel Hiltgen	a0a829bf7a	Merge pull request #2142 from dhiltgen/debug_on_fail Debug logging on init failure	2024-01-22 12:29:22 -08:00
Daniel Hiltgen	730dcfcc7a	Refine debug logging for llm This wires up logging in llama.cpp to always go to stderr, and also turns up logging if OLLAMA_DEBUG is set.	2024-01-22 12:26:49 -08:00
Daniel Hiltgen	27a2d5af54	Debug logging on init failure	2024-01-22 12:08:22 -08:00
Jeffrey Morgan	5f81a33f43	update submodule to `6f9939d` (#2115)	2024-01-22 11:56:40 -08:00
Michael Yang	6225fde046	Merge pull request #2102 from jmorganca/mxyng/fix-create-override fix: remove overwritten model layers	2024-01-22 09:37:48 -08:00
Meng Zhuo	069184562b	readline: drop not use min function (#2134)	2024-01-22 08:15:08 -08:00
Daniel Hiltgen	5576bb2348	Merge pull request #2130 from dhiltgen/more_faster Make CPU builds parallel and customizable AMD GPUs	2024-01-21 16:14:12 -08:00
Daniel Hiltgen	2738837786	Merge pull request #2131 from dhiltgen/probe_cards_at_init Probe GPUs before backend init	2024-01-21 16:13:47 -08:00
Daniel Hiltgen	ec3764538d	Probe GPUs before backend init Detect potential error scenarios so we can fallback to CPU mode without hitting asserts.	2024-01-21 15:59:38 -08:00
Daniel Hiltgen	df54c723ae	Make CPU builds parallel and customizable AMD GPUs The linux build now supports parallel CPU builds to speed things up. This also exposes AMD GPU targets as an optional setting for advanced users who want to alter our default set.	2024-01-21 15:12:21 -08:00
Daniel Hiltgen	fa8c990e58	Merge pull request #2127 from dhiltgen/rocm_container Combine the 2 Dockerfiles and add ROCm	2024-01-21 11:49:01 -08:00
Daniel Hiltgen	da72235ebf	Combine the 2 Dockerfiles and add ROCm This renames Dockerfile.build to Dockerfile, and adds some new stages to support 2 modes of building - the build_linux.sh script uses intermediate stages to extract the artifacts for ./dist, and the default build generates a container image usable by both cuda and rocm cards. This required transitioning the x86 base to the rocm image to avoid layer bloat.	2024-01-21 11:37:11 -08:00
Jeffrey Morgan	89c4aee29e	Unlock mutex when failing to load model (#2117)	2024-01-20 20:54:46 -05:00
Daniel Hiltgen	a447a083f2	Add compute capability 5.0, 7.5, and 8.0	2024-01-20 14:24:05 -08:00
Jeffrey Morgan	f32ea81b21	increase minimum overhead to 1024MiB (#2114)	2024-01-20 17:11:38 -05:00
Daniel Hiltgen	681a914990	Add support for CUDA 5.2 cards	2024-01-20 10:48:43 -08:00
Jeffrey Morgan	4c54f0ddeb	sign dylibs on macOS (#2101)	2024-01-19 19:24:11 -05:00

3491 commits