Commit graph

399 commits

Author SHA1 Message Date
jmorganca
ddf5c09a9b use matrix multiplication kernels in more cases 2024-04-25 13:58:54 -07:00
Roy Yang
5f73c08729
Remove trailing spaces (#3889) 2024-04-25 14:32:26 -04:00
Daniel Hiltgen
6e76348df7
Merge pull request #3834 from dhiltgen/not_found_in_path
Report errors on server lookup instead of path lookup failure
2024-04-24 10:50:48 -07:00
Patrick Devine
14476d48cc
fixes for gguf (#3863) 2024-04-23 20:57:20 -07:00
Daniel Hiltgen
5445aaa94e Add back memory escape valve
If we get our predictions wrong, this can be used to
set a lower memory limit as a workaround. Recent multi-GPU
refactoring accidentally removed it, so this adds it back.
2024-04-23 17:09:02 -07:00
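
The commit doesn't name the override variable here; below is a minimal sketch of how such an escape valve typically works, assuming a hypothetical OLLAMA_MAX_VRAM-style setting in bytes:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// maxVRAM returns the predicted GPU memory budget unless the user has
// set an explicit override (hypothetical OLLAMA_MAX_VRAM, in bytes).
func maxVRAM(predicted uint64) uint64 {
	if s := os.Getenv("OLLAMA_MAX_VRAM"); s != "" {
		if v, err := strconv.ParseUint(s, 10, 64); err == nil && v > 0 {
			return v // trust the user's workaround over our prediction
		}
	}
	return predicted
}

func main() {
	fmt.Println(maxVRAM(8 << 30)) // 8 GiB predicted, unless overridden
}
```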
Daniel Hiltgen
058f6cd2cc Move nested payloads to installer and zip file on windows
Now that the llm runner is an executable and not just a DLL, more users are facing
problems with security policy configurations on Windows that prevent them from
writing to a directory and then executing binaries from that same location.
This change removes the payloads from the main executable on Windows and shifts them
over to the installer package, discovered at runtime based on the executable's location.
It also adds a new zip file for people who want to "roll their own" installation model.
2024-04-23 16:14:47 -07:00
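
A minimal sketch of the executable-relative discovery described above; the "runners" directory name is an assumption, not necessarily the installer's actual layout:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// payloadDir resolves runner payloads relative to the running binary,
// so the installer can place them next to ollama.exe instead of
// embedding them in the executable itself.
func payloadDir() (string, error) {
	exe, err := os.Executable()
	if err != nil {
		return "", err
	}
	if resolved, err := filepath.EvalSymlinks(exe); err == nil {
		exe = resolved
	}
	return filepath.Join(filepath.Dir(exe), "runners"), nil
}

func main() {
	dir, err := payloadDir()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(dir)
}
```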
Daniel Hiltgen
58888a74bc Detect and recover if runner removed
Tmp cleaners can nuke the file out from underneath us. This detects the missing
runner and re-initializes the payloads.
2024-04-23 10:05:26 -07:00
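
A sketch of that detect-and-recover check; reextract is a hypothetical stand-in for the real payload re-initialization:

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// ensureRunner re-extracts the runner payload if a tmp cleaner has
// removed it out from under us.
func ensureRunner(path string, reextract func(string) error) error {
	if _, err := os.Stat(path); errors.Is(err, os.ErrNotExist) {
		return reextract(path)
	} else if err != nil {
		return err
	}
	return nil
}

func main() {
	err := ensureRunner("/tmp/ollama/runner", func(p string) error {
		fmt.Println("re-initializing payload at", p)
		return nil
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```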
Daniel Hiltgen
cc5a71e0e3
Merge pull request #3709 from remy415/custom-gpu-defs
Adds support for customizing GPU build flags in llama.cpp
2024-04-23 09:28:34 -07:00
Michael Yang
e83bcf7f9a
Merge pull request #3836 from ollama/mxyng/mixtral
fix: mixtral graph
2024-04-23 09:15:10 -07:00
Daniel Hiltgen
34b9db5afc Request and model concurrency
This change adds support for multiple concurrent requests, as well as
loading multiple models by spawning multiple runners. The default
settings are currently set at 1 concurrent request per model and only 1
loaded model at a time, but these can be adjusted by setting
OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.
2024-04-22 19:29:12 -07:00
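
A minimal sketch of reading the two variables named above, with the stated defaults of one parallel request and one loaded model:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// envInt reads a positive integer from the environment, falling back
// to def when the variable is unset or malformed.
func envInt(key string, def int) int {
	if v, err := strconv.Atoi(os.Getenv(key)); err == nil && v > 0 {
		return v
	}
	return def
}

func main() {
	numParallel := envInt("OLLAMA_NUM_PARALLEL", 1)    // requests per model
	maxLoaded := envInt("OLLAMA_MAX_LOADED_MODELS", 1) // models kept loaded
	fmt.Println("parallel:", numParallel, "loaded:", maxLoaded)
}
```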
Daniel Hiltgen
8711d03df7 Report errors on server lookup instead of path lookup failure 2024-04-22 19:08:47 -07:00
Michael Yang
435cc866a3 fix: mixtral graph 2024-04-22 17:19:44 -07:00
Daniel Hiltgen
aa72281eae Trim spaces and quotes from llm lib override 2024-04-22 17:11:14 -07:00
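
A sketch of the trimming this commit describes, assuming the override arrives via an environment variable such as OLLAMA_LLM_LIBRARY:

```go
package main

import (
	"fmt"
	"strings"
)

// cleanLibOverride strips the surrounding whitespace and quotes that
// shells and config files often leave on an override value.
func cleanLibOverride(v string) string {
	return strings.Trim(strings.TrimSpace(v), `"'`)
}

func main() {
	fmt.Println(cleanLibOverride(`  "cpu_avx2" `)) // cpu_avx2
}
```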
Jeremy
9c0db4cc83
Update gen_windows.ps1
Fixed improper env references
2024-04-21 16:13:41 -04:00
Cheng
62be2050dd
chore: use errors.New to replace fmt.Errorf where no formatting is needed (#3789) 2024-04-20 22:11:06 -04:00
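
The idiom behind that chore: fmt.Errorf buys nothing when the message contains no format verbs, so errors.New is the better fit. A small illustration:

```go
package main

import (
	"errors"
	"fmt"
)

// A static message gains nothing from fmt.Errorf; errors.New is the
// direct way to declare it (and yields a comparable sentinel).
var errInvalidPath = errors.New("invalid model path")

func main() {
	fmt.Println(errInvalidPath)
	// fmt.Errorf remains the right tool when formatting or wrapping:
	fmt.Println(fmt.Errorf("open model: %w", errInvalidPath))
}
```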
Jeremy
6f18297b3a
Update gen_windows.ps1
Added a missing " to the Write-Host call
2024-04-18 19:47:44 -04:00
Jeremy
15016413de
Update gen_windows.ps1
Added OLLAMA_CUSTOM_CUDA_DEFS and OLLAMA_CUSTOM_ROCM_DEFS to customize GPU builds on Windows
2024-04-18 19:27:16 -04:00
Jeremy
440b7190ed
Update gen_linux.sh
Added OLLAMA_CUSTOM_CUDA_DEFS and OLLAMA_CUSTOM_ROCM_DEFS instead of OLLAMA_CUSTOM_GPU_DEFS
2024-04-18 19:18:10 -04:00
Jeremy
3934c15895
Merge branch 'ollama:main' into custom-gpu-defs 2024-04-18 09:55:10 -04:00
Jeremy
fd048f1367
Merge branch 'ollama:main' into arm64static 2024-04-18 09:55:04 -04:00
Michael Yang
8645076a71
Merge pull request #3712 from ollama/mxyng/mem
add stablelm graph calculation
2024-04-17 15:57:51 -07:00
Michael Yang
05e9424824
Merge pull request #3664 from ollama/mxyng/fix-padding-2
fix padding to only return padding
2024-04-17 15:57:40 -07:00
Michael Yang
3cf483fe48 add stablelm graph calculation 2024-04-17 13:57:19 -07:00
Jeremy
52f5370c48 add support for custom gpu build flags for llama.cpp 2024-04-17 16:00:48 -04:00
Jeremy
7c000ec3ed adds support for OLLAMA_CUSTOM_GPU_DEFS to customize GPU build flags 2024-04-17 15:21:05 -04:00
Jeremy
ea4c284a48
Merge branch 'ollama:main' into arm64static 2024-04-17 15:11:38 -04:00
Jeremy
8aec92fa6d rearranged conditional logic for static build, dockerfile updated 2024-04-17 14:43:28 -04:00
Michael Yang
a8b9b930b4 account for all non-repeating layers 2024-04-17 11:21:21 -07:00
Jeremy
70261b9bb6 move static build to its own flag 2024-04-17 13:04:28 -04:00
Michael Yang
e74163af4c fix padding to only return padding 2024-04-16 15:43:26 -07:00
Michael Yang
26df674785 scale graph based on gpu count 2024-04-16 14:44:13 -07:00
Jeffrey Morgan
7c9792a6e0
Support unicode characters in model path (#3681)
* parse wide argv characters on windows

* cleanup

* move cleanup to end of `main`
2024-04-16 17:00:12 -04:00
Michael Yang
41a272de9f darwin: no partial offloading if required memory greater than system 2024-04-16 11:22:38 -07:00
Jeffrey Morgan
f335722275
update llama.cpp submodule to 7593639 (#3665) 2024-04-15 23:04:43 -04:00
Michael Yang
6d53b67c2c
Merge pull request #3663 from ollama/mxyng/fix-padding 2024-04-15 17:44:54 -07:00
Michael Yang
969238b19e fix padding in decode
TODO: update padding() to _only_ return the padding
2024-04-15 17:27:06 -07:00
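
The TODO resolved by e74163af4c above: padding() should return only the pad-byte count, never the aligned offset. A sketch of the intended semantics:

```go
package main

import "fmt"

// padding returns just the number of bytes needed to align offset to
// align, not the aligned offset itself.
func padding(offset, align int64) int64 {
	return (align - offset%align) % align
}

func main() {
	fmt.Println(padding(10, 32)) // 22 bytes of padding
	fmt.Println(padding(32, 32)) // already aligned: 0, not 32
}
```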
Patrick Devine
9f8691c6c8
Add llama2 / torch models for ollama create (#3607) 2024-04-15 11:26:42 -07:00
Jeffrey Morgan
a0b8a32eb4
Terminate subprocess if receiving SIGINT or SIGTERM signals while model is loading (#3653)
* terminate subprocess if receiving `SIGINT` or `SIGTERM` signals while model is loading

* use `unload` in signal handler
2024-04-15 12:09:32 -04:00
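
A minimal sketch of the pattern in this commit, with unload as a hypothetical stand-in for terminating the runner subprocess:

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

// unload is a hypothetical stand-in for killing the model runner
// subprocess and releasing its memory.
func unload() {}

func main() {
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGINT, syscall.SIGTERM)

	go func() {
		s := <-sig
		fmt.Println("received", s, "- terminating runner subprocess")
		unload()
		os.Exit(0)
	}()

	select {} // stand-in for the server's main loop
}
```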
Jeffrey Morgan
309aef7fee
update llama.cpp submodule to 4bd0f93 (#3627) 2024-04-13 10:43:02 -07:00
Michael Yang
3397eff0cd mixtral mem 2024-04-11 11:10:41 -07:00
Michael Yang
7e33a017c0 partial offloading 2024-04-10 11:37:20 -07:00
Michael Yang
8b2c10061c refactor tensor query 2024-04-10 11:37:20 -07:00
Daniel Hiltgen
c5ff443b9f Handle very slow model loads
During testing, we're seeing some models take over 3 minutes.
2024-04-09 16:35:10 -07:00
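
A sketch of waiting on a load with a deadline generous enough for the slow cases observed; the five-minute budget and polling shape are assumptions, not the commit's actual values:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// waitForLoad polls until the runner reports ready or the deadline
// passes; some models take over three minutes to load.
func waitForLoad(ctx context.Context, ready func() bool) error {
	tick := time.NewTicker(50 * time.Millisecond)
	defer tick.Stop()
	for {
		select {
		case <-ctx.Done():
			return fmt.Errorf("model load: %w", ctx.Err())
		case <-tick.C:
			if ready() {
				return nil
			}
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
	defer cancel()

	start := time.Now()
	err := waitForLoad(ctx, func() bool { return time.Since(start) > time.Second })
	fmt.Println(err) // <nil> once the (simulated) load completes
}
```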
Blake Mizerany
1524f323a3
Revert "build.go: introduce a friendlier way to build Ollama (#3548)" (#3564) 2024-04-09 15:57:45 -07:00
Blake Mizerany
fccf3eecaa
build.go: introduce a friendlier way to build Ollama (#3548)
This commit introduces a friendlier way to build the Ollama dependencies
and binary without abusing `go generate`, removing the unnecessary
extra steps that approach brings with it.

This script also provides nicer feedback to the user about what is
happening during the build process.

At the end, it prints a helpful message to the user about what to do
next (e.g. run the new local Ollama).
2024-04-09 14:18:47 -07:00
Michael Yang
c77d45d836
Merge pull request #3506 from ollama/mxyng/quantize-redux
cgo quantize
2024-04-09 12:32:53 -07:00
Jeffrey Morgan
5ec12cec6c
update llama.cpp submodule to 1b67731 (#3561) 2024-04-09 15:10:17 -04:00
Michael Yang
9502e5661f cgo quantize 2024-04-08 15:31:08 -07:00
Jeffrey Morgan
63efa075a0
update generate scripts with new LLAMA_CUDA variable, set HIP_PLATFORM to avoid compiler errors (#3528) 2024-04-07 19:29:51 -04:00
Michael Yang
be517e491c no rope parameters 2024-04-05 18:05:27 -07:00