ollama

Author	SHA1	Message	Date
Daniel Hiltgen	2619850fb4	Merge pull request #3933 from dhiltgen/ci_fixes Move cuda/rocm dependency gathering into generate script	2024-04-26 07:01:24 -07:00
Daniel Hiltgen	8feb97dc0d	Move cuda/rocm dependency gathering into generate script This will make it simpler for CI to accumulate artifacts from prior steps	2024-04-25 22:38:44 -07:00
Daniel Hiltgen	4e1ff6dcbb	Merge pull request #3926 from dhiltgen/ci_fixes Fix release CI	2024-04-25 17:42:31 -07:00
Daniel Hiltgen	8589d752ac	Fix release CI download-artifact path was being used incorrectly. It is where to extract the zip not the files in the zip to extract. Default is workspace dir which is what we want, so omit it	2024-04-25 17:27:11 -07:00
Michael Yang	de4ded68b0	Merge pull request #3923 from ollama/mxyng/mem only count output tensors	2024-04-25 16:34:17 -07:00
Daniel Hiltgen	9b5a3c5991	Merge pull request #3914 from dhiltgen/mac_perf Improve mac parallel performance	2024-04-25 16:28:31 -07:00
Jeffrey Morgan	00b0699c75	Reload model if `num_gpu` changes (#3920) * reload model if `num_gpu` changes * dont reload on -1 * fix tests	2024-04-25 19:02:40 -04:00
Jeffrey Morgan	993cf8bf55	llm: limit generation to 10x context size to avoid run on generations (#3918) * llm: limit generation to 10x context size to avoid run on generations * add comment * simplify condition statement	2024-04-25 19:02:30 -04:00
Michael Yang	7bb7cb8a60	only count output tensors	2024-04-25 15:24:08 -07:00
Daniel Hiltgen	b123be5b71	Adjust context size for parallelism	2024-04-25 13:58:54 -07:00
jmorganca	ddf5c09a9b	use matrix multiplication kernels in more cases	2024-04-25 13:58:54 -07:00
Roy Yang	5f73c08729	Remove trailing spaces (#3889)	2024-04-25 14:32:26 -04:00
Daniel Hiltgen	f503a848c2	Merge pull request #3895 from brycereitano/shiftloading Move ggml loading to when attempting to fit	2024-04-25 09:24:08 -07:00
Bryce Reitano	36a6daccab	Restructure loading conditional chain	2024-04-24 17:37:03 -06:00
Bryce Reitano	ceb0e26e5e	Provide variable ggml for TestLoad	2024-04-24 17:19:55 -06:00
Bryce Reitano	284e02bed0	Move ggml loading to when we attempt fitting	2024-04-24 17:17:24 -06:00
Michael Yang	3450a57d4a	Merge pull request #3713 from ollama/mxyng/modelname update copy handler to use model.Name	2024-04-24 16:00:32 -07:00
Michael Yang	592dae31c8	update copy to use model.Name	2024-04-24 15:54:54 -07:00
Michael Yang	2010cbc5fa	Merge pull request #3833 from ollama/mxyng/fix-from fix: from blob	2024-04-24 15:13:47 -07:00
Michael Yang	ac0801eced	only replace if it matches command	2024-04-24 14:49:26 -07:00
Michael Yang	ad66e5b060	split temp zip files	2024-04-24 14:18:01 -07:00
Blake Mizerany	ade4b55520	types/model: make ParseName use default without question (#3886)	2024-04-24 11:52:55 -07:00
Daniel Hiltgen	a6d62e0617	Merge pull request #3882 from dhiltgen/amd_gfx AMD gfx patch rev is hex	2024-04-24 11:07:49 -07:00
Daniel Hiltgen	6e76348df7	Merge pull request #3834 from dhiltgen/not_found_in_path Report errors on server lookup instead of path lookup failure	2024-04-24 10:50:48 -07:00
Daniel Hiltgen	0d6687f84c	AMD gfx patch rev is hex Correctly handle gfx90a discovery	2024-04-24 09:43:52 -07:00
Patrick Devine	74d2a9ef9a	add OLLAMA_KEEP_ALIVE env variable to FAQ (#3865)	2024-04-23 21:06:51 -07:00
Patrick Devine	14476d48cc	fixes for gguf (#3863)	2024-04-23 20:57:20 -07:00
Patrick Devine	ce8ce82567	add mixtral 8x7b model conversion (#3859)	2024-04-23 20:17:04 -07:00
Blake Mizerany	4dc4f1be34	types/model: restrict digest hash part to a minimum of 2 characters (#3858) This allows users of a valid Digest to know it has a minimum of 2 characters in the hash part for use when sharding. This is a reasonable restriction as the hash part is a SHA256 hash which is 64 characters long, which is the common hash used. There is no anticipation of using a hash with less than 2 characters. Also, add MustParseDigest. Also, replace Digest.Type with Digest.Split for getting both the type and hash parts together, which is the most common case when asking for either.	2024-04-23 18:24:17 -07:00
Daniel Hiltgen	16b52331a4	Merge pull request #3857 from dhiltgen/mem_escape_valve Add back memory escape valve	2024-04-23 17:32:24 -07:00
Daniel Hiltgen	5445aaa94e	Add back memory escape valve If we get our predictions wrong, this can be used to set a lower memory limit as a workaround. Recent multi-gpu refactoring accidentally removed it, so this adds it back.	2024-04-23 17:09:02 -07:00
Daniel Hiltgen	2ac3dd6853	Merge pull request #3850 from dhiltgen/windows_packaging Move nested payloads to installer and zip file on windows	2024-04-23 16:35:20 -07:00
Daniel Hiltgen	d8851cb7a0	Harden sched TestLoad Give the go routine a moment to deliver the expired event	2024-04-23 16:14:47 -07:00
Daniel Hiltgen	058f6cd2cc	Move nested payloads to installer and zip file on windows Now that the llm runner is an executable and not just a dll, more users are facing problems with security policy configurations on windows that prevent users writing to directories and then executing binaries from the same location. This change removes payloads from the main executable on windows and shifts them over to be packaged in the installer and discovered based on the executable's location. This also adds a new zip file for people who want to "roll their own" installation model.	2024-04-23 16:14:47 -07:00
Daniel Hiltgen	790cf34d17	Merge pull request #3846 from dhiltgen/missing_runner Detect and recover if runner removed	2024-04-23 13:14:12 -07:00
Michael	928d844896	adding phi-3 mini to readme	2024-04-23 13:58:31 -04:00
Daniel Hiltgen	939d6a8606	Make CI lint verbose	2024-04-23 10:17:42 -07:00
Daniel Hiltgen	58888a74bc	Detect and recover if runner removed Tmp cleaners can nuke the file out from underneath us. This detects the missing runner, and re-initializes the payloads.	2024-04-23 10:05:26 -07:00
Daniel Hiltgen	cc5a71e0e3	Merge pull request #3709 from remy415/custom-gpu-defs Adds support for customizing GPU build flags in llama.cpp	2024-04-23 09:28:34 -07:00
Michael Yang	e83bcf7f9a	Merge pull request #3836 from ollama/mxyng/mixtral fix: mixtral graph	2024-04-23 09:15:10 -07:00
Daniel Hiltgen	5690e5ce99	Merge pull request #3418 from dhiltgen/concurrency Request and model concurrency	2024-04-23 08:31:38 -07:00
Daniel Hiltgen	f2ea8470e5	Local unicode test case	2024-04-22 19:29:12 -07:00
Daniel Hiltgen	34b9db5afc	Request and model concurrency This change adds support for multiple concurrent requests, as well as loading multiple models by spawning multiple runners. The default settings are currently set at 1 concurrent request per model and only 1 loaded model at a time, but these can be adjusted by setting OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.	2024-04-22 19:29:12 -07:00
Daniel Hiltgen	8711d03df7	Report errors on server lookup instead of path lookup failure	2024-04-22 19:08:47 -07:00
Daniel Hiltgen	ee448deaba	Merge pull request #3835 from dhiltgen/harden_llm_override Trim spaces and quotes from llm lib override	2024-04-22 19:06:54 -07:00
Bruce MacDonald	6e8db04716	tidy community integrations - move some popular integrations to the top of the lists	2024-04-22 17:29:08 -07:00
Bruce MacDonald	658e60cf73	Revert "stop running model on interactive exit" This reverts commit `fad00a85e5`.	2024-04-22 17:23:11 -07:00
Bruce MacDonald	4c78f028f8	Merge branch 'main' of https://github.com/ollama/ollama	2024-04-22 17:22:28 -07:00
Michael Yang	435cc866a3	fix: mixtral graph	2024-04-22 17:19:44 -07:00
Hao Wu	c7d3a558f6	docs: update README to add chat (web UI) for LLM (#3810) * add chat (web UI) for LLM I have used chat with llama3 in local successfully and the code is MIT licensed. * Update README.md --------- Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>	2024-04-22 20:19:39 -04:00

2728 commits