ollama

Author	SHA1	Message	Date
Bryce Reitano	ceb0e26e5e	Provide variable ggml for TestLoad	2024-04-24 17:19:55 -06:00
Bryce Reitano	284e02bed0	Move ggml loading to when we attempt fitting	2024-04-24 17:17:24 -06:00
Michael Yang	592dae31c8	update copy to use model.Name	2024-04-24 15:54:54 -07:00
Daniel Hiltgen	d8851cb7a0	Harden sched TestLoad Give the go routine a moment to deliver the expired event	2024-04-23 16:14:47 -07:00
Daniel Hiltgen	34b9db5afc	Request and model concurrency This change adds support for multiple concurrent requests, as well as loading multiple models by spawning multiple runners. The default settings are currently set at 1 concurrent request per model and only 1 loaded model at a time, but these can be adjusted by setting OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.	2024-04-22 19:29:12 -07:00
Cheng	62be2050dd	chore: use errors.New to replace fmt.Errorf will much better (#3789)	2024-04-20 22:11:06 -04:00
Patrick Devine	9f8691c6c8	Add llama2 / torch models for `ollama create` (#3607)	2024-04-15 11:26:42 -07:00
Jeffrey Morgan	a0b8a32eb4	Terminate subprocess if receiving `SIGINT` or `SIGTERM` signals while model is loading (#3653) * terminate subprocess if receiving `SIGINT` or `SIGTERM` signals while model is loading * use `unload` in signal handler	2024-04-15 12:09:32 -04:00
Blake Mizerany	a7b431e743	server: provide helpful workaround hint when stalling on pull (#3584) This is a quick fix to help users who are stuck on the "pull" step at 99%. In the near future we're introducing a new registry client that should/will hopefully be smarter. In the meantime, this should unblock the users hitting issue #1736.	2024-04-10 16:24:37 -07:00
Michael Yang	9502e5661f	cgo quantize	2024-04-08 15:31:08 -07:00
Michael Yang	e1c9a2a00f	no blob create if already exists	2024-04-08 15:09:48 -07:00
Daniel Hiltgen	6589eb8a8c	Revert options as a ref in the server	2024-04-02 16:44:10 -07:00
Daniel Hiltgen	58d95cc9bd	Switch back to subprocessing for llama.cpp This should resolve a number of memory leak and stability defects by allowing us to isolate llama.cpp in a separate process and shutdown when idle, and gracefully restart if it has problems. This also serves as a first step to be able to run multiple copies to support multiple models concurrently.	2024-04-01 16:48:18 -07:00
Patrick Devine	3b6a9154dd	Simplify model conversion (#3422)	2024-04-01 16:14:53 -07:00
Michael Yang	91b3e4d282	update memory calcualtions count each layer independently when deciding gpu offloading	2024-04-01 13:16:32 -07:00
Michael Yang	d338d70492	refactor model parsing	2024-04-01 13:16:15 -07:00
Patrick Devine	5a5efee46b	Add gemma safetensors conversion (#3250) Co-authored-by: Michael Yang <mxyng@pm.me>	2024-03-28 18:54:01 -07:00
Michael Yang	af8a8a6b59	fix: trim quotes on OLLAMA_ORIGINS	2024-03-27 15:24:29 -07:00
Patrick Devine	1b272d5bcd	change `github.com/jmorganca/ollama` to `github.com/ollama/ollama` (#3347)	2024-03-26 13:04:17 -07:00
Daniel Hiltgen	949b6c01e0	Revamp go based integration tests This uplevels the integration tests to run the server which can allow testing an existing server, or a remote server.	2024-03-23 14:24:18 +01:00
Blake Mizerany	703684a82a	server: replace blob prefix separator from ':' to '-' (#3146) This fixes issues with blob file names that contain ':' characters to be rejected by file systems that do not support them.	2024-03-14 20:18:06 -07:00
Patrick Devine	47cfe58af5	Default Keep Alive environment variable (#3094) --------- Co-authored-by: Chris-AS1 <8493773+Chris-AS1@users.noreply.github.com>	2024-03-13 13:29:40 -07:00
Daniel Hiltgen	4a5c9b8035	Finish unwinding idempotent payload logic The recent ROCm change partially removed idempotent payloads, but the ggml-metal.metal file for mac was still idempotent. This finishes switching to always extract the payloads, and now that idempotentcy is gone, the version directory is no longer useful.	2024-03-09 08:34:39 -08:00
Jeffrey Morgan	5b3fad9636	separate out `isLocalIP`	2024-03-09 00:22:08 -08:00
Jeffrey Morgan	bfec2c6e10	simplify host checks	2024-03-08 23:29:53 -08:00
Jeffrey Morgan	5c143af726	add additional allowed hosts	2024-03-08 23:23:59 -08:00
Jeffrey Morgan	fc8c044584	add allowed host middleware and remove `workDir` middleware (#3018)	2024-03-08 22:23:47 -08:00
Michael Yang	76bdebbadf	decode ggla	2024-03-08 15:46:25 -08:00
Bruce MacDonald	0cebc79cba	fix: allow importing a model from name reference (#3005)	2024-03-08 12:27:47 -05:00
Jeffrey Morgan	fc06205971	Revert "adjust download and upload concurrency based on available bandwidth" (#2995)	2024-03-07 18:10:16 -08:00
Daniel Hiltgen	6c5ccb11f9	Revamp ROCm support This refines where we extract the LLM libraries to by adding a new OLLAMA_HOME env var, that defaults to `~/.ollama` The logic was already idempotenent, so this should speed up startups after the first time a new release is deployed. It also cleans up after itself. We now build only a single ROCm version (latest major) on both windows and linux. Given the large size of ROCms tensor files, we split the dependency out. It's bundled into the installer on windows, and a separate download on windows. The linux install script is now smart and detects the presence of AMD GPUs and looks to see if rocm v6 is already present, and if not, then downloads our dependency tar file. For Linux discovery, we now use sysfs and check each GPU against what ROCm supports so we can degrade to CPU gracefully instead of having llama.cpp+rocm assert/crash on us. For Windows, we now use go's windows dynamic library loading logic to access the amdhip64.dll APIs to query the GPU information.	2024-03-07 10:36:50 -08:00
Michael Yang	2e20110e50	Merge pull request #2221 from ollama/mxyng/up-down-ccy adjust download and upload concurrency based on available bandwidth	2024-03-07 09:27:33 -08:00
Patrick Devine	2c017ca441	Convert Safetensors to an Ollama model (#2824)	2024-03-06 21:01:51 -08:00
Jeffrey Morgan	3b4bab3dc5	Fix embeddings load model behavior (#2848)	2024-02-29 17:40:56 -08:00
Michael Yang	0e19476b56	prepend image tags (#2789) instead of appending image tags, prepend them - this generally produces better results	2024-02-29 11:30:14 -08:00
Michael Yang	084d846621	refactor	2024-02-21 13:42:48 -08:00
Michael Yang	6a4b994433	lint	2024-02-21 13:42:48 -08:00
Michael Yang	bea007deb7	use LimitGroup for uploads	2024-02-21 13:42:48 -08:00
Michael Yang	074934be03	adjust group limit based on download speed	2024-02-21 13:42:48 -08:00
Michael Yang	0de12368a0	add new LimitGroup for dynamic concurrency	2024-02-21 13:42:48 -08:00
Michael Yang	917bd61084	refactor download run	2024-02-21 13:42:46 -08:00
Jeffrey Morgan	287ba11500	better error message when calling `/api/generate` or `/api/chat` with embedding models	2024-02-20 21:53:45 -05:00
Jeffrey Morgan	63861f58cc	Support for `bert` and `nomic-bert` embedding models	2024-02-20 21:37:29 -05:00
Michael Yang	210b65268e	replace strings buffer with hasher (#2437) the buffered value is going into the hasher eventually so write directly to the hasher instead	2024-02-20 19:07:50 -05:00
Michael Yang	897b213468	use http.DefaultClient (#2530) default client already handles proxy	2024-02-20 18:34:47 -05:00
Bruce MacDonald	88622847c6	fix: chat system prompting overrides (#2542)	2024-02-16 14:42:43 -05:00
Michael Yang	e43648afe5	rerefactor	2024-02-15 05:56:45 +00:00
Daniel Hiltgen	f397e0e988	Move hub auth out to new package	2024-02-15 05:56:45 +00:00
Jeffrey Morgan	48a273f80b	Fix issues with templating prompt in chat mode (#2460)	2024-02-12 15:06:57 -08:00
Jeffrey Morgan	1f9078d6ae	Check image filetype in api handlers (#2467)	2024-02-12 11:16:20 -08:00
Jeffrey Morgan	a0a199b108	Fix hanging issue when sending empty content (#2399)	2024-02-07 19:30:33 -05:00
Jeffrey Morgan	453f572f83	Initial OpenAI `/v1/chat/completions` API compatibility (#2376)	2024-02-07 17:24:29 -05:00
Michael Yang	e805ac1d59	fix response on token error	2024-02-07 11:05:49 -08:00
Michael Yang	bfbf2f7cf7	Merge pull request #2296 from ollama/mxyng/img-tags append image tags to user content	2024-02-01 13:16:59 -08:00
Michael Yang	3d6f48507a	structured debug prompt	2024-02-01 11:56:28 -08:00
Michael Yang	f3761405c8	use image id	2024-02-01 11:52:42 -08:00
Michael Yang	e49dc9f3d8	fix tests	2024-02-01 11:48:11 -08:00
Michael Yang	d125510b4b	remove image tags	2024-02-01 11:32:51 -08:00
Michael Yang	fb56988014	account for image projection in token count	2024-02-01 09:50:48 -08:00
Michael Yang	d046bee790	use llm.ImageData for chat	2024-01-31 19:18:25 -08:00
Jeffrey Morgan	f11bf0740b	use `llm.ImageData`	2024-01-31 19:13:48 -08:00
Michael Yang	8450bf66e6	trim images	2024-01-31 19:13:47 -08:00
Michael Yang	b4e11be8ef	append image tags to user content	2024-01-31 19:13:10 -08:00
Bruce MacDonald	a896079705	preserve last system message from modelfile (#2289)	2024-01-31 21:45:01 -05:00
Michael Yang	8ac08a0eec	update slog handler options - consistent format by using text handler for debug and non-debug - truncate source file to just the file name	2024-01-31 15:15:00 -08:00
Michael Yang	c8b1f2369e	remove unnecessary parse raw	2024-01-30 17:00:53 -08:00
Bruce MacDonald	0632dff3f8	trim chat prompt based on llm context size (#1963)	2024-01-30 15:59:29 -05:00
Jeffrey Morgan	f2245c7c77	print prompt with `OLLAMA_DEBUG=1` (#2245)	2024-01-28 15:22:35 -08:00
Jeffrey Morgan	e4b9b72f2a	Do not repeat system prompt for chat templating (#2241)	2024-01-28 14:15:56 -08:00
Patrick Devine	b5cf31b460	add keep_alive to generate/chat/embedding api endpoints (#2146)	2024-01-26 14:28:02 -08:00
Michael Yang	9d3dcfd0ec	fix logging	2024-01-26 11:04:27 -08:00
Michael Yang	6e0ea5ecc8	Merge pull request #1916 from ollama/mxyng/inactivity-monitor download: add inactivity monitor	2024-01-26 10:56:00 -08:00
Patrick Devine	7c40a67841	Save and load sessions (#2063)	2024-01-25 12:12:36 -08:00
Michael Yang	c08dfaa23d	fix: remove overwritten model layers if create overrides a manifest, first add the older manifest's layers to the delete map so they can be cleaned up	2024-01-19 14:58:37 -08:00
Michael Yang	aac9ab4db7	fix show handler	2024-01-18 15:36:50 -08:00
Michael Yang	745b5934fa	add model to ModelResponse	2024-01-18 14:32:55 -08:00
Michael Yang	a38d88d828	api: add model for all requests prefer using req.Model and fallback to req.Name	2024-01-18 14:31:37 -08:00
Daniel Hiltgen	fedd705aea	Mechanical switch from log to slog A few obvious levels were adjusted, but generally everything mapped to "info" level.	2024-01-18 14:12:57 -08:00
Michael Yang	96cfb62641	fix: normalize name path before splitting	2024-01-16 16:48:29 -08:00
Patrick Devine	eef50accb4	Fix show parameters (#2017)	2024-01-16 10:34:44 -08:00
Michael Yang	27331ae3a8	download: add inactivity monitor if a download part is inactive for some time, restart it	2024-01-12 15:23:15 -08:00
Michael Yang	cf29bd2d72	fix: request retry with error this fixes a subtle bug with makeRequestWithRetry where an HTTP status error on a retried request will potentially not return the right err	2024-01-12 13:32:27 -08:00
Michael Yang	2b9892a808	fix(windows): modelpath and list	2024-01-09 09:36:58 -08:00
Michael Yang	2bb2bdd5d4	fix lint	2024-01-09 09:36:58 -08:00
Michael Yang	acfc376efd	add .golangci.yaml	2024-01-09 09:36:58 -08:00
Bruce MacDonald	7e8f7c8358	remove ggml automatic re-pull (#1856)	2024-01-08 14:41:01 -05:00
Michael Yang	0101e76dbe	Merge pull request #1797 from sublimator/nd-allow-extension-origins-still-needs-explicit-listing-2024-01-05 fix: allow extension origins (still needs explicit listing), fixes #1686	2024-01-05 17:20:09 -08:00
Patrick Devine	22e93efa41	add show info command and fix the modelfile	2024-01-05 12:20:05 -08:00
Nicholas Dudfield	8baaaa39c0	Allow extension origins (still needs explicit listing), fixes #1686	2024-01-05 09:06:47 +07:00
Bruce MacDonald	4ad6c9b11f	fix: pull either original model or from model on create (#1774)	2024-01-04 01:34:38 -05:00
Bruce MacDonald	0b3118e0af	fix: relay request opts to loaded llm prediction (#1761)	2024-01-03 12:01:42 -05:00
Daniel Hiltgen	697bea6939	Guard integration tests with a tag This should help CI avoid running the integration test logic in a container where it's not currently possible.	2023-12-22 16:33:27 -08:00
Bruce MacDonald	db356c8519	post-response templating (#1427)	2023-12-22 17:07:05 -05:00
Daniel Hiltgen	96fb441abd	Merge pull request #1146 from dhiltgen/ext_server_cgo Add cgo implementation for llama.cpp	2023-12-22 08:16:31 -08:00
Michael Yang	63aac0edc5	fix(test): use real version string for comparison	2023-12-19 15:03:02 -08:00
Daniel Hiltgen	51082535e1	Add automated test for multimodal A simple test case that verifies llava:7b can read text in an image	2023-12-19 09:05:46 -08:00
Daniel Hiltgen	35934b2e05	Adapted rocm support to cgo based llama.cpp	2023-12-19 09:05:46 -08:00
Daniel Hiltgen	d4cd695759	Add cgo implementation for llama.cpp Run the server.cpp directly inside the Go runtime via cgo while retaining the LLM Go abstractions.	2023-12-19 09:05:46 -08:00
Bruce MacDonald	5e7fd6906f	Update images.go	2023-12-19 09:05:46 -08:00
Bruce MacDonald	811b1f03c8	deprecate ggml - remove ggml runner - automatically pull gguf models when ggml detected - tell users to update to gguf in the case automatic pull fails Co-Authored-By: Jeffrey Morgan <jmorganca@gmail.com>	2023-12-19 09:05:46 -08:00

529 commits