ollama

Author	SHA1	Message	Date
Daniel Hiltgen	58d95cc9bd	Switch back to subprocessing for llama.cpp This should resolve a number of memory leak and stability defects by allowing us to isolate llama.cpp in a separate process and shutdown when idle, and gracefully restart if it has problems. This also serves as a first step to be able to run multiple copies to support multiple models concurrently.	2024-04-01 16:48:18 -07:00
Michael Yang	91b3e4d282	update memory calculations count each layer independently when deciding gpu offloading	2024-04-01 13:16:32 -07:00
Michael Yang	af8a8a6b59	fix: trim quotes on OLLAMA_ORIGINS	2024-03-27 15:24:29 -07:00
Patrick Devine	1b272d5bcd	change `github.com/jmorganca/ollama` to `github.com/ollama/ollama` (#3347)	2024-03-26 13:04:17 -07:00
Blake Mizerany	703684a82a	server: replace blob prefix separator from ':' to '-' (#3146) This fixes issues with blob file names that contain ':' characters to be rejected by file systems that do not support them.	2024-03-14 20:18:06 -07:00
Patrick Devine	47cfe58af5	Default Keep Alive environment variable (#3094) --------- Co-authored-by: Chris-AS1 <8493773+Chris-AS1@users.noreply.github.com>	2024-03-13 13:29:40 -07:00
Daniel Hiltgen	4a5c9b8035	Finish unwinding idempotent payload logic The recent ROCm change partially removed idempotent payloads, but the ggml-metal.metal file for mac was still idempotent. This finishes switching to always extract the payloads, and now that idempotency is gone, the version directory is no longer useful.	2024-03-09 08:34:39 -08:00
Jeffrey Morgan	5b3fad9636	separate out `isLocalIP`	2024-03-09 00:22:08 -08:00
Jeffrey Morgan	bfec2c6e10	simplify host checks	2024-03-08 23:29:53 -08:00
Jeffrey Morgan	5c143af726	add additional allowed hosts	2024-03-08 23:23:59 -08:00
Jeffrey Morgan	fc8c044584	add allowed host middleware and remove `workDir` middleware (#3018)	2024-03-08 22:23:47 -08:00
Daniel Hiltgen	6c5ccb11f9	Revamp ROCm support This refines where we extract the LLM libraries to by adding a new OLLAMA_HOME env var, that defaults to `~/.ollama`. The logic was already idempotent, so this should speed up startups after the first time a new release is deployed. It also cleans up after itself. We now build only a single ROCm version (latest major) on both windows and linux. Given the large size of ROCm's tensor files, we split the dependency out. It's bundled into the installer on windows, and a separate download on windows. The linux install script is now smart and detects the presence of AMD GPUs and looks to see if rocm v6 is already present, and if not, then downloads our dependency tar file. For Linux discovery, we now use sysfs and check each GPU against what ROCm supports so we can degrade to CPU gracefully instead of having llama.cpp+rocm assert/crash on us. For Windows, we now use Go's windows dynamic library loading logic to access the amdhip64.dll APIs to query the GPU information.	2024-03-07 10:36:50 -08:00
Jeffrey Morgan	3b4bab3dc5	Fix embeddings load model behavior (#2848)	2024-02-29 17:40:56 -08:00
Michael Yang	0e19476b56	prepend image tags (#2789) instead of appending image tags, prepend them - this generally produces better results	2024-02-29 11:30:14 -08:00
Jeffrey Morgan	287ba11500	better error message when calling `/api/generate` or `/api/chat` with embedding models	2024-02-20 21:53:45 -05:00
Jeffrey Morgan	63861f58cc	Support for `bert` and `nomic-bert` embedding models	2024-02-20 21:37:29 -05:00
Bruce MacDonald	88622847c6	fix: chat system prompting overrides (#2542)	2024-02-16 14:42:43 -05:00
Michael Yang	e43648afe5	rerefactor	2024-02-15 05:56:45 +00:00
Daniel Hiltgen	f397e0e988	Move hub auth out to new package	2024-02-15 05:56:45 +00:00
Jeffrey Morgan	48a273f80b	Fix issues with templating prompt in chat mode (#2460)	2024-02-12 15:06:57 -08:00
Jeffrey Morgan	1f9078d6ae	Check image filetype in api handlers (#2467)	2024-02-12 11:16:20 -08:00
Jeffrey Morgan	a0a199b108	Fix hanging issue when sending empty content (#2399)	2024-02-07 19:30:33 -05:00
Jeffrey Morgan	453f572f83	Initial OpenAI `/v1/chat/completions` API compatibility (#2376)	2024-02-07 17:24:29 -05:00
Michael Yang	bfbf2f7cf7	Merge pull request #2296 from ollama/mxyng/img-tags append image tags to user content	2024-02-01 13:16:59 -08:00
Michael Yang	3d6f48507a	structured debug prompt	2024-02-01 11:56:28 -08:00
Michael Yang	d125510b4b	remove image tags	2024-02-01 11:32:51 -08:00
Michael Yang	fb56988014	account for image projection in token count	2024-02-01 09:50:48 -08:00
Michael Yang	d046bee790	use llm.ImageData for chat	2024-01-31 19:18:25 -08:00
Jeffrey Morgan	f11bf0740b	use `llm.ImageData`	2024-01-31 19:13:48 -08:00
Michael Yang	8450bf66e6	trim images	2024-01-31 19:13:47 -08:00
Michael Yang	b4e11be8ef	append image tags to user content	2024-01-31 19:13:10 -08:00
Michael Yang	8ac08a0eec	update slog handler options - consistent format by using text handler for debug and non-debug - truncate source file to just the file name	2024-01-31 15:15:00 -08:00
Bruce MacDonald	0632dff3f8	trim chat prompt based on llm context size (#1963)	2024-01-30 15:59:29 -05:00
Jeffrey Morgan	f2245c7c77	print prompt with `OLLAMA_DEBUG=1` (#2245)	2024-01-28 15:22:35 -08:00
Jeffrey Morgan	e4b9b72f2a	Do not repeat system prompt for chat templating (#2241)	2024-01-28 14:15:56 -08:00
Patrick Devine	b5cf31b460	add keep_alive to generate/chat/embedding api endpoints (#2146)	2024-01-26 14:28:02 -08:00
Patrick Devine	7c40a67841	Save and load sessions (#2063)	2024-01-25 12:12:36 -08:00
Michael Yang	aac9ab4db7	fix show handler	2024-01-18 15:36:50 -08:00
Michael Yang	745b5934fa	add model to ModelResponse	2024-01-18 14:32:55 -08:00
Michael Yang	a38d88d828	api: add model for all requests prefer using req.Model and fallback to req.Name	2024-01-18 14:31:37 -08:00
Daniel Hiltgen	fedd705aea	Mechanical switch from log to slog A few obvious levels were adjusted, but generally everything mapped to "info" level.	2024-01-18 14:12:57 -08:00
Patrick Devine	eef50accb4	Fix show parameters (#2017)	2024-01-16 10:34:44 -08:00
Michael Yang	2b9892a808	fix(windows): modelpath and list	2024-01-09 09:36:58 -08:00
Michael Yang	2bb2bdd5d4	fix lint	2024-01-09 09:36:58 -08:00
Michael Yang	acfc376efd	add .golangci.yaml	2024-01-09 09:36:58 -08:00
Michael Yang	0101e76dbe	Merge pull request #1797 from sublimator/nd-allow-extension-origins-still-needs-explicit-listing-2024-01-05 fix: allow extension origins (still needs explicit listing), fixes #1686	2024-01-05 17:20:09 -08:00
Patrick Devine	22e93efa41	add show info command and fix the modelfile	2024-01-05 12:20:05 -08:00
Nicholas Dudfield	8baaaa39c0	Allow extension origins (still needs explicit listing), fixes #1686	2024-01-05 09:06:47 +07:00
Bruce MacDonald	0b3118e0af	fix: relay request opts to loaded llm prediction (#1761)	2024-01-03 12:01:42 -05:00
Bruce MacDonald	db356c8519	post-response templating (#1427)	2023-12-22 17:07:05 -05:00
Daniel Hiltgen	35934b2e05	Adapted rocm support to cgo based llama.cpp	2023-12-19 09:05:46 -08:00
Daniel Hiltgen	d4cd695759	Add cgo implementation for llama.cpp Run the server.cpp directly inside the Go runtime via cgo while retaining the LLM Go abstractions.	2023-12-19 09:05:46 -08:00
Bruce MacDonald	811b1f03c8	deprecate ggml - remove ggml runner - automatically pull gguf models when ggml detected - tell users to update to gguf in the case automatic pull fails Co-Authored-By: Jeffrey Morgan <jmorganca@gmail.com>	2023-12-19 09:05:46 -08:00
Bruce MacDonald	d99fa6ce0a	send empty messages on last chat response (#1530)	2023-12-18 14:23:38 -05:00
Patrick Devine	630518f0d9	Add unit test of API routes (#1528)	2023-12-14 16:47:40 -08:00
Bruce MacDonald	6ee8c80199	restore model load duration on generate response (#1524) * restore model load duration on generate response - set model load duration on generate and chat done response - calculate createAt time when response created * remove checkpoints predict opts * Update routes.go	2023-12-14 12:15:50 -05:00
Patrick Devine	d9e60f634b	add image support to the chat api (#1490)	2023-12-12 13:28:58 -08:00
Patrick Devine	910e9401d0	Multimodal support (#1216) --------- Co-authored-by: Matt Apperson <mattapperson@Matts-MacBook-Pro.local>	2023-12-11 13:56:22 -08:00
Jeffrey Morgan	7db5bcf73b	fix `go-staticcheck` warning	2023-12-10 11:44:27 -05:00
Jeffrey Morgan	fa2f095bd9	fix model name returned by `/api/generate` being different than the model name provided	2023-12-10 11:42:15 -05:00
Jeffrey Morgan	045b855db9	fix error on accumulating final chat response	2023-12-10 11:24:39 -05:00
Jeffrey Morgan	32064a0646	fix empty response when receiving runner error	2023-12-10 10:53:38 -05:00
Jeffrey Morgan	9e1406e4ed	Don't expose model information in `/api/generate`	2023-12-09 02:05:43 -08:00
Bruce MacDonald	7e9405fd07	fix: encode full previous prompt in context (#1424)	2023-12-08 16:53:51 -05:00
Michael Yang	c3ff36088b	Merge pull request #774 from jmorganca/mxyng/server-version add version api and show server version in cli	2023-12-06 13:22:55 -08:00
Michael Yang	5d75505ebd	return model configuration in generate	2023-12-05 14:39:02 -08:00
Michael Yang	b9495ea162	load projectors	2023-12-05 14:36:12 -08:00
Michael Yang	d3479c07a1	Merge pull request #1250 from jmorganca/mxyng/create-layer refactor layer creation	2023-12-05 14:32:52 -08:00
Bruce MacDonald	195e3d9dbd	chat api endpoint (#1392)	2023-12-05 14:57:33 -05:00
Michael Yang	1ebdbd9694	server: add version handler	2023-12-05 09:36:01 -08:00
Jeffrey Morgan	00d06619a1	Revert "chat api (#991)" while context variable is fixed This reverts commit `7a0899d62d`.	2023-12-04 21:16:27 -08:00
Michael Yang	a3737cbd33	use NewLayer for CreateBlobHandler	2023-12-04 16:59:23 -08:00
Bruce MacDonald	7a0899d62d	chat api (#991) - update chat docs - add messages chat endpoint - remove deprecated context and template generate parameters from docs - context and template are still supported for the time being and will continue to work as expected - add partial response to chat history	2023-12-04 18:01:06 -05:00
Bruce MacDonald	96122b7271	validate model tags on copy (#1323)	2023-11-29 15:54:29 -05:00
Timothy Jaeryang Baek	c2e3b89176	fix: disable ':' in tag names (#1280) Co-authored-by: rootedbox	2023-11-29 13:33:45 -05:00
Bruce MacDonald	37d95157df	fix relative path on create (#1222)	2023-11-21 15:43:17 -05:00
Bruce MacDonald	43a726149d	fix potentially inaccurate error message	2023-11-18 21:25:07 -05:00
Jeffrey Morgan	bab9494176	add `-` separator to temp file created on `ollama create`	2023-11-18 09:39:52 -05:00
Michael Yang	c6e6c8ee7e	fix cross device rename	2023-11-17 15:22:17 -08:00
Michael Yang	54f92f01cb	update docs	2023-11-15 15:28:15 -08:00
Michael Yang	bc22d5a38b	no blob response	2023-11-15 15:16:23 -08:00
Michael Yang	1901044b07	use checksum reference	2023-11-15 15:16:23 -08:00
Michael Yang	1552cee59f	client create modelfile	2023-11-15 15:16:23 -08:00
Michael Yang	3ca56b5ada	add create modelfile field	2023-11-15 15:16:23 -08:00
Michael Yang	b0d14ed51c	refactor create model	2023-11-15 15:16:23 -08:00
Jeffrey Morgan	5cba29b9d6	JSON mode: add `"format"` as an api parameter (#1051) * add `"format": "json"` as an API parameter --------- Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>	2023-11-09 16:44:02 -08:00
Bruce MacDonald	ec2a31e9b3	support raw generation requests (#952) - add the optional `raw` generate request parameter to bypass prompt formatting and response context - add raw request to docs	2023-11-08 14:05:02 -08:00
Noah Gitsham	8ae8c9fa8c	Remove duplicate "install" in GPU support warning (#984)	2023-11-03 00:45:14 -07:00
Noah Gitsham	f39daff461	Add missing "be" to GPU support warning message (#983)	2023-11-02 18:37:12 -07:00
Michael Yang	2c6189f4fe	Merge pull request #750 from jmorganca/mxyng/concurrent-uploads concurrent uploads	2023-11-01 15:00:01 -07:00
Bruce MacDonald	f9a4281124	clean up: remove server functions from client (#937)	2023-10-30 11:10:18 -04:00
Michael Yang	4e09aab8b9	concurrent uploads	2023-10-27 17:07:33 -07:00
Michael Yang	386169205c	update runtime options (#864)	2023-10-20 21:17:14 -04:00
Jeffrey Morgan	7ed5a39bc7	simpler check for model loading compatibility errors	2023-10-19 14:50:49 -04:00
Michael Yang	e1c5be24e7	check json eof	2023-10-19 09:21:51 -07:00
Michael Yang	2ad8a074ac	generate: set created_at move the empty response so it's more visible	2023-10-19 09:21:51 -07:00
Michael Yang	7e547c6833	s/message/error/	2023-10-19 09:21:04 -07:00
Michael Yang	689842b9ff	request: bad request when model missing fields	2023-10-19 09:21:04 -07:00
Michael Yang	a19d47642e	models: rm workDir from CreateModel unused after removing EMBED	2023-10-19 09:21:04 -07:00
Bruce MacDonald	fe6f3b48f7	do not reload the running llm when runtime params change (#840) - only reload the running llm if the model has changed, or the options for loading the running model have changed - rename loaded llm to runner to differentiate from loaded model image - remove logic which keeps the first system prompt in the generation context	2023-10-19 10:39:58 -04:00


261 commits