ollama

Author	SHA1	Message	Date
Michael Yang	8608eb4760	prune empty directories	2023-09-27 10:58:09 -07:00
Jeffrey Morgan	9b12a511ca	check other request fields before load short circuit in `/api/generate`	2023-09-22 23:50:55 -04:00
Bruce MacDonald	5d71bda478	close llm on interrupt (#577)	2023-09-22 19:41:52 +01:00
Michael Yang	82f5b66c01	register HEAD /api/tags	2023-09-21 16:38:03 -07:00
Michael Yang	c986694367	fix HEAD / request: HEAD requests should respond like their GET counterparts, except without a response body.	2023-09-21 16:35:58 -07:00
Bruce MacDonald	4cba75efc5	remove tmp directories created by previous servers (#559) * remove tmp directories created by previous servers * clean up on server stop * Update routes.go * Update server/routes.go Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com> * create top-level temp ollama dir * check file exists before creating --------- Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com> Co-authored-by: Michael Yang <mxyng@pm.me>	2023-09-21 20:38:49 +01:00
Michael Yang	1fabba474b	refactor default allow origins this should be less error prone	2023-09-21 09:42:25 -07:00
Bruce MacDonald	1255bc9b45	only package 11.8 runner	2023-09-20 20:00:41 +01:00
Patrick Devine	80dd44e80a	Cmd changes (#541)	2023-09-18 12:26:56 -07:00
Bruce MacDonald	f221637053	first pass at linux gpu support (#454) * linux gpu support * handle multiple gpus * add cuda docker image (#488) --------- Co-authored-by: Michael Yang <mxyng@pm.me>	2023-09-12 11:04:35 -04:00
Patrick Devine	e7e91cd71c	add autoprune to remove unused layers (#491)	2023-09-11 11:46:35 -07:00
Patrick Devine	790d24eb7b	add show command (#474)	2023-09-06 11:04:17 -07:00
Michael Yang	681f3c4c42	fix num_keep	2023-09-03 17:47:49 -04:00
Michael Yang	eeb40a672c	fix list models for windows	2023-08-31 09:47:10 -04:00
Michael Yang	0f541a0367	s/ListResponseModel/ModelResponse/	2023-08-31 09:47:10 -04:00
Bruce MacDonald	42998d797d	subprocess llama.cpp server (#401) * remove c code * pack llama.cpp * use request context for llama_cpp * let llama_cpp decide the number of threads to use * stop llama runner when app stops * remove sample count and duration metrics * use go generate to get libraries * tmp dir for running llm	2023-08-30 16:35:03 -04:00
Patrick Devine	8bbff2df98	add model IDs (#439)	2023-08-28 20:50:24 -07:00
Michael Yang	95187d7e1e	build release mode	2023-08-22 09:52:43 -07:00
Jeffrey Morgan	a9f6c56652	fix `FROM` instruction erroring when referring to a file	2023-08-22 09:39:42 -07:00
Ryan Baker	0a892419ad	Strip protocol from model path (#377)	2023-08-21 21:56:56 -07:00
Bruce MacDonald	326de48930	use loaded llm for embeddings	2023-08-15 10:50:54 -03:00
Patrick Devine	d9cf18e28d	add maximum retries when pushing (#334)	2023-08-11 15:41:55 -07:00
Michael Yang	6517bcc53c	Merge pull request #290 from jmorganca/add-adapter-layers implement loading ggml lora adapters through the modelfile	2023-08-10 17:23:01 -07:00
Michael Yang	6a6828bddf	Merge pull request #167 from jmorganca/decode-ggml partial decode ggml bin for more info	2023-08-10 17:22:40 -07:00
Jeffrey Morgan	040a5b9750	clean up cli flags	2023-08-10 09:27:03 -07:00
Michael Yang	6de5d032e1	implement loading ggml lora adapters through the modelfile	2023-08-10 09:23:39 -07:00
Michael Yang	fccf8d179f	partial decode ggml bin for more info	2023-08-10 09:23:10 -07:00
Bruce MacDonald	4b3507f036	embeddings endpoint Co-Authored-By: Jeffrey Morgan <jmorganca@gmail.com>	2023-08-10 11:45:57 -04:00
Bruce MacDonald	868e3b31c7	allow for concurrent pulls of the same files	2023-08-09 11:31:54 -04:00
Bruce MacDonald	09d8bf6730	fix build errors	2023-08-09 10:45:57 -04:00
Bruce MacDonald	7a5f3616fd	embed text document in modelfile	2023-08-09 10:26:19 -04:00
Jeffrey Morgan	cff002b824	use content type `application/x-ndjson` for streaming responses	2023-08-08 21:38:10 -07:00
Jeffrey Morgan	a027a7dd65	add `0.0.0.0` as an allowed origin by default Fixes #282	2023-08-08 13:39:50 -07:00
Bruce MacDonald	21ddcaa1f1	pr comments - default to embeddings enabled - move embedding logic for loaded model to request - allow embedding full directory - close llm on reload	2023-08-08 13:49:37 -04:00
Michael Yang	f2074ed4c0	Merge pull request #306 from jmorganca/default-keep-system automatically set num_keep if num_keep < 0	2023-08-08 09:25:34 -07:00
Bruce MacDonald	a6f6d18f83	embed text document in modelfile	2023-08-08 11:27:17 -04:00
Michael Yang	4dc5b117dd	automatically set num_keep if num_keep < 0 num_keep defines how many tokens to keep in the context when truncating inputs. if left to its default value of -1, the server will calculate num_keep to be the left of the system instructions	2023-08-07 16:19:12 -07:00
cmiller01	fb593b7bfc	pass flags to `serve` to allow setting allowed-origins + host and port * resolves: https://github.com/jmorganca/ollama/issues/300 and https://github.com/jmorganca/ollama/issues/282 * example usage: `ollama serve --port 9999 --allowed-origins "http://foo.example.com,http://192.0.0.1"`	2023-08-07 03:34:37 +00:00
Jeffrey Morgan	e3fb1fd3f1	server: compare options correctly	2023-08-03 15:55:40 -04:00
Bruce MacDonald	8b1e791820	allow specifying zero values in modelfile	2023-08-02 17:07:53 -04:00
Jeffrey Morgan	03cff3a225	server: reset digest at end of generate	2023-08-02 16:15:44 -04:00
Bruce MacDonald	8f8b6288ac	check server is running before running command	2023-08-02 10:51:23 -04:00
Bruce MacDonald	765994362c	use head to check heartbeat	2023-08-01 14:50:38 -04:00
Bruce MacDonald	1c5a8770ee	read runner parameter options from map - read runner options from map to see what was specified explicitly and overwrite zero values	2023-08-01 13:38:19 -04:00
Bruce MacDonald	daa0d1de7a	allow specifying zero values in modelfile	2023-08-01 13:37:50 -04:00
Jeffrey Morgan	528bafa585	cache loaded model	2023-08-01 11:24:18 -04:00
Bruce MacDonald	671eec6da9	log prediction failures	2023-07-31 16:46:37 -04:00
Michael Yang	f62a882760	add session expiration	2023-07-27 09:31:44 -07:00
Michael Yang	32aec66e6a	add load duration	2023-07-27 09:31:44 -07:00
Michael Yang	35af37a2cb	session id	2023-07-27 09:31:44 -07:00
Bruce MacDonald	4c1caa3733	download models when creating from modelfile	2023-07-25 14:25:13 -04:00
Patrick Devine	4cb42ca55e	add copy command (#191)	2023-07-24 11:27:28 -04:00
Michael Yang	8609db77ea	use gin-contrib/cors middleware	2023-07-22 09:39:08 -07:00
Patrick Devine	6d6b0d3321	change error handler behavior and fix error when a model isn't found (#173)	2023-07-21 23:02:12 -07:00
Patrick Devine	9f6e97865c	allow pushing/pulling to insecure registries (#157)	2023-07-21 15:42:19 -07:00
Bruce MacDonald	7ba1308595	Merge pull request #147 from jmorganca/brucemacd/cli-err-display Improve CLI error display	2023-07-21 16:10:19 +02:00
Patrick Devine	e7a393de54	add rm command for models (#151)	2023-07-20 16:09:23 -07:00
Michael Yang	1f27d7f1b8	fix stream errors	2023-07-20 12:12:08 -07:00
Bruce MacDonald	09dc6273e3	suppress error when running list before pulling image	2023-07-20 20:53:09 +02:00
Bruce MacDonald	3ec4ebc562	remove unused code	2023-07-20 20:18:00 +02:00
Michael Yang	df146c41e2	separate prompt into template and system	2023-07-19 23:24:31 -07:00
Jeffrey Morgan	2d305fa99a	allow relative paths in `FROM` instruction	2023-07-19 21:55:15 -07:00
Michael Yang	68df36ae50	fix pull 0 bytes on completed layer	2023-07-18 19:38:11 -07:00
Patrick Devine	9e15635c2d	attempt two for skipping files in the file walk (#105)	2023-07-18 15:37:01 -07:00
Patrick Devine	9658a5043b	skip files in the list if we can't get the correct model path (#100)	2023-07-18 12:39:08 -07:00
Patrick Devine	5bea29f610	add new list command (#97)	2023-07-18 09:09:45 -07:00
Michael Yang	c7dd52271c	remove debugging messages	2023-07-17 14:17:34 -07:00
Michael Yang	28a136e9a3	modelfile params	2023-07-17 12:35:03 -07:00
Patrick Devine	2fb52261ad	basic distribution w/ push/pull (#78) * basic distribution w/ push/pull * add the parser * add create, pull, and push * changes to the parser, FROM line, and fix commands * mkdirp new manifest directories * make `blobs` directory if it does not exist * fix go warnings * add progressbar for model pulls * move model struct --------- Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>	2023-07-16 17:02:22 -07:00
Michael Yang	743e957d88	use filepath for os compat	2023-07-14 17:27:14 -07:00
Michael Yang	5ade3db040	fix race block on write which only returns when the channel is closed. this is contrary to the previous arrangement where the handler may return but the stream hasn't finished writing. it can lead to the client receiving unexpected responses (since the request has been handled) or worst case a nil-pointer dereference as the stream tries to flush a nil writer	2023-07-14 15:10:46 -07:00
Michael Yang	1775647f76	continue conversation feed responses back into the llm	2023-07-13 17:13:00 -07:00
Michael Yang	05e08d2310	return more info in generate response	2023-07-13 09:37:32 -07:00
Michael Yang	31590284a7	fix route	2023-07-12 19:21:49 -07:00
Michael Yang	2666d3c206	fix pull race	2023-07-12 19:07:23 -07:00
Michael Yang	0944b01e7d	pull fixes	2023-07-12 09:55:07 -07:00
Michael Yang	a806b03f62	no errgroup	2023-07-11 14:58:10 -07:00
Michael Yang	e243329e2e	check api status	2023-07-11 13:42:05 -07:00
Michael Yang	2a66a1164a	common stream producer	2023-07-11 13:42:05 -07:00
Michael Yang	fd4792ec56	call llama.cpp directly from go	2023-07-11 11:59:18 -07:00
Jeffrey Morgan	a3ec1ec2a0	consistent error handling for pull and generate	2023-07-10 21:34:15 -07:00
Michael Yang	edba935d67	return error in generate response	2023-07-10 13:30:10 -07:00
Bruce MacDonald	f5e2e150b8	allow overriding default generate options	2023-07-10 20:58:02 +02:00
Jeffrey Morgan	74e92d1258	add basic `/` route for server	2023-07-07 23:46:15 -04:00
Bruce MacDonald	f533f85d44	pr feedback - move error check to api client pull - simplify error check in generate - return nil on any pull error	2023-07-07 17:12:02 -04:00
Bruce MacDonald	61dd87bd90	if directory cannot be resolved, do not fail	2023-07-07 15:27:43 -04:00
Patrick Devine	3f1b7177f2	pass model and predict options	2023-07-07 09:34:05 -07:00
Michael Yang	b0618a466e	generate progress	2023-07-06 17:07:40 -07:00
Michael Yang	15c114decb	fix prompt templates	2023-07-06 17:03:18 -07:00
Michael Yang	0637632258	simple pull response	2023-07-06 16:34:44 -04:00
Michael Yang	dd960d1d5e	update generate response	2023-07-06 16:34:44 -04:00
Michael Yang	9b8a456c7d	embed templates	2023-07-06 16:34:44 -04:00
Bruce MacDonald	7cf5905063	display pull progress	2023-07-06 16:34:44 -04:00
Michael Yang	580fe8951c	free llama model	2023-07-06 16:34:44 -04:00
Michael Yang	68e6b4550c	use prompt templates	2023-07-06 16:34:44 -04:00
Bruce MacDonald	a6494f8211	pull models	2023-07-06 16:34:44 -04:00
Michael Yang	1b7183c5a1	enable metal gpu acceleration ggml-metal.metal must be in the same directory as the ollama binary otherwise llama.cpp will not be able to find it and load it. 1. go generate llama/llama_metal.go 2. go build . 3. ./ollama serve	2023-07-06 16:34:44 -04:00
Jeffrey Morgan	0998d4f0a4	remove debug print statements	2023-07-06 16:34:44 -04:00
Bruce MacDonald	8ea5e5e147	separate routes	2023-07-06 16:34:44 -04:00
Jeffrey Morgan	fd962a36e5	client updates	2023-07-06 16:34:44 -04:00
Jeffrey Morgan	9164981d72	move prompt templates out of python bindings	2023-07-06 16:34:44 -04:00
Jeffrey Morgan	6093a88c1a	add llama.cpp go bindings	2023-07-06 16:34:44 -04:00
Jeffrey Morgan	76cb60d496	wip go engine Co-authored-by: Patrick Devine <pdevine@sonic.net>	2023-07-06 16:34:44 -04:00


303 commits