ollama

Author	SHA1	Message	Date
Mélony QIN	3f71ba406a	Correct the kubernetes terminology (#3843) * add details on kubernetes deployment and separate the testing process * Update examples/kubernetes/README.md thanks for suggesting this change, I agree with you and let's make this project better together ! Co-authored-by: JonZeolla <Zeolla@gmail.com> --------- Co-authored-by: QIN Mélony <MQN1@dsone.3ds.com> Co-authored-by: JonZeolla <Zeolla@gmail.com>	2024-05-07 09:53:08 -07:00
Hause Lin	88a67127d8	Update README.md to include ollama-r library (#4012) * Update README.md Add Ollama for R - ollama-r library * Update README.md --------- Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>	2024-05-07 09:52:30 -07:00
Jeffrey Morgan	f7dc7dcc64	Update .gitattributes	2024-05-07 09:50:19 -07:00
alwqx	04f971c84b	fix golangci workflow missing gofmt and goimports (#4190)	2024-05-07 09:49:40 -07:00
Michael Yang	548a7df014	update list handler to use model.Name	2024-05-07 09:38:45 -07:00
Michael Yang	70edb9bc4d	Merge pull request #4215 from ollama/mxyng/mem llm: add minimum based on layer size	2024-05-07 09:26:33 -07:00
Michael Yang	3f0ed03856	Update examples/flyio/README.md Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>	2024-05-07 09:25:01 -07:00
Michael Yang	4736391bfb	llm: add minimum based on layer size	2024-05-06 17:04:19 -07:00
CrispStrobe	7c5330413b	note on naming restrictions (#2625) * note on naming restrictions else push would fail with cryptic retrieving manifest Error: file does not exist ==> maybe change that in code too * Update docs/import.md --------- Co-authored-by: C-4-5-3 <154636388+C-4-5-3@users.noreply.github.com> Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>	2024-05-06 16:03:21 -07:00
Jeffrey Morgan	39d9d22ca3	close server on receiving signal (#4213)	2024-05-06 16:01:37 -07:00
Jackie Li	af47413dba	Add MarshalJSON to Duration (#3284) --------- Co-authored-by: Patrick Devine <patrick@infrahq.com>	2024-05-06 15:59:18 -07:00
Michael Yang	b2f00aa977	close zip files	2024-05-06 15:27:19 -07:00
Michael Yang	6694be5e50	convert/llama: use WriteSeeker	2024-05-06 15:24:01 -07:00
Michael Yang	f5e8b207fb	s/DisplayLongest/String/	2024-05-06 15:24:01 -07:00
Michael Yang	d245460362	only quantize language models	2024-05-06 15:24:01 -07:00
Michael Yang	4d0d0fa383	no iterator	2024-05-06 15:24:01 -07:00
Michael Yang	7ffe45734d	rebase	2024-05-06 15:24:01 -07:00
Michael Yang	01811c176a	comments	2024-05-06 15:24:01 -07:00
Michael Yang	a7248f6ea8	update tests	2024-05-06 15:24:01 -07:00
Michael Yang	9685c34509	quantize any fp16/fp32 model - FROM /path/to/{safetensors,pytorch} - FROM /path/to/fp{16,32}.bin - FROM model:fp{16,32}	2024-05-06 15:24:01 -07:00
Jeffrey Chen	d091fe3c21	Windows automatically recognizes username (#3214)	2024-05-06 15:03:14 -07:00
Mohamed A. Fouad	ee02f548c8	Update linux.md (#3847) Add -e to viewing logs in order to show end of ollama logs	2024-05-06 15:02:25 -07:00
Daniel Hiltgen	b08870aff3	Merge pull request #4188 from dhiltgen/use_our_lib Use our bundled libraries (cuda) instead of the host library	2024-05-06 14:41:05 -07:00
Darinka	3ecae420ac	Update api.md (#3945) * Update api.md Changed the calculation of tps (token/s) in the documentation * Update docs/api.md --------- Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>	2024-05-06 14:39:58 -07:00
Daniel Hiltgen	4cbbf0e13b	Merge pull request #4090 from dhiltgen/rocm_paths Support Fedora's standard ROCm location	2024-05-06 14:33:41 -07:00
Daniel Hiltgen	380378cc80	Use our libraries first Trying to live off the land for cuda libraries was not the right strategy. We need to use the version we compiled against to ensure things work properly	2024-05-06 14:23:29 -07:00
Daniel Hiltgen	0963c65027	Merge pull request #4208 from dhiltgen/fix_sched_test Fix stale test logic	2024-05-06 14:23:12 -07:00
Jeffrey Morgan	ed740a2504	Fix `no slots available` error with concurrent requests (#4160)	2024-05-06 14:22:53 -07:00
Jeffrey Morgan	c9f98622b1	Skip scheduling cancelled requests, always reload unloaded runners (#4189)	2024-05-06 14:22:24 -07:00
Daniel Hiltgen	0a954e5066	Fix stale test logic The model processing was recently changed to be deferred but this test scenario hadn't been adjusted for that change in behavior.	2024-05-06 14:15:37 -07:00
Adrien Brault	aa93423fbf	docs: pbcopy on mac (#3129)	2024-05-06 13:47:00 -07:00
Nurgo	01c9386267	Add BrainSoup to compatible clients list (#3473)	2024-05-06 13:42:16 -07:00
Daniel Hiltgen	af9eb36f9f	Merge pull request #4135 from dhiltgen/no_physx Skip PhysX cudart library	2024-05-06 13:34:00 -07:00
Daniel Hiltgen	06093fd396	Merge pull request #4067 from dhiltgen/cudart Add CUDA Driver API for GPU discovery	2024-05-06 13:30:27 -07:00
Tony Loehr	86b7fcac32	Update README.md with StreamDeploy (#3621) Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>	2024-05-06 11:14:41 -07:00
Hyden Liu	fb8ddc564e	chore: delete `HEAD` (#4194)	2024-05-06 10:32:30 -07:00
Saif	242efe6611	👌 IMPROVE: add portkey library for production tools (#4119)	2024-05-06 10:25:23 -07:00
Jeffrey Morgan	1b0e6c9c0e	Fix llava models not working after first request (#4164) * fix llava models not working after first request * individual requests only for llava models	2024-05-05 20:50:31 -07:00
Jeffrey Morgan	dfa2f32ca0	unload in critical section (#4187)	2024-05-05 17:18:27 -07:00
Daniel Hiltgen	840424a2c4	Merge pull request #4154 from dhiltgen/central_config Centralize server config handling	2024-05-05 17:08:26 -07:00
Daniel Hiltgen	f56aa20014	Centralize server config handling This moves all the env var reading into one central module and logs the loaded config once at startup which should help in troubleshooting user server logs	2024-05-05 16:49:50 -07:00
alwqx	6707768ebd	chore: format go code (#4149)	2024-05-05 16:08:09 -07:00
Lord Basil - Automate EVERYTHING	c78bb76a12	update libraries for langchain_community + llama3 changed from llama2 (#4174)	2024-05-05 16:07:04 -07:00
Jeffrey Morgan	942c979232	allocate a large enough kv cache for all parallel requests (#4162)	2024-05-05 15:59:32 -07:00
Bernardo de Oliveira Bruning	06164911dd	Update README.md (#4111) --------- Co-authored-by: Patrick Devine <patrick@infrahq.com>	2024-05-05 14:45:32 -07:00
Patrick Devine	2a21363bb7	validate the format of the digest when getting the model path (#4175)	2024-05-05 11:46:12 -07:00
Daniel Hiltgen	026869915f	Merge pull request #4144 from dhiltgen/max_queue Make maximum pending request configurable	2024-05-05 10:53:44 -07:00
Daniel Hiltgen	45d61aaaa3	Add integration test to push max queue limits	2024-05-05 10:46:25 -07:00
Daniel Hiltgen	20f6c06569	Make maximum pending request configurable This also bumps up the default to be 50 queued requests instead of 10.	2024-05-04 21:00:52 -07:00
Daniel Hiltgen	371f5e52aa	Merge pull request #4141 from dhiltgen/win_docs Explain the 2 different windows download options	2024-05-04 12:50:16 -07:00

2828 commits