ollama

Author	SHA1	Message	Date
Daniel Hiltgen	34b9db5afc	Request and model concurrency: This change adds support for multiple concurrent requests, as well as loading multiple models by spawning multiple runners. The default settings are currently set at 1 concurrent request per model and only 1 loaded model at a time, but these can be adjusted by setting OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.	2024-04-22 19:29:12 -07:00
Michael Yang	7e33a017c0	partial offloading	2024-04-10 11:37:20 -07:00
Michael Yang	91b3e4d282	update memory calculations: count each layer independently when deciding gpu offloading	2024-04-01 13:16:32 -07:00
Michael Yang	fd10a2ad4b	remove format/openssh.go: this is unnecessary now that x/crypto/ssh.MarshalPrivateKey has been added	2024-02-23 16:52:23 -08:00
Michael Yang	424d53ac70	progress: fix bar rate	2023-11-28 11:44:56 -08:00
Jeffrey Morgan	93a108214c	only show decimal points for smaller file size numbers	2023-11-20 10:58:19 -05:00
Michael Yang	9f04e5a8ea	format bytes	2023-11-17 10:06:19 -08:00
Michael Yang	01ea6002c4	replace go-humanize with format.HumanBytes	2023-11-14 14:57:41 -08:00
Michael Yang	c5e1bbabda	instead of static number of parameters for each model family, get the real number from the tensors (#1022) * parse tensor info * refactor decoder * return actual parameter count * explicit rounding * s/Human/HumanNumber/	2023-11-08 17:55:46 -08:00
Michael Yang	2ce1793a1d	go fmt	2023-10-19 09:21:51 -07:00
Michael Yang	92189a5855	fix memory check	2023-10-13 14:47:29 -07:00
Michael Yang	b599946b74	add format bytes	2023-10-11 14:08:23 -07:00
Michael Yang	b5e08e3373	cleanup format time	2023-10-11 11:09:27 -07:00
Michael Yang	0dae34b6a7	remove unused openssh key types	2023-09-06 14:34:09 -07:00
Patrick Devine	9770e3b325	Generate private/public keypair for use w/ auth (#324)	2023-08-11 10:58:23 -07:00
Patrick Devine	5bea29f610	add new list command (#97)	2023-07-18 09:09:45 -07:00

16 commits