Daniel Hiltgen
34b9db5afc
Request and model concurrency
...
This change adds support for multiple concurrent requests, as well as
loading multiple models by spawning multiple runners. The defaults are
currently 1 concurrent request per model and 1 loaded model at a time;
these can be adjusted by setting OLLAMA_NUM_PARALLEL and
OLLAMA_MAX_LOADED_MODELS, respectively.
2024-04-22 19:29:12 -07:00
Michael Yang
7e33a017c0
partial offloading
2024-04-10 11:37:20 -07:00
Michael Yang
91b3e4d282
update memory calculations
...
count each layer independently when deciding gpu offloading
2024-04-01 13:16:32 -07:00
Michael Yang
fd10a2ad4b
remove format/openssh.go
...
this is unnecessary now that x/crypto/ssh.MarshalPrivateKey has been
added
2024-02-23 16:52:23 -08:00
Michael Yang
424d53ac70
progress: fix bar rate
2023-11-28 11:44:56 -08:00
Jeffrey Morgan
93a108214c
only show decimal points for smaller file size numbers
2023-11-20 10:58:19 -05:00
Michael Yang
9f04e5a8ea
format bytes
2023-11-17 10:06:19 -08:00
Michael Yang
01ea6002c4
replace go-humanize with format.HumanBytes
2023-11-14 14:57:41 -08:00
Michael Yang
c5e1bbabda
instead of static number of parameters for each model family, get the real number from the tensors (#1022)
...
* parse tensor info
* refactor decoder
* return actual parameter count
* explicit rounding
* s/Human/HumanNumber/
2023-11-08 17:55:46 -08:00
Michael Yang
2ce1793a1d
go fmt
2023-10-19 09:21:51 -07:00
Michael Yang
92189a5855
fix memory check
2023-10-13 14:47:29 -07:00
Michael Yang
b599946b74
add format bytes
2023-10-11 14:08:23 -07:00
Michael Yang
b5e08e3373
cleanup format time
2023-10-11 11:09:27 -07:00
Michael Yang
0dae34b6a7
remove unused openssh key types
2023-09-06 14:34:09 -07:00
Patrick Devine
9770e3b325
Generate private/public keypair for use w/ auth (#324)
2023-08-11 10:58:23 -07:00
Patrick Devine
5bea29f610
add new list command (#97)
2023-07-18 09:09:45 -07:00