ollama

Author	SHA1	Message	Date
Daniel Hiltgen	bee2f4a3b0	Record GPU usage information This records more GPU usage information for eventual UX inclusion.	2024-05-08 14:45:39 -07:00
Daniel Hiltgen	34b9db5afc	Request and model concurrency This change adds support for multiple concurrent requests, as well as loading multiple models by spawning multiple runners. The default settings are currently set at 1 concurrent request per model and only 1 loaded model at a time, but these can be adjusted by setting OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.	2024-04-22 19:29:12 -07:00
Michael Yang	7e33a017c0	partial offloading	2024-04-10 11:37:20 -07:00
Michael Yang	91b3e4d282	update memory calcualtions count each layer independently when deciding gpu offloading	2024-04-01 13:16:32 -07:00
Michael Yang	424d53ac70	progress: fix bar rate	2023-11-28 11:44:56 -08:00
Jeffrey Morgan	93a108214c	only show decimal points for smaller file size numbers	2023-11-20 10:58:19 -05:00
Michael Yang	9f04e5a8ea	format bytes	2023-11-17 10:06:19 -08:00
Michael Yang	01ea6002c4	replace go-humanize with format.HumanBytes	2023-11-14 14:57:41 -08:00
Michael Yang	92189a5855	fix memory check	2023-10-13 14:47:29 -07:00
Michael Yang	b599946b74	add format bytes	2023-10-11 14:08:23 -07:00