Bruce MacDonald
09dd2aeff9
GGUF support (#441)
2023-09-07 13:55:37 -04:00
Bruce MacDonald
42998d797d
subprocess llama.cpp server (#401)
* remove c code
* pack llama.cpp
* use request context for llama_cpp
* let llama_cpp decide the number of threads to use
* stop llama runner when app stops
* remove sample count and duration metrics
* use go generate to get libraries
* tmp dir for running llm
2023-08-30 16:35:03 -04:00
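The bullet points above describe moving from linked-in C code to running llama.cpp's server as a child process. A minimal Go sketch of that shape, assuming a prebuilt server binary staged in a temporary directory; the binary name, flags, and port are illustrative, not the project's actual interface:

package main

import (
	"context"
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"path/filepath"
)

func main() {
	// Cancel the context on interrupt so the runner stops when the app stops.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)
	defer stop()

	// Stage and run the runner from a temporary directory ("tmp dir for running llm").
	dir, err := os.MkdirTemp("", "llm")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	// No thread flag is passed: let llama.cpp decide the number of threads to use.
	cmd := exec.CommandContext(ctx, filepath.Join(dir, "server"),
		"--model", "model.bin", "--port", "8080")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	if err := cmd.Run(); err != nil {
		fmt.Println("runner exited:", err)
	}
}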
Michael Yang
b25dd1795d
allow F16 to use metal
warning: F16 uses significantly more memory than a quantized model, so the
standard requirements don't apply.
2023-08-26 08:38:48 -07:00
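Rough arithmetic behind that warning, as a hedged illustration (the bytes-per-weight figures are approximate and not taken from the repository): F16 stores 2 bytes per weight, while q4_0 stores about 4.5 bits per weight, so a 7B model grows from roughly 3.7 GiB to about 13 GiB.

package main

import "fmt"

func main() {
	const params = 7e9                                    // illustrative 7B model
	fmt.Printf("q4_0: %.1f GiB\n", params*0.5625/(1<<30)) // ~4.5 bits per weight
	fmt.Printf("F16 : %.1f GiB\n", params*2.0/(1<<30))    // 2 bytes per weight
}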
Michael Yang
304f2b6c96
add 34b to mem check
2023-08-26 08:29:21 -07:00
Michael Yang
a894cc792d
model and file type as strings
2023-08-17 12:08:04 -07:00
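A minimal sketch of representing the file type as a string: a stringer over llama.cpp's ftype numbering. The type and constant names here are assumptions, not the repository's actual code.

package main

import "fmt"

type FileType uint32

const (
	FileTypeF32 FileType = iota
	FileTypeF16
	FileTypeQ4_0
	FileTypeQ4_1
)

func (t FileType) String() string {
	switch t {
	case FileTypeF32:
		return "F32"
	case FileTypeF16:
		return "F16"
	case FileTypeQ4_0:
		return "Q4_0"
	case FileTypeQ4_1:
		return "Q4_1"
	default:
		return fmt.Sprintf("unknown(%d)", uint32(t))
	}
}

func main() {
	fmt.Println(FileTypeQ4_0) // prints Q4_0 via the String method
}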
Michael Yang
e26085b921
close open files
2023-08-14 16:08:06 -07:00
Michael Yang
6de5d032e1
implement loading ggml lora adapters through the modelfile
2023-08-10 09:23:39 -07:00
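For reference, a LoRA adapter is applied next to the base model in a Modelfile, assuming the ADAPTER instruction; the model name and adapter path below are placeholders:

FROM llama2
ADAPTER ./ggml-adapter-model.bin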
Michael Yang
d791df75dd
check memory requirements before loading
2023-08-10 09:23:11 -07:00
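A rough sketch of a pre-load memory check in the spirit of the commit above: estimate the footprint from the weights file and refuse to load if the system lacks the RAM. The 25% overhead factor and the available-memory lookup are placeholder assumptions, not the repository's logic.

package main

import (
	"errors"
	"fmt"
	"os"
)

// availableMemory is a placeholder; a real implementation would query the OS
// (e.g. sysctl hw.memsize on macOS or /proc/meminfo on Linux).
func availableMemory() uint64 { return 16 << 30 }

func checkMemory(modelPath string) error {
	info, err := os.Stat(modelPath)
	if err != nil {
		return err
	}
	// Assume the runtime needs the weights plus ~25% for KV cache and buffers.
	required := uint64(info.Size()) + uint64(info.Size())/4
	if required > availableMemory() {
		return errors.New("model requires more memory than is available")
	}
	return nil
}

func main() {
	if err := checkMemory("model.bin"); err != nil {
		fmt.Println(err)
	}
}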
Michael Yang
020a3b3530
disable gpu for q5_0, q5_1, q8_0 quants
2023-08-10 09:23:11 -07:00
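A small sketch of that gate, assuming the quantization type is already known as a string; the function name is hypothetical:

package main

import "fmt"

func gpuSupported(fileType string) bool {
	switch fileType {
	case "Q5_0", "Q5_1", "Q8_0":
		// These quantizations were not usable on the GPU at the time,
		// so force CPU-only inference for them.
		return false
	default:
		return true
	}
}

func main() {
	fmt.Println(gpuSupported("Q5_0")) // false
	fmt.Println(gpuSupported("Q4_0")) // true
}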
Michael Yang
fccf8d179f
partial decode ggml bin for more info
2023-08-10 09:23:10 -07:00
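A sketch of partially decoding a ggml bin to learn about a model without loading its tensors: read the magic, version, and hyperparameters from the header. The field order below follows the old llama "ggjt" layout as an assumption; a robust reader would branch on each known magic value.

package main

import (
	"encoding/binary"
	"fmt"
	"os"
)

func main() {
	f, err := os.Open("model.bin") // placeholder path
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// Header fields in the old llama "ggjt" order (an assumption here).
	var header struct {
		Magic, Version                                   uint32
		NVocab, NEmbd, NMult, NHead, NLayer, NRot, FType uint32
	}
	if err := binary.Read(f, binary.LittleEndian, &header); err != nil {
		panic(err)
	}

	fmt.Printf("magic=%#x version=%d layers=%d file type=%d\n",
		header.Magic, header.Version, header.NLayer, header.FType)
}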