Jeffrey Morgan
a7dad24d92
add error for falcon
and starcoder
vocab compatibility ( #844 )
...
add error for falcon and starcoder vocab compatibility
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-10-19 12:18:31 -04:00
Michael Yang
36fe2deebf
only check system memory on macos
2023-10-13 14:47:29 -07:00
Michael Yang
4a8931f634
check total (system + video) memory
2023-10-13 14:47:29 -07:00
Michael Yang
bd6e38fb1a
refactor memory check
2023-10-13 14:47:29 -07:00
Michael Yang
92189a5855
fix memory check
2023-10-13 14:47:29 -07:00
Michael Yang
b599946b74
add format bytes
2023-10-11 14:08:23 -07:00
Bruce MacDonald
d06bc0cb6e
enable q8, q5, 5_1, and f32 for linux gpu ( #699 )
2023-10-05 12:53:47 -04:00
Bruce MacDonald
86279f4ae3
unbound max num gpu layers ( #591 )
...
---------
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-25 18:36:46 -04:00
Bruce MacDonald
4cba75efc5
remove tmp directories created by previous servers ( #559 )
...
* remove tmp directories created by previous servers
* clean up on server stop
* Update routes.go
* Update server/routes.go
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* create top-level temp ollama dir
* check file exists before creating
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-21 20:38:49 +01:00
Michael Yang
7dee25a07f
fix falcon decode
...
get model and file type from bin file
2023-09-12 12:34:53 -07:00
Bruce MacDonald
09dd2aeff9
GGUF support ( #441 )
2023-09-07 13:55:37 -04:00
Bruce MacDonald
42998d797d
subprocess llama.cpp server ( #401 )
...
* remove c code
* pack llama.cpp
* use request context for llama_cpp
* let llama_cpp decide the number of threads to use
* stop llama runner when app stops
* remove sample count and duration metrics
* use go generate to get libraries
* tmp dir for running llm
2023-08-30 16:35:03 -04:00
Michael Yang
b25dd1795d
allow F16 to use metal
...
warning F16 uses significantly more memory than quantized model so the
standard requires don't apply.
2023-08-26 08:38:48 -07:00
Michael Yang
304f2b6c96
add 34b to mem check
2023-08-26 08:29:21 -07:00
Michael Yang
a894cc792d
model and file type as strings
2023-08-17 12:08:04 -07:00
Michael Yang
e26085b921
close open files
2023-08-14 16:08:06 -07:00
Michael Yang
6de5d032e1
implement loading ggml lora adapters through the modelfile
2023-08-10 09:23:39 -07:00
Michael Yang
d791df75dd
check memory requirements before loading
2023-08-10 09:23:11 -07:00
Michael Yang
020a3b3530
disable gpu for q5_0, q5_1, q8_0 quants
2023-08-10 09:23:11 -07:00
Michael Yang
fccf8d179f
partial decode ggml bin for more info
2023-08-10 09:23:10 -07:00