Commit graph

418 commits

Author SHA1 Message Date
Michael Yang
b25dd1795d allow F16 to use metal
warning: F16 uses significantly more memory than quantized models, so the
standard requirements don't apply.
2023-08-26 08:38:48 -07:00
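
A hypothetical sketch of the gating this commit describes; the function name and logic are illustrative, not ollama's source. F16 is let through to Metal even when it fails the standard fits-in-memory requirement, per the commit's warning.

```go
package main

import "fmt"

// useMetal is an illustrative gate for Metal offload. Per the commit's
// note, F16 is exempted from the standard memory requirement even though
// it uses far more memory than quantized formats.
func useMetal(fileType string, modelSize, availableMem uint64) bool {
	if fileType == "F16" {
		return true // exempt from the standard requirement
	}
	// Standard requirement: the model must fit in available memory.
	return modelSize <= availableMem
}

func main() {
	fmt.Println(useMetal("F16", 26_000_000_000, 16_000_000_000))  // true: exemption applies
	fmt.Println(useMetal("Q4_0", 26_000_000_000, 16_000_000_000)) // false: does not fit
}
```
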
Michael Yang
304f2b6c96 add 34b to mem check 2023-08-26 08:29:21 -07:00
Jeffrey Morgan
177b69a211 add missing entries for 34B 2023-08-25 18:35:35 -07:00
Michael Yang
7a378f8b66 patch llama.cpp for 34B 2023-08-25 10:06:55 -07:00
Michael Yang
b1cececb8e add 34b model type 2023-08-24 10:35:44 -07:00
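
A hypothetical sketch of what this cluster of 34B commits adds: inferring a model-size label from the GGML layer count. The layer counts match llama-family checkpoints, but the function name and shape are illustrative, not ollama's code.

```go
package main

import "fmt"

// modelType maps a llama-family layer count to a size label. The 34B
// case (48 layers, Code Llama) is what these commits add.
func modelType(numLayers uint32) string {
	switch numLayers {
	case 32:
		return "7B"
	case 40:
		return "13B"
	case 48:
		return "34B" // newly recognized
	case 60:
		return "30B"
	case 80:
		return "65B"
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(modelType(48)) // 34B
}
```
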
Michael Yang
5ca05c2e88 fix ModelType() 2023-08-18 11:23:38 -07:00
Michael Yang
a894cc792d model and file type as strings 2023-08-17 12:08:04 -07:00
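
A sketch of the kind of change "model and file type as strings" describes: exposing a numeric GGML file-type code through Go's Stringer interface. The numbering follows GGML's ftype values; the constant names are illustrative.

```go
package main

import "fmt"

// FileType wraps GGML's numeric ftype so it prints as a readable string.
type FileType uint32

const (
	FileTypeF32  FileType = 0
	FileTypeF16  FileType = 1
	FileTypeQ4_0 FileType = 2
	FileTypeQ4_1 FileType = 3
)

func (ft FileType) String() string {
	switch ft {
	case FileTypeF32:
		return "F32"
	case FileTypeF16:
		return "F16"
	case FileTypeQ4_0:
		return "Q4_0"
	case FileTypeQ4_1:
		return "Q4_1"
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(FileType(2)) // Q4_0
}
```
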
Michael Yang
4dcf5c3e0b
Merge pull request #349 from jmorganca/close-files
close open files
2023-08-14 16:15:58 -07:00
Michael Yang
e26085b921 close open files 2023-08-14 16:08:06 -07:00
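
A minimal illustration of the standard Go pattern a "close open files" fix typically applies: defer Close immediately after a successful Open so every return path releases the descriptor. The function here is illustrative, not the actual patched code.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// readMagic opens a file and reads its first four bytes, deferring Close
// right after Open so no return path leaks a file descriptor.
func readMagic(path string) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close() // runs on every return path below

	buf := make([]byte, 4)
	if _, err := io.ReadFull(f, buf); err != nil {
		return nil, err
	}
	return buf, nil
}

func main() {
	if magic, err := readMagic("model.bin"); err == nil {
		fmt.Printf("magic: % x\n", magic)
	}
}
```
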
Michael Yang
f7b613332c update llama.cpp 2023-08-14 15:47:00 -07:00
Bruce MacDonald
4b2d366c37 Update llama.go 2023-08-14 12:55:50 -03:00
Bruce MacDonald
56fd4e4ef2 log embedding eval timing 2023-08-14 12:51:31 -03:00
Jeffrey Morgan
22885aeaee update llama.cpp to f64d44a 2023-08-12 22:47:15 -04:00
Michael Yang
6ed991c8e2 ggml: fix off-by-one error
remove unused Unknown FileType
2023-08-11 10:45:22 -07:00
Michael Yang
6de5d032e1 implement loading ggml lora adapters through the modelfile 2023-08-10 09:23:39 -07:00
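
A hedged sketch of the Modelfile side of this commit: an ADAPTER instruction naming a GGML LoRA file. The toy parser below handles only that one instruction; ollama's real Modelfile parser differs and handles much more.

```go
package main

import (
	"fmt"
	"strings"
)

// adapterPaths extracts the path arguments of ADAPTER lines from a
// Modelfile. Illustrative only; not ollama's parser.
func adapterPaths(modelfile string) []string {
	var paths []string
	for _, line := range strings.Split(modelfile, "\n") {
		fields := strings.Fields(line)
		if len(fields) >= 2 && strings.EqualFold(fields[0], "ADAPTER") {
			paths = append(paths, fields[1])
		}
	}
	return paths
}

func main() {
	mf := "FROM llama2\nADAPTER ./lora.bin\n"
	fmt.Println(adapterPaths(mf)) // [./lora.bin]
}
```
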
Michael Yang
d791df75dd check memory requirements before loading 2023-08-10 09:23:11 -07:00
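
A hypothetical sketch of "check memory requirements before loading": compare the model file's size, plus assumed headroom for the KV cache and scratch buffers, against available memory before loading. The 1.2 headroom factor and names are illustrative, not ollama's actual accounting.

```go
package main

import (
	"fmt"
	"os"
)

// checkMemory refuses to load a model that likely won't fit in memory.
func checkMemory(modelPath string, availableMem uint64) error {
	fi, err := os.Stat(modelPath)
	if err != nil {
		return err
	}
	required := uint64(float64(fi.Size()) * 1.2) // assumed ~20% headroom
	if required > availableMem {
		return fmt.Errorf("need ~%d bytes, only %d available", required, availableMem)
	}
	return nil
}

func main() {
	fmt.Println(checkMemory("model.bin", 16_000_000_000))
}
```
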
Michael Yang
020a3b3530 disable gpu for q5_0, q5_1, q8_0 quants 2023-08-10 09:23:11 -07:00
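
A sketch of the gate "disable gpu for q5_0, q5_1, q8_0 quants" implies: skip GPU offload for formats the Metal backend presumably lacked kernels for at the time. The function name is illustrative.

```go
package main

import "fmt"

// gpuSupported reports whether a quantization format may be offloaded
// to the GPU; the listed formats fall back to CPU.
func gpuSupported(fileType string) bool {
	switch fileType {
	case "Q5_0", "Q5_1", "Q8_0":
		return false // fall back to CPU for these quantizations
	default:
		return true
	}
}

func main() {
	fmt.Println(gpuSupported("Q8_0")) // false
	fmt.Println(gpuSupported("Q4_0")) // true
}
```
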
Michael Yang
fccf8d179f partial decode ggml bin for more info 2023-08-10 09:23:10 -07:00
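
A hedged sketch of "partial decode ggml bin for more info": read just the magic, version, and llama hyperparameters from the file header so model size and file type can be inspected without loading tensors. The field order follows the legacy GGJT layout; struct and function names are illustrative.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"os"
)

// hparams mirrors the legacy GGJT llama hyperparameter block.
type hparams struct {
	NVocab, NEmbd, NMult, NHead, NLayer, NRot, FType uint32
}

// decodeHeader reads only the header, leaving the tensor data untouched.
func decodeHeader(path string) (*hparams, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	var magic, version uint32
	if err := binary.Read(f, binary.LittleEndian, &magic); err != nil {
		return nil, err
	}
	if magic != 0x67676a74 { // "ggjt"
		return nil, fmt.Errorf("unsupported magic %#x", magic)
	}
	if err := binary.Read(f, binary.LittleEndian, &version); err != nil {
		return nil, err
	}

	var hp hparams
	if err := binary.Read(f, binary.LittleEndian, &hp); err != nil {
		return nil, err
	}
	return &hp, nil // caller can map NLayer to a size, FType to a quant
}

func main() {
	if hp, err := decodeHeader("model.bin"); err == nil {
		fmt.Printf("layers=%d ftype=%d\n", hp.NLayer, hp.FType)
	}
}
```
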