43799532c1  2024-03-23 09:54:56 +01:00  Daniel Hiltgen
    Bump llama.cpp to b2474
    The release just before ggml-cuda.cu refactoring

291c663865  2024-03-14 13:12:42 -07:00  Michael Yang
    fix: clip memory leak

e72c567cfd  2024-03-12 22:08:13 -07:00  Jeffrey Morgan
    restore locale patch (#3091)

b80661e8c7  2024-03-11 16:48:27 -04:00  Bruce MacDonald
    relay load model errors to the client (#3065)

369eda65f5  2024-03-11 12:57:48 -07:00  Jeffrey Morgan
    update llama.cpp submodule to ceca1ae (#3064)

41b00b9856  2024-03-10 16:21:05 -07:00  Jeffrey Morgan
    fix 03-locale.diff

908005d90b  2024-03-09 21:12:12 -08:00  Jeffrey Morgan
    patch: use default locale in wpm tokenizer (#3034)

1ffb1e2874  2024-03-09 15:55:34 -08:00  Jeffrey Morgan
    update llama.cpp submodule to 77d1ac7 (#3030)

0e4669b04f  2024-03-08 00:26:20 -08:00  Jeffrey Morgan
    update llama.cpp submodule to 6cdabe6 (#2999)

21347e1ed6  2024-03-01 15:26:04 -08:00  Jeffrey Morgan
    update llama.cpp submodule to c29af7e (#2868)

4613a080e7  2024-02-20 17:42:31 -05:00  Jeffrey Morgan
    update llama.cpp submodule to 66c1968f7 (#2618)

fc39a6cd7a  2024-02-18 18:37:20 -08:00  Daniel Hiltgen
    Fix cuda leaks
    This should resolve the problem where we don't fully unload from the GPU
    when we go idle.

26b13fc33c  2024-02-12 08:10:16 -08:00  Jeffrey Morgan
    patch: always add token to cache_tokens (#2459)

de76b95dd4  2024-02-06 12:06:43 -08:00  Daniel Hiltgen
    Bump llama.cpp to b2081

72b12c3be7  2024-01-30 16:52:12 -08:00  Daniel Hiltgen
    Bump llama.cpp to b1999
    This requires an upstream change to support graceful termination,
    carried as a patch.

a64570dcae  2024-01-25 13:46:20 -08:00  Jeffrey Morgan
    Fix clearing kv cache between requests with the same prompt (#2186)
    * Fix clearing kv cache between requests with the same prompt
    * fix powershell script