Jeffrey Morgan | 1ffb1e2874 | update llama.cpp submodule to 77d1ac7 (#3030) | 2024-03-09 15:55:34 -08:00
Jeffrey Morgan | 0e4669b04f | update llama.cpp submodule to 6cdabe6 (#2999) | 2024-03-08 00:26:20 -08:00
Jeffrey Morgan | 21347e1ed6 | update llama.cpp submodule to c29af7e (#2868) | 2024-03-01 15:26:04 -08:00
Jeffrey Morgan | 4613a080e7 | update llama.cpp submodule to 66c1968f7 (#2618) | 2024-02-20 17:42:31 -05:00
Daniel Hiltgen | fc39a6cd7a | Fix cuda leaks. This should resolve the problem where we don't fully unload from the GPU when we go idle. | 2024-02-18 18:37:20 -08:00
Jeffrey Morgan | 26b13fc33c | patch: always add token to cache_tokens (#2459) | 2024-02-12 08:10:16 -08:00
Daniel Hiltgen | de76b95dd4 | Bump llama.cpp to b2081 | 2024-02-06 12:06:43 -08:00
Daniel Hiltgen | 72b12c3be7 | Bump llama.cpp to b1999. This requires an upstream change to support graceful termination, carried as a patch. | 2024-01-30 16:52:12 -08:00
Jeffrey Morgan | a64570dcae | Fix clearing kv cache between requests with the same prompt (#2186) * Fix clearing kv cache between requests with the same prompt * fix powershell script | 2024-01-25 13:46:20 -08:00