Daniel Hiltgen | fc39a6cd7a | 2024-02-18 18:37:20 -08:00
Fix cuda leaks
This should resolve the problem where we don't fully unload from the GPU when we go idle.

Jeffrey Morgan | 26b13fc33c | 2024-02-12 08:10:16 -08:00
patch: always add token to cache_tokens (#2459)

Daniel Hiltgen | de76b95dd4 | 2024-02-06 12:06:43 -08:00
Bump llama.cpp to b2081

Daniel Hiltgen | 72b12c3be7 | 2024-01-30 16:52:12 -08:00
Bump llama.cpp to b1999
This requires an upstream change to support graceful termination, carried as a patch.

Jeffrey Morgan | a64570dcae | 2024-01-25 13:46:20 -08:00
Fix clearing kv cache between requests with the same prompt (#2186)
* Fix clearing kv cache between requests with the same prompt
* fix powershell script