Commit graph

7 commits

Author SHA1 Message Date
Daniel Hiltgen
0a0e9f3e0f Apply 01-cache.diff 2024-04-01 16:48:18 -07:00
Daniel Hiltgen
58d95cc9bd Switch back to subprocessing for llama.cpp
This should resolve a number of memory leak and stability defects by allowing
us to isolate llama.cpp in a separate process and shutdown when idle, and
gracefully restart if it has problems.  This also serves as a first step to be
able to run multiple copies to support multiple models concurrently.
2024-04-01 16:48:18 -07:00
Jeffrey Morgan
f5ca7f8c8e add license in file header for vendored llama.cpp code (#3351) 2024-03-26 16:23:23 -04:00
Daniel Hiltgen
43799532c1 Bump llama.cpp to b2474
The release just before ggml-cuda.cu refactoring
2024-03-23 09:54:56 +01:00
Jeffrey Morgan
e95ffc7448 llama: remove server static assets (#3174) 2024-03-15 19:24:12 -07:00
Daniel Hiltgen
85129d3a32 Adapt our build for imported server.cpp 2024-03-12 14:57:15 -07:00
Daniel Hiltgen
9ac6440da3 Import server.cpp as of b2356 2024-03-12 13:58:06 -07:00