Commit graph

24 commits

Daniel Hiltgen
cd5c8f6471
Optimize container images for startup (#6547)
* Optimize container images for startup

This change adjusts how runner payloads are handled to support container
builds where we keep them extracted in the filesystem. This makes it easier
to optimize the cpu/cuda and cpu/rocm images for size, and should result in
faster startup times for container images.

* Refactor payload logic and add buildx support for faster builds

* Move payloads around

* Review comments

* Converge to buildx-based helper scripts

* Use docker buildx action for release
2024-09-12 12:10:30 -07:00
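
A minimal sketch of the startup optimization described above: prefer runner payloads already extracted into the container image's filesystem, and only unpack embedded copies as a fallback. The paths and the embed layout are illustrative assumptions, not Ollama's actual ones.

```go
// Sketch only: prefer pre-extracted runner payloads baked into the image,
// falling back to unpacking embedded copies (the slower path this commit
// avoids for container builds). Assumes a payloads/ directory exists at
// build time for go:embed; all paths here are hypothetical.
package main

import (
	"embed"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

//go:embed payloads
var embedded embed.FS

// payloadDir returns a directory holding runner binaries, preferring a
// location pre-extracted into the container image.
func payloadDir() (string, error) {
	const preExtracted = "/usr/lib/ollama/runners" // hypothetical image path
	if fi, err := os.Stat(preExtracted); err == nil && fi.IsDir() {
		return preExtracted, nil // container build: nothing to unpack
	}
	// Fallback: extract the embedded payloads into a temp dir at startup.
	dir, err := os.MkdirTemp("", "runners")
	if err != nil {
		return "", err
	}
	err = fs.WalkDir(embedded, "payloads", func(p string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() {
			return err
		}
		data, err := embedded.ReadFile(p)
		if err != nil {
			return err
		}
		return os.WriteFile(filepath.Join(dir, filepath.Base(p)), data, 0o755)
	})
	return dir, err
}

func main() {
	dir, err := payloadDir()
	if err != nil {
		panic(err)
	}
	fmt.Println("runners in", dir)
}
```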
Mark Ward
34a4a94f13 ignore debug bin files 2024-05-01 18:51:10 +00:00
Daniel Hiltgen
58d95cc9bd Switch back to subprocessing for llama.cpp
This should resolve a number of memory leak and stability defects by allowing
us to isolate llama.cpp in a separate process, shut it down when idle, and
gracefully restart it if it has problems.  This also serves as a first step
toward running multiple copies to support multiple models concurrently.
2024-04-01 16:48:18 -07:00
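
A minimal sketch of the subprocess pattern this commit describes: start the llama.cpp server on demand, tear it down via context cancellation, and restart it if it exits badly. The runner path, port, and the fixed lifetime standing in for real idle tracking are all assumptions, not Ollama's actual implementation.

```go
// Sketch only: manage llama.cpp as a child process so crashes and leaks
// stay isolated from the Go server, and the child can be killed when idle.
package main

import (
	"context"
	"log"
	"os/exec"
	"time"
)

func runRunner(ctx context.Context, runnerPath string) error {
	cmd := exec.CommandContext(ctx, runnerPath, "--port", "8081")
	if err := cmd.Start(); err != nil {
		return err
	}
	// Wait reaps the child; CommandContext kills it when ctx is cancelled,
	// which is the shutdown path for idle timeouts and app exit.
	return cmd.Wait()
}

func main() {
	for {
		// A fixed lifetime stands in for real idle tracking in this sketch.
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
		err := runRunner(ctx, "./llama-server")
		cancel()
		log.Printf("runner exited: %v; restarting", err)
		time.Sleep(time.Second) // back off before the graceful restart
	}
}
```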
Daniel Hiltgen
29e90cc13b Implement new Go based Desktop app
This focuses on Windows first, but could be used for Mac
and possibly Linux in the future.
2024-02-15 05:56:45 +00:00
Daniel Hiltgen
d4cd695759 Add cgo implementation for llama.cpp
Run server.cpp directly inside the Go runtime via cgo
while retaining the LLM Go abstractions.
2023-12-19 09:05:46 -08:00
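
The general shape of the cgo approach, sketched with a stub C function standing in for llama.cpp's actual entry points (which are not reproduced here): the C code is compiled into the Go binary and called in-process, with no IPC.

```go
// Sketch only: the cgo pattern of linking C code into the Go process and
// calling it directly. llama_stub_eval is purely illustrative.
package main

/*
#include <stdlib.h>

// Stand-in for a llama.cpp entry point linked into the process.
static int llama_stub_eval(const char *prompt) { return prompt != 0; }
*/
import "C"

import (
	"fmt"
	"unsafe"
)

func eval(prompt string) bool {
	cs := C.CString(prompt)
	defer C.free(unsafe.Pointer(cs)) // C strings are not garbage collected
	return C.llama_stub_eval(cs) != 0
}

func main() {
	fmt.Println(eval("hello")) // the call crosses into C in-process
}
```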
Jason Jacobs
3d620f9462
ignore jetbrain ides (#1287) 2023-11-27 15:57:45 -05:00
Jing Zhang
82b9b329ff
windows CUDA support (#1262)
* Support CUDA builds on Windows
* Enable dynamic NumGPU allocation for Windows
2023-11-24 17:16:36 -05:00
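
A hedged sketch of what dynamic NumGPU allocation can look like: derive the number of layers to offload from the VRAM actually free, instead of using a fixed constant. The headroom and per-layer sizes are illustrative placeholders, not llama.cpp's real memory accounting.

```go
// Sketch only: size GPU offload to available VRAM. All byte figures are
// made-up placeholders for illustration.
package main

import "fmt"

// numGPULayers returns how many of totalLayers fit in freeVRAM, reserving
// some headroom for the runtime's scratch buffers.
func numGPULayers(freeVRAM, bytesPerLayer uint64, totalLayers int) int {
	const headroom = 512 << 20 // 512 MiB reserved, an illustrative margin
	if freeVRAM <= headroom {
		return 0 // not enough memory: fall back to CPU-only
	}
	n := int((freeVRAM - headroom) / bytesPerLayer)
	if n > totalLayers {
		n = totalLayers // never request more layers than the model has
	}
	return n
}

func main() {
	// e.g. 8 GiB free, ~200 MiB per layer, a 48-layer model
	fmt.Println(numGPULayers(8<<30, 200<<20, 48))
}
```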
Jeffrey Morgan
85e4441c6a cache docker builds 2023-11-18 08:51:38 -05:00
Jeffrey Morgan
a82eb275ff update docs for subprocess 2023-08-30 17:54:02 -04:00
Bruce MacDonald
42998d797d
subprocess llama.cpp server (#401)
* remove C code
* pack llama.cpp
* use request context for llama_cpp
* let llama_cpp decide the number of threads to use
* stop llama runner when app stops
* remove sample count and duration metrics
* use go generate to get libraries
* tmp dir for running llm
2023-08-30 16:35:03 -04:00
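
The "use request context" bullet above can be sketched as follows: forward the HTTP request's context to the call into the llama.cpp subprocess, so a client disconnect cancels in-flight generation. The runner URL and ports are placeholders, not the actual endpoints.

```go
// Sketch only: tie proxied work to the incoming request's context so the
// runner call is torn down when the client goes away.
package main

import (
	"io"
	"log"
	"net/http"
)

func generate(w http.ResponseWriter, r *http.Request) {
	// Building the upstream request with r.Context() means a client
	// disconnect cancels the context and aborts the proxied call.
	req, err := http.NewRequestWithContext(r.Context(), http.MethodPost,
		"http://127.0.0.1:8081/completion", r.Body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()
	io.Copy(w, resp.Body) // stream tokens back as they arrive
}

func main() {
	http.HandleFunc("/api/generate", generate)
	log.Fatal(http.ListenAndServe(":11434", nil))
}
```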
Jeffrey Morgan
67b6f8ba86 add ggml-metal.metal to .gitignore 2023-07-28 11:04:21 -04:00
jk1jk
e6c427ce4d
Update .gitignore 2023-07-22 17:00:52 +03:00
Jeffrey Morgan
7c71c10d4f fix compilation issue in Dockerfile, remove from README.md until ready 2023-07-11 19:51:08 -07:00
Michael Yang
442dec1c6f vendor llama.cpp 2023-07-11 11:59:18 -07:00
Michael Yang
fd4792ec56 call llama.cpp directly from go 2023-07-11 11:59:18 -07:00
Jeffrey Morgan
9fe018675f use Makefile for dependency building instead of go generate 2023-07-06 16:34:44 -04:00
Jeffrey Morgan
b0e986fb96 add binary to .gitignore 2023-07-06 16:34:44 -04:00
Bruce MacDonald
d34985b9df add templates to prompt command 2023-06-26 13:41:16 -04:00
Jeffrey Morgan
b361fa72ec reorganize directories 2023-06-25 13:08:03 -04:00
Jeffrey Morgan
d3709f85b5 build server into desktop app 2023-06-25 00:30:02 -04:00
Bruce MacDonald
c5bafaff54 package server with client 2023-06-23 18:38:22 -04:00
Bruce MacDonald
f0eee3faa0 build server executable 2023-06-23 17:23:30 -04:00
Bruce MacDonald
db81d81b23 Update .gitignore 2023-06-23 13:57:03 -04:00
Jeffrey Morgan
8fa91332fa initial commit 2023-06-22 18:31:40 -04:00