ollama

Author	SHA1	Message	Date
Daniel Hiltgen	cd5c8f6471	Optimize container images for startup (#6547) * Optimize container images for startup This change adjusts how to handle runner payloads to support container builds where we keep them extracted in the filesystem. This makes it easier to optimize the cpu/cuda vs cpu/rocm images for size, and should result in faster startup times for container images. * Refactor payload logic and add buildx support for faster builds * Move payloads around * Review comments * Converge to buildx based helper scripts * Use docker buildx action for release	2024-09-12 12:10:30 -07:00
Mark Ward	34a4a94f13	ignore debug bin files	2024-05-01 18:51:10 +00:00
Daniel Hiltgen	58d95cc9bd	Switch back to subprocessing for llama.cpp This should resolve a number of memory leak and stability defects by allowing us to isolate llama.cpp in a separate process and shutdown when idle, and gracefully restart if it has problems. This also serves as a first step to be able to run multiple copies to support multiple models concurrently.	2024-04-01 16:48:18 -07:00
Daniel Hiltgen	29e90cc13b	Implement new Go based Desktop app This focuses on Windows first, but could be used for Mac and possibly linux in the future.	2024-02-15 05:56:45 +00:00
Daniel Hiltgen	d4cd695759	Add cgo implementation for llama.cpp Run the server.cpp directly inside the Go runtime via cgo while retaining the LLM Go abstractions.	2023-12-19 09:05:46 -08:00
Jason Jacobs	3d620f9462	ignore jetbrain ides (#1287)	2023-11-27 15:57:45 -05:00
Jing Zhang	82b9b329ff	windows CUDA support (#1262) * Support cuda build in Windows * Enable dynamic NumGPU allocation for Windows	2023-11-24 17:16:36 -05:00
Jeffrey Morgan	85e4441c6a	cache docker builds	2023-11-18 08:51:38 -05:00
Jeffrey Morgan	a82eb275ff	update docs for subprocess	2023-08-30 17:54:02 -04:00
Bruce MacDonald	42998d797d	subprocess llama.cpp server (#401) * remove c code * pack llama.cpp * use request context for llama_cpp * let llama_cpp decide the number of threads to use * stop llama runner when app stops * remove sample count and duration metrics * use go generate to get libraries * tmp dir for running llm	2023-08-30 16:35:03 -04:00
Jeffrey Morgan	67b6f8ba86	add `ggml-metal.metal` to `.gitignore`	2023-07-28 11:04:21 -04:00
jk1jk	e6c427ce4d	Update .gitignore	2023-07-22 17:00:52 +03:00
Jeffrey Morgan	7c71c10d4f	fix compilation issue in Dockerfile, remove from `README.md` until ready	2023-07-11 19:51:08 -07:00
Michael Yang	442dec1c6f	vendor llama.cpp	2023-07-11 11:59:18 -07:00
Michael Yang	fd4792ec56	call llama.cpp directly from go	2023-07-11 11:59:18 -07:00
Jeffrey Morgan	9fe018675f	use `Makefile` for dependency building instead of `go generate`	2023-07-06 16:34:44 -04:00
Jeffrey Morgan	b0e986fb96	add binary to .gitignore	2023-07-06 16:34:44 -04:00
Bruce MacDonald	d34985b9df	add templates to prompt command	2023-06-26 13:41:16 -04:00
Jeffrey Morgan	b361fa72ec	reorganize directories	2023-06-25 13:08:03 -04:00
Jeffrey Morgan	d3709f85b5	build server into desktop app	2023-06-25 00:30:02 -04:00
Bruce MacDonald	c5bafaff54	package server with client	2023-06-23 18:38:22 -04:00
Bruce MacDonald	f0eee3faa0	build server executable	2023-06-23 17:23:30 -04:00
Bruce MacDonald	db81d81b23	Update .gitignore	2023-06-23 13:57:03 -04:00
Jeffrey Morgan	8fa91332fa	initial commit	2023-06-22 18:31:40 -04:00

24 commits