- remove ggml runner
- automatically pull gguf models when ggml detected
- tell users to update to gguf in the case automatic pull fails
Co-Authored-By: Jeffrey Morgan <jmorganca@gmail.com>
- update chat docs
- add messages chat endpoint
- remove deprecated context and template generate parameters from docs
- context and template are still supported for the time being and will continue to work as expected
- add partial response to chat history
* remove tmp directories created by previous servers
* clean up on server stop
* Update routes.go
* Update server/routes.go
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* create top-level temp ollama dir
* check file exists before creating
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
* remove c code
* pack llama.cpp
* use request context for llama_cpp
* let llama_cpp decide the number of threads to use
* stop llama runner when app stops
* remove sample count and duration metrics
* use go generate to get libraries
* tmp dir for running llm