* remove tmp directories created by previous servers
* clean up on server stop
* Update routes.go
* Update server/routes.go
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* create top-level temp ollama dir
* check file exists before creating
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
* remove c code
* pack llama.cpp
* use request context for llama_cpp
* let llama_cpp decide the number of threads to use
* stop llama runner when app stops
* remove sample count and duration metrics
* use go generate to get libraries
* tmp dir for running llm