# Development

- Install cmake and Go (and, optionally, the required tools for GPU support)
- Run `go generate ./...`
- Run `go build .`
Install the required tools:

```shell
brew install go cmake gcc
```

Get the required libraries:

```shell
go generate ./...
```

Then build ollama:

```shell
go build .
```

Now you can run ollama:

```shell
./ollama
```
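
The steps above can be combined into a single annotated shell session. This is a sketch assuming a macOS machine with Homebrew installed and the repository already cloned; it is not run automatically:

```shell
# Build ollama from source (assumes macOS with Homebrew and a checkout of the repo).
brew install go cmake gcc   # Go toolchain plus cmake/gcc for the native llama.cpp libraries
go generate ./...           # fetch and build the required libraries
go build .                  # produce the ./ollama binary in the current directory
./ollama                    # run the freshly built binary
```

`go generate` must be re-run whenever the native library sources change, since `go build` alone does not rebuild them.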