# Development
- Install cmake (and, optionally, the tools required for GPU support)
- run `go generate ./...`
- run `go build .`
Install required tools:
```
brew install go cmake gcc
```
Get the required libraries:
```
go generate ./...
```
Then build ollama:
```
go build .
```
Now you can run `ollama`:
```
./ollama
```
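
Once the binary is built, a typical first session looks something like this (a sketch assuming the `serve` and `run` subcommands; `llama2` stands in for whichever model you want to pull):

```
# start the server in one terminal
./ollama serve

# then, in a second terminal, pull and run a model
./ollama run llama2
```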