ollama/llama/runner
Jesse Gross 312d9de1d1 llama: Improve error handling
Check for NULL return values from llama.cpp in more places and
convert them into Go errors, which should make debugging easier
in the future rather than having hidden surprises in our data
structures.
2024-11-02 13:37:55 -07:00
..
cache.go runner.go: Better abstract vision model integration 2024-10-30 14:53:43 -07:00
cache_test.go runner.go: Better abstract vision model integration 2024-10-30 14:53:43 -07:00
image.go llama: Improve error handling 2024-11-02 13:37:55 -07:00
image_test.go runner.go: Better abstract vision model integration 2024-10-30 14:53:43 -07:00
README.md Re-introduce the llama package (#5034) 2024-10-08 08:53:54 -07:00
requirements.go Re-introduce the llama package (#5034) 2024-10-08 08:53:54 -07:00
runner.go llama: Improve error handling 2024-11-02 13:37:55 -07:00
stop.go runner.go: Handle truncation of tokens for stop sequences 2024-10-09 20:39:04 -07:00
stop_test.go runner.go: Handle truncation of tokens for stop sequences 2024-10-09 20:39:04 -07:00

runner

Note: this is a work in progress

A minimial runner for loading a model and running inference via a http web server.

./runner -model <model binary>

Completion

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "hi"}' http://localhost:8080/completion

Embeddings

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "turn me into an embedding"}' http://localhost:8080/embeddings