`runner`

Note: this is a work in progress

A minimal runner for loading a model and running inference via an HTTP web server.

./runner -model <model binary>

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "hi"}' http://localhost:8080/completion

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "turn me into an embedding"}' http://localhost:8080/embeddings