ollama/llama/runner
Jesse Gross d875e99e46 runner.go: Propagate panics back to the user.
This is a partial revert of 8a35bb92
"runner.go: Increase survivability of main processing loop", removing
the panic handler.

Although we want to avoid errors taking down the runner, we also
should make the user aware of problems when they happen. In the
future, we can restructure things so both parts are true.
2024-11-15 11:52:25 -08:00
cache.go runner.go: Make KV entry accounting more robust 2024-11-11 20:23:03 -08:00
cache_test.go runner.go: Better abstract vision model integration 2024-10-30 14:53:43 -07:00
image.go runner.go: Check for zero length images 2024-11-08 09:39:32 -08:00
image_test.go runner.go: Better abstract vision model integration 2024-10-30 14:53:43 -07:00
README.md Re-introduce the llama package (#5034) 2024-10-08 08:53:54 -07:00
requirements.go Re-introduce the llama package (#5034) 2024-10-08 08:53:54 -07:00
runner.go runner.go: Propagate panics back to the user. 2024-11-15 11:52:25 -08:00
stop.go runner.go: Handle truncation of tokens for stop sequences 2024-10-09 20:39:04 -07:00
stop_test.go runner.go: Handle truncation of tokens for stop sequences 2024-10-09 20:39:04 -07:00

runner

Note: this is a work in progress

A minimal runner for loading a model and running inference via an HTTP web server.

./runner -model <model binary>

Completion

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "hi"}' http://localhost:8080/completion

Embeddings

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "turn me into an embedding"}' http://localhost:8080/embeddings
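
Go client sketch

The two curl calls above can also be issued from Go. This is a minimal sketch, not part of the runner itself. It assumes only what the curl examples show: the server listens on localhost:8080 and both endpoints accept a JSON body with a "prompt" field. The helper names (encodePrompt, postPrompt) and the client timeout are hypothetical. The response format is not documented here, so the body is printed as-is rather than decoded.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// promptRequest mirrors the JSON body used in the curl examples.
type promptRequest struct {
	Prompt string `json:"prompt"`
}

// encodePrompt (hypothetical helper) builds the {"prompt": "..."} payload.
func encodePrompt(prompt string) ([]byte, error) {
	return json.Marshal(promptRequest{Prompt: prompt})
}

// postPrompt (hypothetical helper) POSTs the payload to baseURL+path and
// returns the raw response body without interpreting it.
func postPrompt(client *http.Client, baseURL, path, prompt string) ([]byte, error) {
	body, err := encodePrompt(prompt)
	if err != nil {
		return nil, err
	}
	resp, err := client.Post(baseURL+path, "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusOK {
		return data, fmt.Errorf("unexpected status %s", resp.Status)
	}
	return data, nil
}

func main() {
	// The timeout is an assumption; inference may take a while.
	client := &http.Client{Timeout: 5 * time.Minute}
	base := "http://localhost:8080"

	calls := []struct{ path, prompt string }{
		{"/completion", "hi"},
		{"/embeddings", "turn me into an embedding"},
	}
	for _, c := range calls {
		out, err := postPrompt(client, base, c.path, c.prompt)
		if err != nil {
			fmt.Fprintln(os.Stderr, c.path, "failed:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", c.path, out)
	}
}
```

Start the runner first (./runner -model <model binary>), then run the sketch with go run. If the server is not reachable, each call reports an error on stderr and the program moves on to the next endpoint.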