ollama/llama/runner
Jesse Gross c25ffde91d runner.go: Don't trim whitespace from inputs
It's possible to get prompts that consist entirely of whitespace -
this is most likely to happen when generating embeddings. Currently,
we will trim this away, leaving an empty prompt, which will then
generate an error.

Generating embeddings from whitespace should not trigger an error,
as this may break pipelines. It's better to just leave the whitespace
in place and process what we are given. This is consistent with
past versions of Ollama.

Bug #7578
2024-11-14 11:23:06 -08:00
cache.go runner.go: Make KV entry accounting more robust 2024-11-11 20:23:03 -08:00
cache_test.go runner.go: Better abstract vision model integration 2024-10-30 14:53:43 -07:00
image.go runner.go: Check for zero length images 2024-11-08 09:39:32 -08:00
image_test.go runner.go: Better abstract vision model integration 2024-10-30 14:53:43 -07:00
README.md Re-introduce the llama package (#5034) 2024-10-08 08:53:54 -07:00
requirements.go Re-introduce the llama package (#5034) 2024-10-08 08:53:54 -07:00
runner.go runner.go: Don't trim whitespace from inputs 2024-11-14 11:23:06 -08:00
stop.go runner.go: Handle truncation of tokens for stop sequences 2024-10-09 20:39:04 -07:00
stop_test.go runner.go: Handle truncation of tokens for stop sequences 2024-10-09 20:39:04 -07:00

runner

Note: this is a work in progress

A minimal runner for loading a model and running inference via an HTTP web server.

./runner -model <model binary>

Completion

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "hi"}' http://localhost:8080/completion

Embeddings

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "turn me into an embedding"}' http://localhost:8080/embeddings