History

Jesse Gross c4b34f2a2a runner.go: Truncate inputs that exceed context rather than shifting Previous versions of the runner would truncate inputs to the context window before beginning processing. The main processing loop relied on this behavior if the context needed to be shifted later (due to token generation). If truncation did not occur then invariants would be broken, causing crashes or infinite loops. Later versions attempted to fix these bugs and make the logic less subtle so that all inputs could be handled. Truncation was removed to make things consistent. However, truncation is much faster than processing and shifting, so removing it caused performance problems when the input vastly exceeded the context size. This restores the input truncation as a performance optimization while keeping the more robust processing logic. Fixes #7762		2024-11-20 12:49:24 -08:00
..
cache.go	runner.go: Don't add inputs to cache view until actually processed	2024-11-20 12:49:24 -08:00
cache_test.go	runner.go: Better abstract vision model integration	2024-10-30 14:53:43 -07:00
image.go	runner.go: Check for zero length images	2024-11-08 09:39:32 -08:00
image_test.go	runner.go: Better abstract vision model integration	2024-10-30 14:53:43 -07:00
README.md	Re-introduce the `llama` package (#5034 )	2024-10-08 08:53:54 -07:00
requirements.go	Re-introduce the `llama` package (#5034 )	2024-10-08 08:53:54 -07:00
runner.go	runner.go: Truncate inputs that exceed context rather than shifting	2024-11-20 12:49:24 -08:00
stop.go	runner.go: Handle truncation of tokens for stop sequences	2024-10-09 20:39:04 -07:00
stop_test.go	runner.go: Handle truncation of tokens for stop sequences	2024-10-09 20:39:04 -07:00

README.md

`runner`

Note: this is a work in progress

A minimial runner for loading a model and running inference via a http web server.

./runner -model <model binary>

Completion

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "hi"}' http://localhost:8080/completion

Embeddings

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "turn me into an embedding"}' http://localhost:8080/embeddings