15c2d8fe14
For simplicity, parallelize embedding requests in the API handler instead of offloading the work to the subprocess runner. This keeps scheduling simpler because it builds on the existing support for parallel requests, the same approach text completion already uses.
ext_server
generate
llama.cpp@1e6f6554aa
patches
filetype.go
ggla.go
ggml.go
ggml_test.go
gguf.go
llm.go
llm_darwin_amd64.go
llm_darwin_arm64.go
llm_linux.go
llm_windows.go
memory.go
memory_test.go
payload.go
server.go
status.go