ollama/llm
Jeffrey Morgan 15c2d8fe14
server: parallelize embeddings in API web handler instead of in subprocess runner (#6220)
For simplicity, parallelize embedding requests in the API handler instead of offloading this to the subprocess runner. This keeps scheduling simpler because it builds on the existing parallel request handling, as text completion already does.
2024-08-11 11:57:10 -07:00
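The commit description above says embedding requests are fanned out from the API handler rather than inside the runner. A minimal sketch of that general pattern follows, assuming one embed call per input and a fixed concurrency limit. The names `embedAll`, `fakeEmbed`, `errEmpty`, and the `limit` parameter are hypothetical and are not taken from the Ollama source; `fakeEmbed` merely stands in for a request to the runner.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errEmpty is a hypothetical error used by the stand-in embedder below.
var errEmpty = errors.New("empty input")

// fakeEmbed is a stand-in for a call to the model runner (an assumption, not
// Ollama's API). It returns a one-element vector holding the input length.
func fakeEmbed(input string) ([]float32, error) {
	if input == "" {
		return nil, errEmpty
	}
	return []float32{float32(len(input))}, nil
}

// embedAll issues one embed call per input from the handler side, running at
// most limit calls at once, and returns the results in input order.
func embedAll(inputs []string, limit int, embed func(string) ([]float32, error)) ([][]float32, error) {
	if limit < 1 {
		limit = 1
	}
	out := make([][]float32, len(inputs))
	errs := make([]error, len(inputs))
	sem := make(chan struct{}, limit) // counting semaphore
	var wg sync.WaitGroup

	for i, in := range inputs {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit calls are already in flight
		go func(i int, in string) {
			defer wg.Done()
			defer func() { <-sem }()
			// Each goroutine writes only to its own slot, so no lock is needed.
			out[i], errs[i] = embed(in)
		}(i, in)
	}
	wg.Wait()

	// Report the first failure by input position so the result is deterministic.
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return out, nil
}

func main() {
	vecs, err := embedAll([]string{"hello", "hi"}, 2, fakeEmbed)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(vecs) // prints [[5] [2]]
}
```

Keeping the fan-out in the caller means the worker behind `embed` only ever sees single requests, which is the simplification the commit description points to; how the real handler bounds concurrency is not shown in this listing.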
ext_server server: parallelize embeddings in API web handler instead of in subprocess runner (#6220) 2024-08-11 11:57:10 -07:00
generate Adjust windows ROCm discovery 2024-07-20 15:17:50 -07:00
llama.cpp@1e6f6554aa update llama.cpp submodule to 1e6f6554 (#6208) 2024-08-06 15:11:45 -04:00
patches update llama.cpp submodule to 1e6f6554 (#6208) 2024-08-06 15:11:45 -04:00
filetype.go Add support for IQ1_S, IQ3_S, IQ2_S, IQ4_XS. IQ4_NL (#4322) 2024-05-23 13:21:49 -07:00
ggla.go update convert test to check result data 2024-07-31 10:59:38 -07:00
ggml.go update convert test to check result data 2024-07-31 10:59:38 -07:00
ggml_test.go llm: speed up gguf decoding by a lot (#5246) 2024-06-24 21:47:52 -07:00
gguf.go comments 2024-07-31 15:58:55 -07:00
llm.go lint 2024-08-01 17:06:06 -07:00
llm_darwin_amd64.go Enable windows error dialog for subprocess startup 2024-07-22 14:07:27 -07:00
llm_darwin_arm64.go Enable windows error dialog for subprocess startup 2024-07-22 14:07:27 -07:00
llm_linux.go Enable windows error dialog for subprocess startup 2024-07-22 14:07:27 -07:00
llm_windows.go Enable windows error dialog for subprocess startup 2024-07-22 14:07:27 -07:00
memory.go handle asymmetric embedding KVs 2024-06-20 09:57:27 -07:00
memory_test.go lint 2024-08-01 17:06:06 -07:00
payload.go Fix corner cases on tmp cleaner on mac 2024-07-03 13:10:14 -07:00
server.go server: parallelize embeddings in API web handler instead of in subprocess runner (#6220) 2024-08-11 11:57:10 -07:00
status.go Catch one more error log 2024-08-05 09:28:07 -07:00