ollama/llm/ext_server
Jeffrey Morgan 15c2d8fe14
server: parallelize embeddings in API web handler instead of in subprocess runner (#6220)
For simplicity, perform parallelization of embedding requests in the API handler instead of offloading this to the subprocess runner. This keeps the scheduling story simpler as it builds on existing parallel requests, similar to existing text completion functionality.
2024-08-11 11:57:10 -07:00
CMakeLists.txt line feed 2024-08-04 17:25:41 -07:00
httplib.h Import server.cpp as of b2356 2024-03-12 13:58:06 -07:00
json.hpp Import server.cpp as of b2356 2024-03-12 13:58:06 -07:00
server.cpp server: parallelize embeddings in API web handler instead of in subprocess runner (#6220) 2024-08-11 11:57:10 -07:00
utils.hpp log clean up 2024-05-09 14:55:36 -07:00