ollama

Jeffrey Morgan	15c2d8fe14	server: parallelize embeddings in API web handler instead of in subprocess runner (#6220)	2024-08-11 11:57:10 -07:00
For simplicity, perform parallelization of embedding requests in the API handler instead of offloading this to the subprocess runner. This keeps the scheduling story simpler as it builds on existing parallel requests, similar to existing text completion functionality.
ext_server	server: parallelize embeddings in API web handler instead of in subprocess runner (#6220)	2024-08-11 11:57:10 -07:00
generate	Adjust windows ROCm discovery	2024-07-20 15:17:50 -07:00
llama.cpp@1e6f6554aa	update llama.cpp submodule to `1e6f6554` (#6208)	2024-08-06 15:11:45 -04:00
patches	update llama.cpp submodule to `1e6f6554` (#6208)	2024-08-06 15:11:45 -04:00
filetype.go	Add support for IQ1_S, IQ3_S, IQ2_S, IQ4_XS. IQ4_NL (#4322)	2024-05-23 13:21:49 -07:00
ggla.go	update convert test to check result data	2024-07-31 10:59:38 -07:00
ggml.go	update convert test to check result data	2024-07-31 10:59:38 -07:00
ggml_test.go	llm: speed up gguf decoding by a lot (#5246)	2024-06-24 21:47:52 -07:00
gguf.go	comments	2024-07-31 15:58:55 -07:00
llm.go	lint	2024-08-01 17:06:06 -07:00
llm_darwin_amd64.go	Enable windows error dialog for subprocess startup	2024-07-22 14:07:27 -07:00
llm_darwin_arm64.go	Enable windows error dialog for subprocess startup	2024-07-22 14:07:27 -07:00
llm_linux.go	Enable windows error dialog for subprocess startup	2024-07-22 14:07:27 -07:00
llm_windows.go	Enable windows error dialog for subprocess startup	2024-07-22 14:07:27 -07:00
memory.go	handle asymmetric embedding KVs	2024-06-20 09:57:27 -07:00
memory_test.go	lint	2024-08-01 17:06:06 -07:00
payload.go	Fix corner cases on tmp cleaner on mac	2024-07-03 13:10:14 -07:00
server.go	server: parallelize embeddings in API web handler instead of in subprocess runner (#6220)	2024-08-11 11:57:10 -07:00
status.go	Catch one more error log	2024-08-05 09:28:07 -07:00