ollama/llm
Daniel Hiltgen 7784ca33ce Tighten up memory prediction logging
Prior to this change, we logged the memory prediction multiple times
as the scheduler iterates to find a suitable configuration, which can be
confusing since only the last log before the server starts is actually valid.
This now logs once just before starting the server on the final configuration.
It also reports which library is in use instead of always saying "offloading to gpu"
when running on the CPU.
2024-06-18 09:15:35 -07:00
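For illustration, a minimal Go sketch of the logging pattern this commit describes: the scheduler tries candidate configurations without logging each attempt, and only the final configuration is logged, naming the library in use, just before the server starts. The types and names here (memoryEstimate, pickConfig, startServer) are hypothetical placeholders, not the actual ollama API.

    package main

    import "log/slog"

    // memoryEstimate is a hypothetical stand-in for the scheduler's
    // per-configuration memory prediction.
    type memoryEstimate struct {
        library  string // e.g. "cuda", "rocm", or "cpu"
        layers   int    // layers offloaded to the accelerator
        totalMiB uint64 // predicted memory required
    }

    // pickConfig is a hypothetical stand-in for the scheduler loop: it
    // iterates over candidate configurations until one fits, but does
    // not log each attempt.
    func pickConfig(candidates []memoryEstimate, limitMiB uint64) (memoryEstimate, bool) {
        for _, c := range candidates {
            if c.totalMiB <= limitMiB {
                return c, true // first configuration that fits
            }
        }
        return memoryEstimate{}, false
    }

    func main() {
        candidates := []memoryEstimate{
            {library: "cuda", layers: 33, totalMiB: 9000},
            {library: "cuda", layers: 20, totalMiB: 6000},
            {library: "cpu", layers: 0, totalMiB: 4000},
        }

        final, ok := pickConfig(candidates, 6500)
        if !ok {
            slog.Error("no configuration fits in available memory")
            return
        }

        // Log the prediction exactly once, on the final configuration,
        // and report the library rather than assuming GPU offload.
        slog.Info("memory prediction",
            "library", final.library,
            "layers", final.layers,
            "required_mib", final.totalMiB)

        // startServer(final) // hypothetical: launch the llama.cpp subprocess here
    }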
ext_server Fix server.cpp for the new cuda build macros 2024-06-14 14:51:40 -07:00
generate Add back lower level parallel flags 2024-06-17 13:44:46 -07:00
llama.cpp@7c26775adb llm: update llama.cpp commit to 7c26775 (#4896) 2024-06-17 15:56:16 -04:00
patches llm: update llama.cpp commit to 7c26775 (#4896) 2024-06-17 15:56:16 -04:00
filetype.go Add support for IQ1_S, IQ3_S, IQ2_S, IQ4_XS, IQ4_NL (#4322) 2024-05-23 13:21:49 -07:00
ggla.go simplify safetensors reading 2024-05-21 11:28:22 -07:00
ggml.go Improve multi-gpu handling at the limit 2024-06-14 14:51:40 -07:00
gguf.go Revert "Merge pull request #4938 from ollama/mxyng/fix-byte-order" 2024-06-11 15:56:17 -07:00
llm.go revert tokenize ffi (#4761) 2024-05-31 18:54:21 -07:00
llm_darwin_amd64.go Switch back to subprocessing for llama.cpp 2024-04-01 16:48:18 -07:00
llm_darwin_arm64.go Switch back to subprocessing for llama.cpp 2024-04-01 16:48:18 -07:00
llm_linux.go Switch back to subprocessing for llama.cpp 2024-04-01 16:48:18 -07:00
llm_windows.go Move nested payloads to installer and zip file on windows 2024-04-23 16:14:47 -07:00
memory.go Tighten up memory prediction logging 2024-06-18 09:15:35 -07:00
memory_test.go review comments and coverage 2024-06-14 14:55:50 -07:00
payload.go review comments and coverage 2024-06-14 14:55:50 -07:00
server.go Tighten up memory prediction logging 2024-06-18 09:15:35 -07:00
status.go Switch back to subprocessing for llama.cpp 2024-04-01 16:48:18 -07:00