ollama/llm
Daniel Hiltgen 7784ca33ce Tighten up memory prediction logging
Prior to this change, we logged the memory prediction multiple times
as the scheduler iterates to find a suitable configuration, which can be
confusing since only the last log before the server starts is actually valid.
This now logs once just before starting the server on the final configuration.
It also reports which library is in use instead of always saying "offloading to gpu"
when running on the CPU.
2024-06-18 09:15:35 -07:00
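For illustration, a minimal Go sketch of the logging pattern this commit describes: the scheduler tries candidate configurations without logging each attempt, and only the final configuration is logged, naming the library in use, just before the server starts. The types and names here (memoryEstimate, pickConfig, startServer) are hypothetical placeholders, not the actual ollama API.

    package main

    import "log/slog"

    // memoryEstimate is a hypothetical stand-in for the scheduler's
    // per-configuration memory prediction.
    type memoryEstimate struct {
        library  string // e.g. "cuda", "rocm", or "cpu"
        layers   int    // layers offloaded to the accelerator
        totalMiB uint64 // predicted memory required
    }

    // pickConfig is a hypothetical stand-in for the scheduler loop: it
    // iterates over candidate configurations until one fits, but does
    // not log each attempt.
    func pickConfig(candidates []memoryEstimate, limitMiB uint64) (memoryEstimate, bool) {
        for _, c := range candidates {
            if c.totalMiB <= limitMiB {
                return c, true // first configuration that fits
            }
        }
        return memoryEstimate{}, false
    }

    func main() {
        candidates := []memoryEstimate{
            {library: "cuda", layers: 33, totalMiB: 9000},
            {library: "cuda", layers: 20, totalMiB: 6000},
            {library: "cpu", layers: 0, totalMiB: 4000},
        }

        final, ok := pickConfig(candidates, 6500)
        if !ok {
            slog.Error("no configuration fits in available memory")
            return
        }

        // Log the prediction exactly once, on the final configuration,
        // and report the library rather than assuming GPU offload.
        slog.Info("memory prediction",
            "library", final.library,
            "layers", final.layers,
            "required_mib", final.totalMiB)

        // startServer(final) // hypothetical: launch the llama.cpp subprocess here
    }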
ext_server Fix server.cpp for the new cuda build macros 2024-06-14 14:51:40 -07:00
generate Add back lower level parallel flags 2024-06-17 13:44:46 -07:00
llama.cpp@7c26775adb llm: update llama.cpp commit to 7c26775 (#4896) 2024-06-17 15:56:16 -04:00
patches llm: update llama.cpp commit to 7c26775 (#4896) 2024-06-17 15:56:16 -04:00
filetype.go Add support for IQ1_S, IQ3_S, IQ2_S, IQ4_XS, IQ4_NL (#4322) 2024-05-23 13:21:49 -07:00
ggla.go simplify safetensors reading 2024-05-21 11:28:22 -07:00
ggml.go Improve multi-gpu handling at the limit 2024-06-14 14:51:40 -07:00
gguf.go Revert "Merge pull request #4938 from ollama/mxyng/fix-byte-order" 2024-06-11 15:56:17 -07:00
llm.go revert tokenize ffi (#4761) 2024-05-31 18:54:21 -07:00
llm_darwin_amd64.go Switch back to subprocessing for llama.cpp 2024-04-01 16:48:18 -07:00
llm_darwin_arm64.go Switch back to subprocessing for llama.cpp 2024-04-01 16:48:18 -07:00
llm_linux.go Switch back to subprocessing for llama.cpp 2024-04-01 16:48:18 -07:00
llm_windows.go Move nested payloads to installer and zip file on windows 2024-04-23 16:14:47 -07:00
memory.go Tighten up memory prediction logging 2024-06-18 09:15:35 -07:00
memory_test.go review comments and coverage 2024-06-14 14:55:50 -07:00
payload.go review comments and coverage 2024-06-14 14:55:50 -07:00
server.go Tighten up memory prediction logging 2024-06-18 09:15:35 -07:00
status.go Switch back to subprocessing for llama.cpp 2024-04-01 16:48:18 -07:00