ollama/llm
Daniel Hiltgen 171796791f Adjust mmap logic for cuda windows for faster model load
On Windows, recent llama.cpp changes make mmap slower in most
cases, so default to off. This also implements a tri-state for
use_mmap so we can distinguish a user-provided value of
true/false from an unspecified one.
2024-06-17 16:54:30 -07:00
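The tri-state described in the commit message is commonly modeled in Go with a `*bool`, where `nil` means "unspecified" and a non-nil pointer carries the explicit user choice. The sketch below is illustrative only: the `resolveUseMmap` helper and its parameters are hypothetical and not taken from the actual `server.go` implementation.

```go
package main

import "fmt"

// resolveUseMmap picks the effective mmap setting. A nil user value means
// "unspecified", so the platform default applies; an explicit true/false
// from the user always wins. (Hypothetical helper, not the ollama code.)
func resolveUseMmap(user *bool, isWindows, isCUDA bool) bool {
	if user != nil {
		return *user // explicit user choice overrides any default
	}
	// Default to off on Windows with CUDA, where mmap loads models slowly.
	if isWindows && isCUDA {
		return false
	}
	return true
}

func main() {
	explicit := true
	fmt.Println(resolveUseMmap(nil, true, true))       // default: off on Windows+CUDA
	fmt.Println(resolveUseMmap(&explicit, true, true)) // user override: on
	fmt.Println(resolveUseMmap(nil, false, false))     // default elsewhere: on
}
```

Compared with a plain `bool`, the pointer form lets the loader tell "the user asked for false" apart from "the user said nothing", which is exactly what the commit needs in order to apply a platform-specific default only in the unspecified case.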
Name                | Last commit                                                              | Date
------------------- | ------------------------------------------------------------------------ | -------------------------
ext_server          | Fix server.cpp for the new cuda build macros                             | 2024-06-14 14:51:40 -07:00
generate            | Add back lower level parallel flags                                      | 2024-06-17 13:44:46 -07:00
llama.cpp@7c26775adb | llm: update llama.cpp commit to 7c26775 (#4896)                         | 2024-06-17 15:56:16 -04:00
patches             | llm: update llama.cpp commit to 7c26775 (#4896)                          | 2024-06-17 15:56:16 -04:00
filetype.go         | Add support for IQ1_S, IQ3_S, IQ2_S, IQ4_XS. IQ4_NL (#4322)              | 2024-05-23 13:21:49 -07:00
ggla.go             | simplify safetensors reading                                             | 2024-05-21 11:28:22 -07:00
ggml.go             | Improve multi-gpu handling at the limit                                  | 2024-06-14 14:51:40 -07:00
gguf.go             | Revert "Merge pull request #4938 from ollama/mxyng/fix-byte-order"       | 2024-06-11 15:56:17 -07:00
llm.go              | revert tokenize ffi (#4761)                                              | 2024-05-31 18:54:21 -07:00
llm_darwin_amd64.go | Switch back to subprocessing for llama.cpp                               | 2024-04-01 16:48:18 -07:00
llm_darwin_arm64.go | Switch back to subprocessing for llama.cpp                               | 2024-04-01 16:48:18 -07:00
llm_linux.go        | Switch back to subprocessing for llama.cpp                               | 2024-04-01 16:48:18 -07:00
llm_windows.go      | Move nested payloads to installer and zip file on windows                | 2024-04-23 16:14:47 -07:00
memory.go           | Remove mmap related output calc logic                                    | 2024-06-14 14:55:50 -07:00
memory_test.go      | review comments and coverage                                             | 2024-06-14 14:55:50 -07:00
payload.go          | review comments and coverage                                             | 2024-06-14 14:55:50 -07:00
server.go           | Adjust mmap logic for cuda windows for faster model load                 | 2024-06-17 16:54:30 -07:00
status.go           | Switch back to subprocessing for llama.cpp                               | 2024-04-01 16:48:18 -07:00