ollama/llm
Daniel Hiltgen 171796791f Adjust mmap logic for cuda windows for faster model load
On Windows, recent llama.cpp changes make mmap slower in most
cases, so default to off. This also implements a tri-state for
use_mmap so we can distinguish a user-provided value of
true/false from an unspecified one.
2024-06-17 16:54:30 -07:00
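The tri-state described in the commit message is commonly modeled in Go with a `*bool`, where `nil` means "unspecified" and a non-nil pointer carries the explicit user choice. The sketch below is illustrative only: the `resolveUseMmap` helper and its parameters are hypothetical and not taken from the actual `server.go` implementation.

```go
package main

import "fmt"

// resolveUseMmap picks the effective mmap setting. A nil user value means
// "unspecified", so the platform default applies; an explicit true/false
// from the user always wins. (Hypothetical helper, not the ollama code.)
func resolveUseMmap(user *bool, isWindows, isCUDA bool) bool {
	if user != nil {
		return *user // explicit user choice overrides any default
	}
	// Default to off on Windows with CUDA, where mmap loads models slowly.
	if isWindows && isCUDA {
		return false
	}
	return true
}

func main() {
	explicit := true
	fmt.Println(resolveUseMmap(nil, true, true))       // default: off on Windows+CUDA
	fmt.Println(resolveUseMmap(&explicit, true, true)) // user override: on
	fmt.Println(resolveUseMmap(nil, false, false))     // default elsewhere: on
}
```

Compared with a plain `bool`, the pointer form lets the loader tell "the user asked for false" apart from "the user said nothing", which is exactly what the commit needs in order to apply a platform-specific default only in the unspecified case.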
Name                | Last commit                                                              | Date
------------------- | ------------------------------------------------------------------------ | -------------------------
ext_server          | Fix server.cpp for the new cuda build macros                             | 2024-06-14 14:51:40 -07:00
generate            | Add back lower level parallel flags                                      | 2024-06-17 13:44:46 -07:00
llama.cpp@7c26775adb | llm: update llama.cpp commit to 7c26775 (#4896)                         | 2024-06-17 15:56:16 -04:00
patches             | llm: update llama.cpp commit to 7c26775 (#4896)                          | 2024-06-17 15:56:16 -04:00
filetype.go         | Add support for IQ1_S, IQ3_S, IQ2_S, IQ4_XS. IQ4_NL (#4322)              | 2024-05-23 13:21:49 -07:00
ggla.go             | simplify safetensors reading                                             | 2024-05-21 11:28:22 -07:00
ggml.go             | Improve multi-gpu handling at the limit                                  | 2024-06-14 14:51:40 -07:00
gguf.go             | Revert "Merge pull request #4938 from ollama/mxyng/fix-byte-order"       | 2024-06-11 15:56:17 -07:00
llm.go              | revert tokenize ffi (#4761)                                              | 2024-05-31 18:54:21 -07:00
llm_darwin_amd64.go | Switch back to subprocessing for llama.cpp                               | 2024-04-01 16:48:18 -07:00
llm_darwin_arm64.go | Switch back to subprocessing for llama.cpp                               | 2024-04-01 16:48:18 -07:00
llm_linux.go        | Switch back to subprocessing for llama.cpp                               | 2024-04-01 16:48:18 -07:00
llm_windows.go      | Move nested payloads to installer and zip file on windows                | 2024-04-23 16:14:47 -07:00
memory.go           | Remove mmap related output calc logic                                    | 2024-06-14 14:55:50 -07:00
memory_test.go      | review comments and coverage                                             | 2024-06-14 14:55:50 -07:00
payload.go          | review comments and coverage                                             | 2024-06-14 14:55:50 -07:00
server.go           | Adjust mmap logic for cuda windows for faster model load                 | 2024-06-17 16:54:30 -07:00
status.go           | Switch back to subprocessing for llama.cpp                               | 2024-04-01 16:48:18 -07:00