ollama/llm
Daniel Hiltgen 6fd04ca922 Improve multi-gpu handling at the limit
Still not complete. The prediction needs further refinement to understand each discrete GPU's available space so we can see how many layers fit in each one: since we can't split a single layer across multiple GPUs, we can't treat free space as one logical block.
2024-06-14 14:51:40 -07:00
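The constraint described in the commit message above — a whole layer must land on a single GPU, so per-GPU free VRAM cannot be pooled into one logical block — can be illustrated with a minimal Go sketch. This is not the actual memory.go logic; the fitLayers helper, its parameters, and the uniform layer size are hypothetical assumptions for illustration only.

```go
package main

import "fmt"

// fitLayers greedily places whole layers onto individual GPUs.
// freePerGPU: hypothetical free VRAM per GPU in bytes; layerSize: a uniform
// per-layer size (real models have varying layer sizes); totalLayers: layers
// we would like to offload. Returns how many layers landed on each GPU and
// the total number that fit.
func fitLayers(freePerGPU []uint64, layerSize uint64, totalLayers int) ([]int, int) {
	perGPU := make([]int, len(freePerGPU))
	remaining := append([]uint64(nil), freePerGPU...)
	fitted := 0
	for fitted < totalLayers {
		placed := false
		for i := range remaining {
			// A layer fits only if one single GPU still has room for the whole layer.
			if remaining[i] >= layerSize {
				remaining[i] -= layerSize
				perGPU[i]++
				fitted++
				placed = true
				break
			}
		}
		if !placed {
			break // leftover free space is fragmented across GPUs and unusable
		}
	}
	return perGPU, fitted
}

func main() {
	gib := uint64(1 << 30)
	// Two GPUs with 3 GiB free each and 2 GiB layers: only 2 of 4 layers fit,
	// even though 6 GiB of "pooled" free space would naively suggest 3.
	perGPU, fitted := fitLayers([]uint64{3 * gib, 3 * gib}, 2*gib, 4)
	fmt.Println("layers per GPU:", perGPU, "fitted:", fitted)
}
```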
ext_server Fix server.cpp for the new cuda build macros 2024-06-14 14:51:40 -07:00
generate Add ability to skip oneapi generate 2024-06-07 08:32:49 -07:00
llama.cpp@5921b8f089 Update llama.cpp submodule to 5921b8f0 (#4731) 2024-05-30 16:20:22 -07:00
patches llm: patch to fix qwen 2 temporarily on nvidia (#4897) 2024-06-06 23:14:33 -07:00
filetype.go Add support for IQ1_S, IQ3_S, IQ2_S, IQ4_XS. IQ4_NL (#4322) 2024-05-23 13:21:49 -07:00
ggla.go simplify safetensors reading 2024-05-21 11:28:22 -07:00
ggml.go Improve multi-gpu handling at the limit 2024-06-14 14:51:40 -07:00
gguf.go Revert "Merge pull request #4938 from ollama/mxyng/fix-byte-order" 2024-06-11 15:56:17 -07:00
llm.go revert tokenize ffi (#4761) 2024-05-31 18:54:21 -07:00
llm_darwin_amd64.go Switch back to subprocessing for llama.cpp 2024-04-01 16:48:18 -07:00
llm_darwin_arm64.go Switch back to subprocessing for llama.cpp 2024-04-01 16:48:18 -07:00
llm_linux.go Switch back to subprocessing for llama.cpp 2024-04-01 16:48:18 -07:00
llm_windows.go Move nested payloads to installer and zip file on windows 2024-04-23 16:14:47 -07:00
memory.go Improve multi-gpu handling at the limit 2024-06-14 14:51:40 -07:00
memory_test.go Improve multi-gpu handling at the limit 2024-06-14 14:51:40 -07:00
payload.go replace x/exp/slices with slices 2024-06-04 11:13:30 -07:00
server.go Improve multi-gpu handling at the limit 2024-06-14 14:51:40 -07:00
status.go Switch back to subprocessing for llama.cpp 2024-04-01 16:48:18 -07:00