ollama/llm
Daniel Hiltgen 795674dd90 Bump llama.cpp to b1842 and add new cuda lib dep
Upstream llama.cpp has added a new dependency on the NVIDIA CUDA
driver library (libcuda.so). It ships as part of the driver
distribution rather than the general CUDA libraries and is not
available as a static archive, so we cannot statically link it.
This may introduce additional compatibility challenges that we'll
need to keep an eye on.
2024-01-16 12:53:52 -08:00
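
Because libcuda.so ships only with the driver, it has to be located and
resolved at runtime rather than linked at build time. The following is a
minimal C sketch of such a runtime probe using dlopen/dlsym; cuInit and
cuDriverGetVersion are real CUDA driver API entry points, but the candidate
library names, flags, and fallback flow are illustrative assumptions, not
the repository's actual loader code.

    /*
     * Minimal sketch (not the repo's actual loader): probe for the NVIDIA
     * CUDA driver library at runtime with dlopen/dlsym, since libcuda.so
     * ships with the driver and cannot be statically linked.
     */
    #include <dlfcn.h>
    #include <stdio.h>

    typedef int (*cu_init_fn)(unsigned int); /* CUresult cuInit(unsigned int) */
    typedef int (*cu_version_fn)(int *);     /* CUresult cuDriverGetVersion(int *) */

    int main(void) {
        /* libcuda.so.1 is the soname the driver package installs; the
           unversioned name may be absent without the dev symlink. */
        const char *names[] = { "libcuda.so.1", "libcuda.so" };
        void *h = NULL;
        for (int i = 0; i < 2 && h == NULL; i++)
            h = dlopen(names[i], RTLD_LAZY | RTLD_GLOBAL);
        if (h == NULL) {
            fprintf(stderr, "no CUDA driver library: %s\n", dlerror());
            return 1;  /* a real loader would fall back to a CPU variant */
        }

        cu_init_fn cu_init = (cu_init_fn)dlsym(h, "cuInit");
        cu_version_fn cu_version = (cu_version_fn)dlsym(h, "cuDriverGetVersion");
        if (cu_init == NULL || cu_version == NULL ||
            cu_init(0) != 0 /* CUDA_SUCCESS */) {
            fprintf(stderr, "CUDA driver present but unusable\n");
            dlclose(h);
            return 1;
        }

        int v = 0;
        cu_version(&v);
        printf("CUDA driver version %d.%d\n", v / 1000, (v % 1000) / 10);
        dlclose(h);
        return 0;
    }

A probe along these lines lets a loader treat a missing or broken driver as
"no usable GPU" and fall back to a CPU build of the server library instead
of failing outright, which is the kind of compatibility concern the commit
message flags.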
ext_server            Disable mmap with lora layers (#1985)                2024-01-13 23:36:31 -05:00
generate              Bump llama.cpp to b1842 and add new cuda lib dep     2024-01-16 12:53:52 -08:00
llama.cpp@584d674be6  Bump llama.cpp to b1842 and add new cuda lib dep     2024-01-16 12:53:52 -08:00
dyn_ext_server.c      Always dynamically load the llm server library       2024-01-11 08:42:47 -08:00
dyn_ext_server.go     do not cache prompt (#2018)                          2024-01-16 13:48:05 -05:00
dyn_ext_server.h      Always dynamically load the llm server library       2024-01-11 08:42:47 -08:00
ggml.go               add max context length check                         2024-01-12 14:54:07 -08:00
gguf.go               add max context length check                         2024-01-12 14:54:07 -08:00
llama.go              remove unused fields and functions                   2024-01-09 09:37:40 -08:00
llm.go                add max context length check                         2024-01-12 14:54:07 -08:00
payload_common.go     Merge pull request #1935 from dhiltgen/cpu_fallback  2024-01-11 15:52:32 -08:00
payload_darwin.go     Always dynamically load the llm server library       2024-01-11 08:42:47 -08:00
payload_linux.go      Always dynamically load the llm server library       2024-01-11 08:42:47 -08:00
payload_test.go       Fix up the CPU fallback selection                    2024-01-11 15:27:06 -08:00
payload_windows.go    Always dynamically load the llm server library       2024-01-11 08:42:47 -08:00
utils.go              partial decode ggml bin for more info                2023-08-10 09:23:10 -07:00