08f1e18965
* select layers based on estimated model memory usage
* always account for scratch vram
* don't load +1 layers
* better estimation for graph alloc
* Update gpu/gpu_darwin.go
  Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
* Update llm/llm.go
  Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
* Update llm/llm.go
* add overhead for cuda memory
* Update llm/llm.go
  Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
* fix build error on linux
* address comments

---------

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com> |
||
---|---|---|
ext_server | ||
generate | ||
llama.cpp@328b83de23 | ||
dynamic_shim.c | ||
dynamic_shim.h | ||
ext_server_common.go | ||
ext_server_default.go | ||
ext_server_windows.go | ||
ggml.go | ||
gguf.go | ||
llama.go | ||
llm.go | ||
shim_darwin.go | ||
shim_ext_server.go | ||
shim_ext_server_linux.go | ||
shim_ext_server_windows.go | ||
utils.go |