ollama/gpu
Latest commit 08f1e18965 by Jeffrey Morgan: Offload layers to GPU based on new model size estimates (#1850)
* select layers based on estimated model memory usage

* always account for scratch vram

* don't load +1 layers

* better estimation for graph alloc

* Update gpu/gpu_darwin.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update llm/llm.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* Update llm/llm.go

* add overhead for cuda memory

* Update llm/llm.go

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>

* fix build error on linux

* address comments

---------

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2024-01-08 16:42:00 -05:00
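
Taken together, the bullets in the commit body describe a pipeline: estimate per-layer memory for the model, reserve room for the scratch VRAM and graph allocation, add a fixed overhead for CUDA itself, and only then decide how many layers fit on the GPU (without over-requesting by one). The Go sketch below is a minimal illustration of that arithmetic; the function name, parameters, and sample sizes are assumptions for this example, not the actual API in llm/llm.go or gpu/gpu.go.

```go
package main

import "fmt"

// estimateGPULayers is a hypothetical sketch of the approach described in the
// commit above: pick how many layers to offload based on estimated memory
// usage, always accounting for scratch VRAM and the graph allocation, plus a
// safety margin for CUDA's own runtime memory. Names and sizes here are
// illustrative, not the repository's actual code.
func estimateGPULayers(freeVRAM, layerSize, scratch, graphAlloc, cudaOverhead uint64, totalLayers int) int {
	// Reserve fixed costs up front before sizing layers.
	reserved := scratch + graphAlloc + cudaOverhead
	if freeVRAM <= reserved {
		return 0 // nothing fits; run fully on CPU
	}
	usable := freeVRAM - reserved

	layers := int(usable / layerSize)
	if layers > totalLayers {
		// Never request more layers than the model has (the "+1 layers" fix).
		layers = totalLayers
	}
	return layers
}

func main() {
	const GiB = 1 << 30
	// Example: 8 GiB free, ~350 MiB per layer, 1 GiB scratch,
	// 512 MiB graph allocation, 256 MiB CUDA overhead, 32-layer model.
	n := estimateGPULayers(8*GiB, 350<<20, 1*GiB, 512<<20, 256<<20, 32)
	fmt.Printf("offloading %d layers to GPU\n", n)
}
```

With the assumed sizes this yields 18 of 32 layers offloaded; the remaining layers run on CPU.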
File              Last commit                                                      Date
gpu.go            Offload layers to GPU based on new model size estimates (#1850)  2024-01-08 16:42:00 -05:00
gpu_darwin.go     Offload layers to GPU based on new model size estimates (#1850)  2024-01-08 16:42:00 -05:00
gpu_info.h        Fix Windows system memory lookup                                 2024-01-03 08:50:01 -08:00
gpu_info_cpu.c    Fix Windows system memory lookup                                 2024-01-03 08:50:01 -08:00
gpu_info_cuda.c   Detect very old CUDA GPUs and fall back to CPU                   2024-01-06 21:40:29 -08:00
gpu_info_cuda.h   Detect very old CUDA GPUs and fall back to CPU                   2024-01-06 21:40:29 -08:00
gpu_info_rocm.c   Fix Windows system memory lookup                                 2024-01-03 08:50:01 -08:00
gpu_info_rocm.h   Adapted ROCm support to cgo-based llama.cpp                      2023-12-19 09:05:46 -08:00
gpu_test.go       Fix Windows system memory lookup                                 2024-01-03 08:50:01 -08:00
types.go          Fix Windows system memory lookup                                 2024-01-03 08:50:01 -08:00
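
The gpu_info_cuda.c and gpu_info_cuda.h entries above track a related guard: very old CUDA GPUs are detected and inference falls back to the CPU path. A minimal Go sketch of that decision follows, assuming a hypothetical compute-capability cutoff; the real threshold and the actual driver query live in gpu_info_cuda.c.

```go
package main

import "fmt"

// cudaComputeCapability stands in for the device attributes that
// gpu_info_cuda.c reads from the CUDA driver. The type and threshold here
// are illustrative assumptions, not the repository's exact cutoff.
type cudaComputeCapability struct {
	Major, Minor int
}

// useGPU reports whether to offload to CUDA or fall back to CPU. Devices
// below the assumed minimum compute capability are rejected so inference
// runs on the CPU path instead.
func useGPU(cc cudaComputeCapability) bool {
	const minMajor = 5 // hypothetical minimum; see gpu_info_cuda.c for the real check
	return cc.Major >= minMajor
}

func main() {
	for _, cc := range []cudaComputeCapability{{3, 5}, {8, 6}} {
		fmt.Printf("compute %d.%d -> GPU: %v\n", cc.Major, cc.Minor, useGPU(cc))
	}
}
```

Keeping the fallback to a single predicate like this makes it easy to test the decision without a GPU present.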