ollama/gpu
Jeffrey Morgan b24e8d17b2
Increase minimum CUDA memory allocation overhead and fix minimum overhead for multi-gpu (#1896)
* increase minimum cuda overhead and fix minimum overhead for multi-gpu

* fix multi gpu overhead

* limit overhead to 10% of all gpus

* better wording

* allocate fixed amount before layers

* fixed only includes graph alloc
2024-01-10 19:08:51 -05:00
| File | Last commit | Date |
| --- | --- | --- |
| gpu.go | Increase minimum CUDA memory allocation overhead and fix minimum overhead for multi-gpu (#1896) | 2024-01-10 19:08:51 -05:00 |
| gpu_darwin.go | calculate overhead based number of gpu devices (#1875) | 2024-01-09 15:53:33 -05:00 |
| gpu_info.h | calculate overhead based number of gpu devices (#1875) | 2024-01-09 15:53:33 -05:00 |
| gpu_info_cpu.c | calculate overhead based number of gpu devices (#1875) | 2024-01-09 15:53:33 -05:00 |
| gpu_info_cuda.c | Harden GPU mgmt library lookup | 2024-01-10 15:06:41 -08:00 |
| gpu_info_cuda.h | Harden GPU mgmt library lookup | 2024-01-10 15:06:41 -08:00 |
| gpu_info_rocm.c | Harden GPU mgmt library lookup | 2024-01-10 15:06:41 -08:00 |
| gpu_info_rocm.h | Harden GPU mgmt library lookup | 2024-01-10 15:06:41 -08:00 |
| gpu_test.go | calculate overhead based number of gpu devices (#1875) | 2024-01-09 15:53:33 -05:00 |
| types.go | calculate overhead based number of gpu devices (#1875) | 2024-01-09 15:53:33 -05:00 |