ollama

Author	SHA1	Message	Date
Jeffrey Morgan	c336693f07	calculate overhead based on number of gpu devices (#1875)	2024-01-09 15:53:33 -05:00
Daniel Hiltgen	1961a81f03	Set correct CUDA minimum compute capability version If you attempt to run the current CUDA build on compute capability 5.2 cards, you'll hit the following failure: cuBLAS error 15 at ggml-cuda.cu:7956: the requested functionality is not supported	2024-01-09 11:28:24 -08:00
Jeffrey Morgan	6df83e6daa	update rough cuda overhead estimate to 15% + 384MiB	2024-01-09 13:51:08 -05:00
Jeffrey Morgan	6164f378f2	revert cuda overhead to 20%	2024-01-09 00:54:29 -05:00
Jeffrey Morgan	6566387ae3	add `TODO` for cuda overhead	2024-01-09 00:28:03 -05:00
Jeffrey Morgan	37708931fb	update cuda overhead to 20% to fix crashes when switching between models and large context sizes	2024-01-09 00:05:23 -05:00
Jeffrey Morgan	f6cb0a553c	update cuda overhead to 15% or 400MiB	2024-01-08 23:45:45 -05:00
Jeffrey Morgan	2680078c13	fix build on linux	2024-01-08 23:44:13 -05:00
Jeffrey Morgan	f1b7e5f560	update overhead to 15%	2024-01-08 23:37:45 -05:00
Jeffrey Morgan	cb534e6ac2	use 10% vram overhead for cuda	2024-01-08 23:17:44 -05:00
Jeffrey Morgan	08f1e18965	Offload layers to GPU based on new model size estimates (#1850) * select layers based on estimated model memory usage * always account for scratch vram * don't load +1 layers * better estimation for graph alloc * Update gpu/gpu_darwin.go Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com> * Update llm/llm.go Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com> * Update llm/llm.go * add overhead for cuda memory * Update llm/llm.go Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com> * fix build error on linux * address comments --------- Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>	2024-01-08 16:42:00 -05:00
Daniel Hiltgen	d74ce6bd4f	Detect very old CUDA GPUs and fall back to CPU If we try to load the CUDA library on an old GPU, it panics and crashes the server. This checks the compute capability before we load the library so we can gracefully fall back to CPU mode.	2024-01-06 21:40:29 -08:00
Daniel Hiltgen	a2ad952440	Fix windows system memory lookup This refines the gpu package error handling and fixes a bug with the system memory lookup on windows.	2024-01-03 08:50:01 -08:00
Daniel Hiltgen	d966b730ac	Switch windows build to fully dynamic Refactor where we store build outputs, and support a fully dynamic loading model on windows so the base executable has no special dependencies thus doesn't require a special PATH.	2024-01-02 15:36:16 -08:00
Daniel Hiltgen	7555ea44f8	Revamp the dynamic library shim This switches the default llama.cpp to be CPU based, and builds the GPU variants as dynamically loaded libraries which we can select at runtime. This also bumps the ROCm library to version 6 given 5.7 builds don't work on the latest ROCm library that just shipped.	2023-12-20 14:45:57 -08:00
Daniel Hiltgen	1b991d0ba9	Refine build to support CPU only If someone checks out the ollama repo and doesn't install the CUDA library, this will ensure they can build a CPU only version	2023-12-19 09:05:46 -08:00
Daniel Hiltgen	35934b2e05	Adapted rocm support to cgo based llama.cpp	2023-12-19 09:05:46 -08:00

17 commits