ollama

Author	SHA1	Message	Date
Fabian Preiß	3bc8b9832b	fix gpu_test.go Error (same type) uint64->uint32 (#1921 )	2024-01-11 08:22:23 -05:00
Jeffrey Morgan	ab6be852c7	revisit memory allocation to account for full kv cache on main gpu	2024-01-11 01:45:31 -05:00
Jeffrey Morgan	b24e8d17b2	Increase minimum CUDA memory allocation overhead and fix minimum overhead for multi-gpu (#1896 ) * increase minimum cuda overhead and fix minimum overhead for multi-gpu * fix multi gpu overhead * limit overhead to 10% of all gpus * better wording * allocate fixed amount before layers * fixed only includes graph alloc	2024-01-10 19:08:51 -05:00
Jeffrey Morgan	f83881390f	revert submodule back to `328b83de23b33240e28f4e74900d1d06726f5eb1`	2024-01-10 18:42:39 -05:00
Daniel Hiltgen	ac70ab6761	Merge pull request #1914 from dhiltgen/smarter_cuda_detection Smarter GPU Management library detection	2024-01-10 15:21:56 -08:00
Daniel Hiltgen	3c49c3ab0d	Harden GPU mgmt library lookup When there are multiple management libraries installed on a system not every one will be compatible with the current driver. This change improves our management library algorithm to build up a set of discovered libraries based on glob patterns, and then try all of them until we're able to load one without error.	2024-01-10 15:06:41 -08:00
Daniel Hiltgen	9754ae4c89	Support optional override of the target architectures This can help speed up incremental builds when you're only testing one architecture, like amd64. E.g. BUILD_ARCH=amd64 ./scripts/build_linux.sh && scp ./dist/ollama-linux-amd64 test-system:	2024-01-10 14:43:24 -08:00
Jeffrey Morgan	224fbf2795	update submodule to commit `1fc2f265ff9377a37fd2c61eae9cd813a3491bea` until its main branch is fixed	2024-01-10 17:03:15 -05:00
Jeffrey Morgan	2c6e8f5248	Update submodule to `6efb8eb30e7025b168f3fda3ff83b9b386428ad6` (#1885 ) * update submodule to `6efb8eb30e7025b168f3fda3ff83b9b386428ad6` * unblock condition variable in `update_slots` when closing server	2024-01-10 16:48:38 -05:00
Jeffrey Morgan	34344d801c	clean up cmake `build` directory when cross compiling macOS builds	2024-01-09 17:13:56 -05:00
Robin Glauser	e868c8a5c7	Update api.md (#1878 ) Fixed assistant in the example response.	2024-01-09 16:21:17 -05:00
Jeffrey Morgan	c336693f07	calculate overhead based number of gpu devices (#1875 )	2024-01-09 15:53:33 -05:00
Daniel Hiltgen	e89dc1d54b	Merge pull request #1874 from dhiltgen/correct_cuda_min Set correct CUDA minimum compute capability version	2024-01-09 11:37:22 -08:00
Daniel Hiltgen	1961a81f03	Set correct CUDA minimum compute capability version If you attempt to run the current CUDA build on compute capability 5.2 cards, you'll hit the following failure: cuBLAS error 15 at ggml-cuda.cu:7956: the requested functionality is not supported	2024-01-09 11:28:24 -08:00
Jeffrey Morgan	8a8c7e7f8d	only build for metal on `arm64`	2024-01-09 13:51:08 -05:00
Jeffrey Morgan	6df83e6daa	update rough cuda overhead estimate to 15% + 384MiB	2024-01-09 13:51:08 -05:00
Michael Yang	62023177f6	Merge pull request #1614 from jmorganca/mxyng/fix-set-template fix: set template without triple quotes	2024-01-09 09:36:24 -08:00
Jeffrey Morgan	6164f378f2	revert cuda overhead to 20%	2024-01-09 00:54:29 -05:00
Jeffrey Morgan	f387e9631b	use runner if cuda alloc won't fit	2024-01-09 00:44:34 -05:00
Jeffrey Morgan	6566387ae3	add `TODO` for cuda overhead	2024-01-09 00:28:03 -05:00
Jeffrey Morgan	37708931fb	update cuda overhead to 20% to fix crashes when switching between models and large context sizes	2024-01-09 00:05:23 -05:00
Jeffrey Morgan	f6cb0a553c	update cuda overhead to 15% or 400MiB	2024-01-08 23:45:45 -05:00
Jeffrey Morgan	2680078c13	fix build on linux	2024-01-08 23:44:13 -05:00
Jeffrey Morgan	f1b7e5f560	update overhead to 15%	2024-01-08 23:37:45 -05:00
Jeffrey Morgan	cb534e6ac2	use 10% vram overhead for cuda	2024-01-08 23:17:44 -05:00
Jeffrey Morgan	58ce2d8273	better estimate scratch buffer size	2024-01-08 21:32:44 -05:00
Jeffrey Morgan	18ddf6d57d	fix windows build	2024-01-08 20:04:01 -05:00
Michael Yang	61e6502449	Merge pull request #1818 from jmorganca/mxyng/fix-alt-prompt fix(cmd): history in alt prompt	2024-01-08 13:48:34 -08:00
Jeffrey Morgan	08f1e18965	Offload layers to GPU based on new model size estimates (#1850 ) * select layers based on estimated model memory usage * always account for scratch vram * dont load +1 layers * better estimation for graph alloc * Update gpu/gpu_darwin.go Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com> * Update llm/llm.go Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com> * Update llm/llm.go * add overhead for cuda memory * Update llm/llm.go Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com> * fix build error on linux * address comments --------- Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>	2024-01-08 16:42:00 -05:00
Bruce MacDonald	7e8f7c8358	remove ggml automatic re-pull (#1856 )	2024-01-08 14:41:01 -05:00
Bruce MacDonald	3f3eb19a3b	document response in modelfile template variables (#1428 )	2024-01-08 14:38:51 -05:00
Daniel Hiltgen	059ae4585e	Merge pull request #1834 from dhiltgen/old_cuda Detect very old CUDA GPUs and fall back to CPU	2024-01-07 10:39:49 -08:00
Daniel Hiltgen	6347f501ca	Merge pull request #1828 from dhiltgen/fix_llava Accept windows paths for image processing	2024-01-07 09:05:46 -08:00
Jeffrey Morgan	5feec959ad	dont use `-Wall` in static build (#1833 )	2024-01-07 10:39:19 -05:00
Jeffrey Morgan	dbdd50b283	add `-DCMAKE_SYSTEM_NAME=Darwin` cmake flag (#1832 )	2024-01-07 00:46:17 -05:00
Daniel Hiltgen	d74ce6bd4f	Detect very old CUDA GPUs and fall back to CPU If we try to load the CUDA library on an old GPU, it panics and crashes the server. This checks the compute capability before we load the library so we can gracefully fall back to CPU mode.	2024-01-06 21:40:29 -08:00
Guilherme Baptista	57942b4676	Update README.md - Community Integrations - Ollama for Ruby (#1830 )	2024-01-06 22:31:39 -05:00
Daniel Hiltgen	e0d05b0f1e	Accept windows paths for image processing This enhances our regex to support windows style paths. The regex will match invalid path specifications, but we'll still validate file existence and filter out mismatches	2024-01-06 10:50:27 -08:00
Daniel Hiltgen	2d9dd14f27	Merge pull request #1697 from dhiltgen/win_docs Add windows native build instructions	2024-01-05 19:34:20 -08:00
Jeffrey Morgan	1caa56128f	add cuda lib path for nvidia container toolkit	2024-01-05 21:10:37 -05:00
Michael Yang	0101e76dbe	Merge pull request #1797 from sublimator/nd-allow-extension-origins-still-needs-explicit-listing-2024-01-05 fix: allow extension origins (still needs explicit listing), fixes #1686	2024-01-05 17:20:09 -08:00
Michael Yang	2ef9352b94	fix(cmd): history in alt mode	2024-01-05 16:20:02 -08:00
Michael Yang	5580ae2472	fix: set template without triple quotes	2024-01-05 15:51:33 -08:00
Bruce MacDonald	3a9f447141	only pull gguf model if already exists (#1817 )	2024-01-05 18:50:00 -05:00
Patrick Devine	9c2941e61b	switch api for ShowRequest to use the name field (#1816 )	2024-01-05 15:06:43 -08:00
Patrick Devine	238ac5e765	Add unit tests for Parser (#1815 )	2024-01-05 14:04:31 -08:00
Bruce MacDonald	4f4980b66b	simplify ggml update logic (#1814 ) - additional information is now available in show response, use this to pull gguf before running - make gguf updates cancellable	2024-01-05 15:22:32 -05:00
Patrick Devine	22e93efa41	add show info command and fix the modelfile	2024-01-05 12:20:05 -08:00
Patrick Devine	2909dce894	split up interactive generation	2024-01-05 12:20:05 -08:00
Jeffrey Morgan	df32537312	gpu: read memory info from all cuda devices (#1802 ) * gpu: read memory info from all cuda devices * add `LOOKUP_SIZE` constant * better constant name * address comments	2024-01-05 11:25:58 -05:00

1759 commits