ollama

Author	SHA1	Message	Date
Michael Yang	598d6d5572	Merge pull request #1937 from jmorganca/mxyng/remove-client-py remove client.py	2024-01-16 11:01:41 -08:00
Bruce MacDonald	a897e833b8	do not cache prompt (#2018) - prompt cache causes inference to hang after some time	2024-01-16 13:48:05 -05:00
Patrick Devine	eef50accb4	Fix show parameters (#2017)	2024-01-16 10:34:44 -08:00
Michael Yang	05d53de7a1	Merge pull request #1968 from jmorganca/mxyng/fix-request-retry fix: request retry with error	2024-01-16 10:33:50 -08:00
Daniel Hiltgen	8795447dad	Merge pull request #1966 from fpreiss/fpreiss/gen_linux_cuda_detection improve cuda detection (rel. issue #1704)	2024-01-14 18:00:11 -08:00
Daniel Hiltgen	b3035112a1	Add macos cross-compile CI coverage	2024-01-14 10:38:59 -08:00
Daniel Hiltgen	95ad9a9fc8	Merge pull request #1988 from dhiltgen/fix_intel_mac Fix typo in arm mac arch script	2024-01-14 08:45:18 -08:00
Daniel Hiltgen	3ca5f69ce8	Fix typo in arm mac arch script	2024-01-14 08:32:57 -08:00
Daniel Hiltgen	cfa6337960	Merge pull request #1982 from dhiltgen/fix_intel_mac Fix intel mac build	2024-01-14 08:26:46 -08:00
Alexander F. Rødseth	f4bf1d514f	Let gpu.go and gen_linux.sh also find CUDA on Arch Linux	2024-01-14 13:40:36 +01:00
Jeffrey Morgan	557110d0ba	Disable `mmap` with lora layers (#1985)	2024-01-13 23:36:31 -05:00
Daniel Hiltgen	2ecb247276	Fix intel mac build Make sure we're building an x86 ext_server lib when cross-compiling	2024-01-13 14:46:34 -08:00
Jeffrey Morgan	288ef8ff95	add `gcc -lstdc++` flag for linux cpu (#1974)	2024-01-13 03:53:00 -05:00
Jeffrey Morgan	4cf17990f7	use g++ to build `libext_server.so` on linux (#1972)	2024-01-13 03:12:42 -05:00
Michael Yang	27331ae3a8	download: add inactivity monitor if a download part is inactive for some time, restart it	2024-01-12 15:23:15 -08:00
Michael Yang	b6c0ef1e70	Merge pull request #1961 from jmorganca/mxyng/rm-double-newline remove double newlines in /set parameter	2024-01-12 15:18:19 -08:00
Michael Yang	356d178f6e	Merge pull request #1971 from jmorganca/mxyng/max-context-length add max context length check	2024-01-12 15:10:25 -08:00
Michael Yang	eaed6f8c45	add max context length check	2024-01-12 14:54:07 -08:00
purificant	6a5bfc2ed6	update actions/setup-go	2024-01-12 22:27:25 +00:00
Michael Yang	cf29bd2d72	fix: request retry with error this fixes a subtle bug with makeRequestWithRetry where an HTTP status error on a retried request will potentially not return the right err	2024-01-12 13:32:27 -08:00
Fabian Preiss	905862e17b	improve cuda detection (rel. issue #1704)	2024-01-12 21:59:19 +01:00
Patrick Devine	565f8a3c44	Convert the REPL to use /api/chat for interactive responses (#1936)	2024-01-12 12:05:52 -08:00
Michael Yang	5121b7ac9c	remove double newlines in /set parameter	2024-01-12 11:21:15 -08:00
Michael Yang	a70262c6b2	Update README.md Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>	2024-01-12 09:43:04 -08:00
Tristram Oaten	40a0a90a88	Add group delete to uninstall instructions (#1924) After executing the `userdel ollama` command, I saw this message: ```sh $ sudo userdel ollama userdel: group ollama not removed because it has other members. ``` Which reminded me that I had to remove the dangling group too. For completeness, the uninstall instructions should do this too. Thanks!	2024-01-12 00:07:00 -05:00
Michael Yang	cbe20c4375	update readme	2024-01-11 16:24:37 -08:00
Michael Yang	5ffbbea1d7	remove client.py	2024-01-11 15:53:10 -08:00
Daniel Hiltgen	3773fb6465	Merge pull request #1935 from dhiltgen/cpu_fallback Fix up the CPU fallback selection	2024-01-11 15:52:32 -08:00
Daniel Hiltgen	7427fa1387	Fix up the CPU fallback selection The memory changes and multi-variant change had some merge glitches I missed. This fixes them so we actually get the cpu llm lib and best variant for the given system.	2024-01-11 15:27:06 -08:00
Michael Yang	f84537e0e0	Merge pull request #1934 from jmorganca/mxyng/fix-slices fix build and lint	2024-01-11 14:36:20 -08:00
Michael Yang	d2be6387c9	fix typo	2024-01-11 14:25:21 -08:00
Michael Yang	d7af35d3d0	import fmt	2024-01-11 14:22:32 -08:00
Michael Yang	defc1dbd6e	use x/exp/slices	2024-01-11 14:20:13 -08:00
Daniel Hiltgen	de2fbdec99	Merge pull request #1819 from dhiltgen/multi_variant Support multiple LLM libs; ROCm v5 and v6; Rosetta, AVX, and AVX2 compatible CPU builds	2024-01-11 14:00:48 -08:00
Eduard van Valkenburg	f5faf79aa1	Add semantic kernel to Readme (#1931)	2024-01-11 14:40:23 -05:00
Michael Yang	f4f939de28	Merge pull request #1552 from jmorganca/mxyng/lint-test add lint and test on pull_request	2024-01-11 09:37:45 -08:00
Daniel Hiltgen	39928a42e8	Always dynamically load the llm server library This switches darwin to dynamic loading, and refactors the code now that no static linking of the library is used on any platform	2024-01-11 08:42:47 -08:00
Daniel Hiltgen	d88c527be3	Build multiple CPU variants and pick the best This reduces the built-in linux version to not use any vector extensions which enables the resulting builds to run under Rosetta on MacOS in Docker. Then at runtime it checks for the actual CPU vector extensions and loads the best CPU library available	2024-01-11 08:42:47 -08:00
Fabian Preiß	3bc8b9832b	fix gpu_test.go Error (same type) uint64->uint32 (#1921)	2024-01-11 08:22:23 -05:00
Jeffrey Morgan	ab6be852c7	revisit memory allocation to account for full kv cache on main gpu	2024-01-11 01:45:31 -05:00
Daniel Hiltgen	052b33b81b	DRY out the Dockerfile.build	2024-01-10 17:27:51 -08:00
Daniel Hiltgen	8da7bef05f	Support multiple variants for a given llm lib type In some cases we may want multiple variants for a given GPU type or CPU. This adds logic to have an optional Variant which we can use to select an optimal library, but also allows us to try multiple variants in case some fail to load. This can be useful for scenarios such as ROCm v5 vs v6 incompatibility or potentially CPU features.	2024-01-10 17:27:51 -08:00
Jeffrey Morgan	b24e8d17b2	Increase minimum CUDA memory allocation overhead and fix minimum overhead for multi-gpu (#1896) * increase minimum cuda overhead and fix minimum overhead for multi-gpu * fix multi gpu overhead * limit overhead to 10% of all gpus * better wording * allocate fixed amount before layers * fixed only includes graph alloc	2024-01-10 19:08:51 -05:00
Jeffrey Morgan	f83881390f	revert submodule back to `328b83de23b33240e28f4e74900d1d06726f5eb1`	2024-01-10 18:42:39 -05:00
Daniel Hiltgen	ac70ab6761	Merge pull request #1914 from dhiltgen/smarter_cuda_detection Smarter GPU Management library detection	2024-01-10 15:21:56 -08:00
Daniel Hiltgen	3c49c3ab0d	Harden GPU mgmt library lookup When there are multiple management libraries installed on a system not every one will be compatible with the current driver. This change improves our management library algorithm to build up a set of discovered libraries based on glob patterns, and then try all of them until we're able to load one without error.	2024-01-10 15:06:41 -08:00
Daniel Hiltgen	9754ae4c89	Support optional override of the target architectures This can help speed up incremental builds when you're only testing one architecture, like amd64. E.g. BUILD_ARCH=amd64 ./scripts/build_linux.sh && scp ./dist/ollama-linux-amd64 test-system:	2024-01-10 14:43:24 -08:00
Jeffrey Morgan	224fbf2795	update submodule to commit `1fc2f265ff9377a37fd2c61eae9cd813a3491bea` until its main branch is fixed	2024-01-10 17:03:15 -05:00
Jeffrey Morgan	2c6e8f5248	Update submodule to `6efb8eb30e7025b168f3fda3ff83b9b386428ad6` (#1885) * update submodule to `6efb8eb30e7025b168f3fda3ff83b9b386428ad6` * unblock condition variable in `update_slots` when closing server	2024-01-10 16:48:38 -05:00
Jeffrey Morgan	34344d801c	clean up cmake `build` directory when cross compiling macOS builds	2024-01-09 17:13:56 -05:00

2907 commits