ollama

Author	SHA1	Message	Date
Jeffrey Morgan	36a3bbf65f	Update llm/llama.go	2023-11-18 21:25:07 -05:00
Bruce MacDonald	43a726149d	fix potentially inaccurate error message	2023-11-18 21:25:07 -05:00
Jeffrey Morgan	41434a7cdc	build intel mac with correct binary and compile flags	2023-11-16 22:14:51 -05:00
Jeffrey Morgan	5cba29b9d6	JSON mode: add `"format"` as an api parameter (#1051) * add `"format": "json"` as an API parameter --------- Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>	2023-11-09 16:44:02 -08:00
Bruce MacDonald	1ae84bc2a2	skip gpu if less than 2GB VRAM are available (#1059)	2023-11-09 13:16:16 -08:00
Michael Yang	c5e1bbabda	instead of static number of parameters for each model family, get the real number from the tensors (#1022) * parse tensor info * refactor decoder * return actual parameter count * explicit rounding * s/Human/HumanNumber/	2023-11-08 17:55:46 -08:00
Jeffrey Morgan	c44b619428	remove unused `fmt.Println`	2023-11-03 17:24:58 -07:00
Jeffrey Morgan	17678b7225	Restore system prompt on requests and default `num_keep` to `0`	2023-11-03 13:25:25 -07:00
Jeffrey Morgan	2e53704685	default rope params to 0 for new models (#968)	2023-11-02 08:41:30 -07:00
Michael Yang	642128b75a	append LD_LIBRARY_PATH	2023-10-31 15:54:49 -07:00
Jeffrey Morgan	3a1ed9ff70	restore building runner with `AVX` on by default (#900)	2023-10-27 12:13:44 -07:00
Bruce MacDonald	6d283882b1	catch insufficient permissions nvidia err (#934)	2023-10-27 12:42:40 -04:00
Bruce MacDonald	2665f3c28e	offload 75% of available vram to improve stability (#921)	2023-10-26 20:49:55 -04:00
Jeffrey Morgan	b0c9cd0f3b	fix metal assertion errors	2023-10-24 00:32:36 -07:00
Jeffrey Morgan	77f61c6301	update submodule commit	2023-10-24 00:30:27 -07:00
Jeffrey Morgan	f3604534e5	update submodule commit	2023-10-23 23:59:12 -07:00
Michael Yang	0c7a00a264	bump submodules pin to 9e70cc03229df19ca2d28ce23cc817198f897278 for now since 438c2ca83045a00ef244093d27e9ed41a8cb4ea9 is breaking	2023-10-23 11:17:59 -07:00
Michael Yang	36c160f1c3	Merge pull request #881 from jmorganca/mxyng/ggufv3 ggufv3	2023-10-23 10:50:45 -07:00
Michael Yang	c9167494cb	update default log target	2023-10-23 10:44:50 -07:00
Michael Yang	125d0a013a	ggufv3: ggufv3 adds support for big endianness, mainly for s390x architecture. while that's not currently supported for ollama, the change is simple. loosen version check to be more forward compatible. unless specified, gguf versions other than v1 will be decoded into v2.	2023-10-23 09:35:49 -07:00
Jeffrey Morgan	7ed5a39bc7	simpler check for model loading compatibility errors	2023-10-19 14:50:49 -04:00
Jeffrey Morgan	a7dad24d92	add error for `falcon` and `starcoder` vocab compatibility (#844) add error for falcon and starcoder vocab compatibility --------- Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>	2023-10-19 12:18:31 -04:00
Michael Yang	235e43d7f6	Merge pull request #833 from discovertomorrow/leadingspace Fix Issue with Leading Whitespaces in Decoded Context	2023-10-18 13:52:48 -07:00
Arne Müller	730996e530	use TrimPrefix instead of TrimLeft	2023-10-18 22:51:30 +02:00
Arne Müller	ce6197a8e0	removed redundant strings.CutPrefix from Decode	2023-10-18 22:47:20 +02:00
Arne Müller	46b9953f32	use strings.TrimLeft to remove spaces	2023-10-18 22:41:19 +02:00
Bruce MacDonald	565648f3f7	relay CUDA errors to the client (#825)	2023-10-18 15:36:56 -04:00
Arne Müller	90c49bed57	moved removal of leading space into Predict	2023-10-18 20:08:26 +02:00
Arne Müller	5dc0cff459	fix whitespace removal	2023-10-18 08:15:27 +02:00
Michael Yang	08b0e04f40	Merge pull request #813 from jmorganca/mxyng/llama refactor llm/llama.go	2023-10-17 14:05:58 -07:00
Michael Yang	b36b0b71f8	use cut prefix	2023-10-17 14:01:39 -07:00
Michael Yang	094df37563	remove unused struct	2023-10-17 14:01:38 -07:00
Bruce MacDonald	f3648fd206	Update llama.cpp gguf to latest (#710)	2023-10-17 16:55:16 -04:00
Bruce MacDonald	bd93a94abd	fix MB VRAM log output (#824)	2023-10-17 15:35:16 -04:00
Michael Yang	f55bdb6f10	Merge pull request #799 from deichbewohner/jsonmarshaling Fix JSON Marshal Escaping for Special Characters	2023-10-17 08:46:02 -07:00
Michael Yang	2870a9bfc8	Merge pull request #812 from jmorganca/mxyng/fix-format-string fix: wrong format string type	2023-10-17 08:40:49 -07:00
Arne Müller	8fa3f366ad	Removed newline trimming and used buffer directly in POST request.	2023-10-17 08:17:35 +02:00
Michael Yang	fddb303f23	fix: format string wrong type	2023-10-16 16:14:28 -07:00
Michael Yang	cb4a80b693	fix: regression unsupported metal types omitting `--n-gpu-layers` means use metal on macos which isn't correct since ollama uses `num_gpu=0` to explicitly disable gpu for file types that are not implemented in metal	2023-10-16 14:37:20 -07:00
Arne Müller	ee94693b1a	handling unescaped json marshaling	2023-10-16 11:15:55 +02:00
Michael Yang	11d82d7b9b	update checkvram	2023-10-13 14:47:29 -07:00
Michael Yang	36fe2deebf	only check system memory on macos	2023-10-13 14:47:29 -07:00
Michael Yang	4a8931f634	check total (system + video) memory	2023-10-13 14:47:29 -07:00
Michael Yang	bd6e38fb1a	refactor memory check	2023-10-13 14:47:29 -07:00
Michael Yang	92189a5855	fix memory check	2023-10-13 14:47:29 -07:00
Michael Yang	d790bf9916	Merge pull request #783 from jmorganca/mxyng/fix-gpu-offloading fix: offloading on low end GPUs	2023-10-13 14:36:44 -07:00
Michael Yang	35afac099a	do not use gpu binary when num_gpu == 0	2023-10-13 14:32:12 -07:00
Michael Yang	811c3d1900	no gpu if vram < 2GB	2023-10-13 14:32:12 -07:00
Bruce MacDonald	6fe178134d	improve api error handling (#781) - remove new lines from llama.cpp error messages relayed to client - check api option types and return error on wrong type - change num layers from 95% VRAM to 92% VRAM	2023-10-13 16:57:10 -04:00
Bruce MacDonald	56497663c8	relay model runner error message to client (#720) * give direction to user when runner fails * also relay errors from timeout * increase timeout to 3 minutes	2023-10-12 11:16:37 -04:00

113 commits