ollama

Author	SHA1	Message	Date
Jeffrey Morgan	fa2f095bd9	fix model name returned by `/api/generate` being different than the model name provided	2023-12-10 11:42:15 -05:00
Jeffrey Morgan	d9a250e9b5	seek to end of file when decoding older model formats	2023-12-09 21:14:35 -05:00
Jeffrey Morgan	944519ed16	seek to eof for older model binaries	2023-12-09 20:48:57 -05:00
Jeffrey Morgan	2dd040d04c	do not use `--parallel 2` for old runners	2023-12-09 20:17:33 -05:00
Bruce MacDonald	bbe41ce41a	fix: parallel queueing race condition caused silent failure (#1445 ) * fix: queued request failures - increase parallel requests to 2 to complete queued request, queueing is managed in ollama * log steam errors	2023-12-09 14:14:02 -05:00
Michael Yang	f1b049fed8	Merge pull request #1377 from jmorganca/mxyng/qwen update for qwen	2023-12-06 12:31:51 -08:00
Michael Yang	b9495ea162	load projectors	2023-12-05 14:36:12 -08:00
Michael Yang	409bb9674e	Merge pull request #1308 from jmorganca/mxyng/split-from split from into one or more models	2023-12-05 14:33:03 -08:00
Michael Yang	d3479c07a1	Merge pull request #1250 from jmorganca/mxyng/create-layer refactor layer creation	2023-12-05 14:32:52 -08:00
Bruce MacDonald	195e3d9dbd	chat api endpoint (#1392 )	2023-12-05 14:57:33 -05:00
Jeffrey Morgan	00d06619a1	Revert "chat api (#991 )" while context variable is fixed This reverts commit `7a0899d62d`.	2023-12-04 21:16:27 -08:00
Michael Yang	5a5dca13b2	comments	2023-12-04 16:59:23 -08:00
Michael Yang	72e7a49aa9	seek instead of copyn	2023-12-04 16:59:23 -08:00
Michael Yang	2cb0fa7d40	split from into one or more models	2023-12-04 16:59:23 -08:00
Michael Yang	b2816bca67	unnecessary ReadSeeker for DecodeGGML	2023-12-04 16:59:23 -08:00
Bruce MacDonald	7a0899d62d	chat api (#991 ) - update chat docs - add messages chat endpoint - remove deprecated context and template generate parameters from docs - context and template are still supported for the time being and will continue to work as expected - add partial response to chat history	2023-12-04 18:01:06 -05:00
Michael Yang	6deebf2489	update for qwen	2023-12-04 11:38:05 -08:00
Jeffrey Morgan	16a9006306	add back `f16c` instructions on intel mac	2023-11-26 15:59:49 -05:00
Jeffrey Morgan	9e4a316405	update submodule commit	2023-11-26 14:52:00 -05:00
Jing Zhang	82b9b329ff	windows CUDA support (#1262 ) * Support cuda build in Windows * Enable dynamic NumGPU allocation for Windows	2023-11-24 17:16:36 -05:00
Jongwook Choi	12e8c12d2b	Disable CUDA peer access as a workaround for multi-gpu inference bug (#1261 ) When CUDA peer access is enabled, multi-gpu inference will produce garbage output. This is a known bug of llama.cpp (or nvidia). Until the upstream bug is fixed, we can disable CUDA peer access temporarily to ensure correct output. See #961.	2023-11-24 14:05:57 -05:00
Jeffrey Morgan	d77dde126b	consistent cpu instructions on macos and linux	2023-11-22 16:26:46 -05:00
Michael Yang	199941cd15	fix: gguf int type	2023-11-22 11:40:30 -08:00
Michael Yang	a00fac4ec8	update llama.cpp	2023-11-21 09:50:02 -08:00
Jeffrey Morgan	a3fcecf943	only set `main_gpu` if value > 0 is provided	2023-11-20 19:54:04 -05:00
Michael Yang	19b7a4d715	recent llama.cpp update added kernels for fp32, q5_0, and q5_1	2023-11-20 13:44:31 -08:00
Purinda Gunasekara	be61a81758	main-gpu argument is not getting passed to llamacpp, fixed. (#1192 )	2023-11-20 10:52:52 -05:00
Jeffrey Morgan	13ba6df5ab	enable cpu instructions on intel macs	2023-11-19 23:20:26 -05:00
Jeffrey Morgan	36a3bbf65f	Update llm/llama.go	2023-11-18 21:25:07 -05:00
Bruce MacDonald	43a726149d	fix potentially inaccurate error message	2023-11-18 21:25:07 -05:00
Jeffrey Morgan	41434a7cdc	build intel mac with correct binary and compile flags	2023-11-16 22:14:51 -05:00
Jeffrey Morgan	5cba29b9d6	JSON mode: add `"format" as an api parameter (#1051 ) * add `"format": "json"` as an API parameter --------- Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>	2023-11-09 16:44:02 -08:00
Bruce MacDonald	1ae84bc2a2	skip gpu if less than 2GB VRAM are available (#1059 )	2023-11-09 13:16:16 -08:00
Michael Yang	c5e1bbabda	instead of static number of parameters for each model family, get the real number from the tensors (#1022 ) * parse tensor info * refactor decoder * return actual parameter count * explicit rounding * s/Human/HumanNumber/	2023-11-08 17:55:46 -08:00
Jeffrey Morgan	c44b619428	remove unused `fmt.Println`	2023-11-03 17:24:58 -07:00
Jeffrey Morgan	17678b7225	Restore system prompt on requests and default `num_keep` to `0`	2023-11-03 13:25:25 -07:00
Jeffrey Morgan	2e53704685	default rope params to 0 for new models (#968 )	2023-11-02 08:41:30 -07:00
Michael Yang	642128b75a	append LD_LIBRARY_PATH	2023-10-31 15:54:49 -07:00
Jeffrey Morgan	3a1ed9ff70	restore building runner with `AVX` on by default (#900 )	2023-10-27 12:13:44 -07:00
Bruce MacDonald	6d283882b1	catch insufficient permissions nvidia err (#934 )	2023-10-27 12:42:40 -04:00
Bruce MacDonald	2665f3c28e	offload 75% of available vram to improve stability (#921 )	2023-10-26 20:49:55 -04:00
Jeffrey Morgan	b0c9cd0f3b	fix metal assertion errors	2023-10-24 00:32:36 -07:00
Jeffrey Morgan	77f61c6301	update submodule commit	2023-10-24 00:30:27 -07:00
Jeffrey Morgan	f3604534e5	update submodule commit	2023-10-23 23:59:12 -07:00
Michael Yang	0c7a00a264	bump submodules pin to 9e70cc03229df19ca2d28ce23cc817198f897278 for now since 438c2ca83045a00ef244093d27e9ed41a8cb4ea9 is breaking	2023-10-23 11:17:59 -07:00
Michael Yang	36c160f1c3	Merge pull request #881 from jmorganca/mxyng/ggufv3 ggufv3	2023-10-23 10:50:45 -07:00
Michael Yang	c9167494cb	update default log target	2023-10-23 10:44:50 -07:00
Michael Yang	125d0a013a	ggufv3 ggufv3 adds support for big endianness, mainly for s390x architecture. while that's not currently supported for ollama, the change is simple. loosen version check to be more forward compatible. unless specified, gguf versions other v1 will be decoded into v2.	2023-10-23 09:35:49 -07:00
Jeffrey Morgan	7ed5a39bc7	simpler check for model loading compatibility errors	2023-10-19 14:50:49 -04:00
Jeffrey Morgan	a7dad24d92	add error for `falcon` and `starcoder` vocab compatibility (#844 ) add error for falcon and starcoder vocab compatibility --------- Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>	2023-10-19 12:18:31 -04:00

1 2 3

141 commits