Michael Yang
2870a9bfc8
Merge pull request #812 from jmorganca/mxyng/fix-format-string
...
fix: wrong format string type
2023-10-17 08:40:49 -07:00
Michael Yang
fddb303f23
fix: format string wrong type
2023-10-16 16:14:28 -07:00
Michael Yang
cb4a80b693
fix: regression unsupported metal types
...
omitting `--n-gpu-layers` means Metal is used on macOS, which isn't correct
since Ollama uses `num_gpu=0` to explicitly disable the GPU for file types
that are not implemented in Metal
2023-10-16 14:37:20 -07:00
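A minimal sketch of the rule this fix implies (the helper name is made up, not Ollama's code): pass `--n-gpu-layers` to the runner explicitly, including 0, so that omitting the flag never lets the runner default to Metal on macOS.

```go
package main

import (
	"fmt"
	"strconv"
)

// runnerArgs builds llama.cpp server arguments. numGPU == 0 must be forwarded
// verbatim; omitting the flag would let the runner enable Metal on macOS.
func runnerArgs(model string, numGPU int) []string {
	return []string{
		"--model", model,
		// 0 explicitly disables the GPU for file types without Metal kernels
		"--n-gpu-layers", strconv.Itoa(numGPU),
	}
}

func main() {
	fmt.Println(runnerArgs("model.bin", 0))
}
```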
Michael Yang
11d82d7b9b
update checkvram
2023-10-13 14:47:29 -07:00
Michael Yang
36fe2deebf
only check system memory on macos
2023-10-13 14:47:29 -07:00
Michael Yang
4a8931f634
check total (system + video) memory
2023-10-13 14:47:29 -07:00
Michael Yang
bd6e38fb1a
refactor memory check
2023-10-13 14:47:29 -07:00
Michael Yang
92189a5855
fix memory check
2023-10-13 14:47:29 -07:00
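A hedged sketch of the idea behind the memory-check series above (function and parameter names are illustrative, not Ollama's actual API): on macOS only system memory is checked, while on other platforms the budget is system plus video memory.

```go
package main

import (
	"fmt"
	"runtime"
)

// checkVRAM reports whether a model of the given size fits in the memory
// budget. All values are in bytes.
func checkVRAM(modelSize, systemMem, videoMem int64) bool {
	available := systemMem // macOS: system (unified) memory only
	if runtime.GOOS != "darwin" {
		available += videoMem // elsewhere: total of system + video memory
	}
	return modelSize <= available
}

func main() {
	fmt.Println(checkVRAM(7<<30, 16<<30, 8<<30))
}
```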
Michael Yang
d790bf9916
Merge pull request #783 from jmorganca/mxyng/fix-gpu-offloading
...
fix: offloading on low end GPUs
2023-10-13 14:36:44 -07:00
Michael Yang
35afac099a
do not use gpu binary when num_gpu == 0
2023-10-13 14:32:12 -07:00
Michael Yang
811c3d1900
no gpu if vram < 2GB
2023-10-13 14:32:12 -07:00
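A sketch of the two rules described in this group (helper names are assumptions): below 2 GiB of VRAM nothing is offloaded, and `num_gpu == 0` selects the CPU-only runner instead of the GPU binary.

```go
package main

import "fmt"

const minVRAM = 2 << 30 // 2 GiB

func numGPULayers(vram int64, requested int) int {
	if vram < minVRAM {
		return 0 // low-end GPU: do not offload any layers
	}
	return requested
}

func runnerKind(numGPU int) string {
	if numGPU == 0 {
		return "cpu" // never load the GPU binary when nothing is offloaded
	}
	return "gpu"
}

func main() {
	layers := numGPULayers(1<<30, 32)
	fmt.Println(layers, runnerKind(layers)) // 0 cpu
}
```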
Bruce MacDonald
6fe178134d
improve api error handling ( #781 )
...
- remove new lines from llama.cpp error messages relayed to client
- check api option types and return error on wrong type
- change num layers from 95% VRAM to 92% VRAM
2023-10-13 16:57:10 -04:00
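A sketch of the first and last bullets, assuming runner errors arrive as multi-line strings: collapse them to one line before relaying to the client, and budget layers against 92% of VRAM rather than 95%.

```go
package main

import (
	"fmt"
	"strings"
)

// flattenRunnerError removes newlines and extra whitespace from a runner
// error message so it can be relayed to the API client on a single line.
func flattenRunnerError(msg string) string {
	return strings.Join(strings.Fields(msg), " ")
}

func main() {
	fmt.Println(flattenRunnerError("CUDA error:\n  out of memory\n"))

	// layer budget: 92% of VRAM (down from 95%), per the last bullet
	vram := int64(8 << 30)
	fmt.Println(vram * 92 / 100)
}
```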
Bruce MacDonald
56497663c8
relay model runner error message to client ( #720 )
...
* give direction to user when runner fails
* also relay errors from timeout
* increase timeout to 3 minutes
2023-10-12 11:16:37 -04:00
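An illustrative sketch only (not the shipped code): poll the runner until it is ready, give up after the 3-minute timeout mentioned above, and relay the last runner error so the user gets direction instead of a bare timeout.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

func waitForRunner(ready func() (bool, error), timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	var lastErr error
	for time.Now().Before(deadline) {
		ok, err := ready()
		if ok {
			return nil
		}
		if err != nil {
			lastErr = err // remember the most recent runner error
		}
		time.Sleep(time.Second)
	}
	if lastErr != nil {
		return fmt.Errorf("timed out waiting for llama runner to start: %w", lastErr)
	}
	return errors.New("timed out waiting for llama runner to start")
}

func main() {
	err := waitForRunner(func() (bool, error) { return true, nil }, 3*time.Minute)
	fmt.Println(err)
}
```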
Michael Yang
b599946b74
add format bytes
2023-10-11 14:08:23 -07:00
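A minimal sketch of a "format bytes" helper like the commit describes (the real function's name, units, and rounding are not shown here).

```go
package main

import "fmt"

func formatBytes(n int64) string {
	switch {
	case n >= 1<<30:
		return fmt.Sprintf("%.1f GB", float64(n)/(1<<30))
	case n >= 1<<20:
		return fmt.Sprintf("%.1f MB", float64(n)/(1<<20))
	case n >= 1<<10:
		return fmt.Sprintf("%.1f KB", float64(n)/(1<<10))
	default:
		return fmt.Sprintf("%d B", n)
	}
}

func main() {
	fmt.Println(formatBytes(7_365_960_704)) // e.g. "6.9 GB"
}
```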
Bruce MacDonald
77295f716e
prevent waiting on exited command ( #752 )
...
* prevent waiting on exited command
* close llama runner once
2023-10-11 12:32:13 -04:00
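A sketch of the "close once" idea using sync.Once; the runner type and its fields here are hypothetical, not Ollama's.

```go
package main

import (
	"fmt"
	"os/exec"
	"sync"
)

type runner struct {
	cmd       *exec.Cmd
	closeOnce sync.Once
}

// Close terminates the subprocess at most once; repeated calls are no-ops,
// so nothing ever waits a second time on an already-exited command.
func (r *runner) Close() error {
	var err error
	r.closeOnce.Do(func() {
		if r.cmd != nil && r.cmd.Process != nil {
			r.cmd.Process.Kill()
			err = r.cmd.Wait()
		}
	})
	return err
}

func main() {
	r := &runner{cmd: exec.Command("true")}
	r.cmd.Start()
	fmt.Println(r.Close(), r.Close()) // second call returns nil without waiting
}
```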
Bruce MacDonald
f2ba1311aa
improve vram safety with 5% vram memory buffer ( #724 )
...
* check free memory not total
* wait for subprocess to exit
2023-10-10 16:16:09 -04:00
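A sketch of the buffer rule above: budget against free (not total) VRAM, minus a 5% safety margin. Values are illustrative.

```go
package main

import "fmt"

// usableVRAM leaves a 5% buffer on top of the free VRAM reported by the GPU.
func usableVRAM(freeVRAM int64) int64 {
	return freeVRAM * 95 / 100
}

func main() {
	free := int64(6 << 30) // 6 GiB reported free
	fmt.Println(usableVRAM(free))
}
```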
Jeffrey Morgan
ab0668293c
llm: fix build on amd64
2023-10-06 14:39:54 -07:00
Bruce MacDonald
5d22319a2c
rename server subprocess ( #700 )
...
- this makes it easier to see that the subprocess is associated with ollama
2023-10-06 10:15:42 -04:00
Bruce MacDonald
d06bc0cb6e
enable q8, q5, 5_1, and f32 for linux gpu ( #699 )
2023-10-05 12:53:47 -04:00
Bruce MacDonald
9e2de1bd2c
increase streaming buffer size ( #692 )
2023-10-04 14:09:00 -04:00
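A sketch of enlarging a streaming scanner's buffer, the usual Go fix when long JSON lines overflow bufio.Scanner's 64 KiB default; the exact size Ollama chose is not shown here.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

func main() {
	const maxBufferSize = 512 * 1024 // illustrative limit, not the real value
	scanner := bufio.NewScanner(strings.NewReader("{\"response\":\"...\"}\n"))
	scanner.Buffer(make([]byte, 0, maxBufferSize), maxBufferSize)
	for scanner.Scan() {
		fmt.Println(scanner.Text())
	}
}
```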
Michael Yang
c02c0cd483
starcoder
2023-10-02 19:56:51 -07:00
Bruce MacDonald
b1f7123301
clean up num_gpu calculation code ( #673 )
2023-10-02 14:53:42 -04:00
Bruce MacDonald
1fbf3585d6
Relay default values to llama runner ( #672 )
...
* include seed in params for llama.cpp server and remove empty filter for temp
* relay default predict options to llama.cpp
- reorganize options to match predict request for readability
* omit empty stop
---------
Co-authored-by: hallh <hallh@users.noreply.github.com>
2023-10-02 14:53:16 -04:00
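A hedged sketch of the "relay defaults, omit empty stop" idea: a request struct with explicit defaults and `omitempty` on stop. Field names mirror common llama.cpp server options, but the struct itself is illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type predictRequest struct {
	Prompt      string   `json:"prompt"`
	Seed        int      `json:"seed"`
	Temperature float64  `json:"temperature"`
	NPredict    int      `json:"n_predict"`
	Stop        []string `json:"stop,omitempty"` // omit empty stop
}

func main() {
	req := predictRequest{
		Prompt:      "hello",
		Seed:        -1,  // default seed is relayed rather than filtered out
		Temperature: 0.8, // defaults are sent explicitly to the runner
		NPredict:    128,
	}
	b, _ := json.Marshal(req)
	fmt.Println(string(b))
}
```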
Bruce MacDonald
9771b1ec51
windows runner fixes ( #637 )
2023-09-29 11:47:55 -04:00
Michael Yang
f40b3de758
use int64 consistently
2023-09-28 11:07:24 -07:00
Bruce MacDonald
86279f4ae3
unbound max num gpu layers ( #591 )
...
---------
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-25 18:36:46 -04:00
Michael Yang
058d0cd04b
silence warm up log
2023-09-21 14:53:33 -07:00
Michael Yang
ee1c994d15
update submodule ( #567 )
2023-09-21 16:22:23 -04:00
Bruce MacDonald
4cba75efc5
remove tmp directories created by previous servers ( #559 )
...
* remove tmp directories created by previous servers
* clean up on server stop
* Update routes.go
* Update server/routes.go
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* create top-level temp ollama dir
* check file exists before creating
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-21 20:38:49 +01:00
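A sketch of the cleanup described above; the directory layout and naming are assumptions. Runners unpack under a single top-level temp directory so stale directories from previous servers can be found and removed on startup.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func cleanupTmpDirs() error {
	parent := filepath.Join(os.TempDir(), "ollama") // hypothetical layout
	matches, err := filepath.Glob(filepath.Join(parent, "*"))
	if err != nil {
		return err
	}
	for _, d := range matches {
		// a real implementation would skip the current server's own directory
		if err := os.RemoveAll(d); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	fmt.Println(cleanupTmpDirs())
}
```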
Michael Yang
a9ed7cc6aa
rename generate.go
2023-09-20 14:42:17 -07:00
Michael Yang
6c6a31a1e8
embed libraries using cmake
2023-09-20 14:41:57 -07:00
Bruce MacDonald
fc6ec356fc
remove libcuda.so
2023-09-20 20:36:14 +01:00
Bruce MacDonald
1255bc9b45
only package 11.8 runner
2023-09-20 20:00:41 +01:00
Bruce MacDonald
b9bb5ca288
use cuda_version
2023-09-20 17:58:16 +01:00
Bruce MacDonald
4e8be787c7
pack in cuda libs
2023-09-20 17:40:42 +01:00
Bruce MacDonald
66003e1d05
subprocess improvements ( #524 )
...
* subprocess improvements
- increase start-up timeout
- when the runner fails to start, fail rather than timing out
- try runners in order rather than choosing 1 runner
- embed metal runner in metal dir rather than gpu
- refactor logging and error messages
* Update llama.go
* Update llama.go
* simplify by using glob
2023-09-18 15:16:32 -04:00
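A sketch of "try runners in order" using a glob, as the last bullet hints; the directory layout and binary name are assumptions, not the shipped layout.

```go
package main

import (
	"fmt"
	"os/exec"
	"path/filepath"
)

func startFirstWorkingRunner(dir string, args ...string) (*exec.Cmd, error) {
	candidates, err := filepath.Glob(filepath.Join(dir, "*", "server"))
	if err != nil {
		return nil, err
	}
	var lastErr error
	for _, bin := range candidates { // e.g. a cuda runner first, then the cpu runner
		cmd := exec.Command(bin, args...)
		if err := cmd.Start(); err != nil {
			lastErr = err // this runner cannot start; fall back to the next one
			continue
		}
		return cmd, nil
	}
	return nil, fmt.Errorf("no runner could be started: %v", lastErr)
}

func main() {
	_, err := startFirstWorkingRunner("runners")
	fmt.Println(err)
}
```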
Bruce MacDonald
2540c9181c
support for packaging in multiple cuda runners ( #509 )
...
* enable packaging multiple cuda versions
* use nvcc cuda version if available
---------
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-14 15:08:13 -04:00
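A sketch of "use nvcc cuda version if available": parse the release number from `nvcc --version`. The regular expression and the fallback value are assumptions.

```go
package main

import (
	"fmt"
	"os/exec"
	"regexp"
)

func cudaVersion() string {
	out, err := exec.Command("nvcc", "--version").CombinedOutput()
	if err != nil {
		return "11.8" // fall back to the packaged default when nvcc is absent
	}
	m := regexp.MustCompile(`release (\d+\.\d+)`).FindSubmatch(out)
	if m == nil {
		return "11.8"
	}
	return string(m[1])
}

func main() {
	fmt.Println(cudaVersion())
}
```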
Michael Yang
d028853879
fix: add falcon.go
2023-09-13 14:47:37 -07:00
Michael Yang
949553db23
Merge pull request #519 from jmorganca/mxyng/decode
...
Mxyng/decode
2023-09-13 12:43:57 -07:00
Michael Yang
0c5a454361
fix model type for 70b
2023-09-12 15:12:59 -07:00
Bruce MacDonald
f59c4d03f7
fix ggml arm64 cuda build ( #520 )
2023-09-12 17:06:48 -04:00
Michael Yang
7dee25a07f
fix falcon decode
...
get model and file type from bin file
2023-09-12 12:34:53 -07:00
Bruce MacDonald
f221637053
first pass at linux gpu support ( #454 )
...
* linux gpu support
* handle multiple gpus
* add cuda docker image (#488 )
---------
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-12 11:04:35 -04:00
Bruce MacDonald
09dd2aeff9
GGUF support ( #441 )
2023-09-07 13:55:37 -04:00
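A minimal sketch of detecting a GGUF file: the format starts with the 4-byte magic "GGUF" followed by a little-endian uint32 version. Everything past the header is omitted here.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

func decodeGGUFHeader(r io.Reader) (version uint32, err error) {
	var magic [4]byte
	if _, err := io.ReadFull(r, magic[:]); err != nil {
		return 0, err
	}
	if !bytes.Equal(magic[:], []byte("GGUF")) {
		return 0, fmt.Errorf("not a GGUF file: magic %q", magic)
	}
	if err := binary.Read(r, binary.LittleEndian, &version); err != nil {
		return 0, err
	}
	return version, nil
}

func main() {
	hdr := append([]byte("GGUF"), 2, 0, 0, 0) // fabricated header for the demo
	v, err := decodeGGUFHeader(bytes.NewReader(hdr))
	fmt.Println(v, err)
}
```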
Jeffrey Morgan
61dda6a5e0
set minimum CMAKE_OSX_DEPLOYMENT_TARGET to 11.0
2023-09-06 19:56:50 -04:00
Jeffrey Morgan
7de300856b
use osPath in gpu check
2023-09-05 21:52:21 -04:00
Jeffrey Morgan
213ffdb548
macos amd64 compatibility fixes
2023-09-05 21:33:31 -04:00
Bruce MacDonald
d18282bfda
metal: add missing barriers for mul-mat ( #469 )
2023-09-05 19:37:13 -04:00
Michael Yang
2bc06565c7
fix empty response
2023-09-05 15:03:24 -07:00
Michael Yang
7b5aefb427
Merge pull request #462 from jmorganca/mxyng/rm-marshal-prompt
...
remove marshalPrompt which is no longer needed
2023-09-05 11:48:41 -07:00