ollama

Author	SHA1	Message	Date
Jongwook Choi	12e8c12d2b	Disable CUDA peer access as a workaround for multi-gpu inference bug (#1261 ) When CUDA peer access is enabled, multi-gpu inference will produce garbage output. This is a known bug of llama.cpp (or nvidia). Until the upstream bug is fixed, we can disable CUDA peer access temporarily to ensure correct output. See #961.	2023-11-24 14:05:57 -05:00
Jeffrey Morgan	3a1ed9ff70	restore building runner with `AVX` on by default (#900 )	2023-10-27 12:13:44 -07:00
Michael Yang	c9167494cb	update default log target	2023-10-23 10:44:50 -07:00
Bruce MacDonald	5d22319a2c	rename server subprocess (#700 ) - this makes it easier to see that the subprocess is associated with ollama	2023-10-06 10:15:42 -04:00
Michael Yang	058d0cd04b	silence warm up log	2023-09-21 14:53:33 -07:00
Michael Yang	6c6a31a1e8	embed libraries using cmake	2023-09-20 14:41:57 -07:00
Bruce MacDonald	fc6ec356fc	remove libcuda.so	2023-09-20 20:36:14 +01:00
Bruce MacDonald	1255bc9b45	only package 11.8 runner	2023-09-20 20:00:41 +01:00
Bruce MacDonald	b9bb5ca288	use cuda_version	2023-09-20 17:58:16 +01:00
Bruce MacDonald	4e8be787c7	pack in cuda libs	2023-09-20 17:40:42 +01:00
Bruce MacDonald	2540c9181c	support for packaging in multiple cuda runners (#509 ) * enable packaging multiple cuda versions * use nvcc cuda version if available --------- Co-authored-by: Michael Yang <mxyng@pm.me>	2023-09-14 15:08:13 -04:00
Bruce MacDonald	f59c4d03f7	fix ggml arm64 cuda build (#520 )	2023-09-12 17:06:48 -04:00
Bruce MacDonald	f221637053	first pass at linux gpu support (#454 ) * linux gpu support * handle multiple gpus * add cuda docker image (#488) --------- Co-authored-by: Michael Yang <mxyng@pm.me>	2023-09-12 11:04:35 -04:00

13 commits