ollama

Author	SHA1	Message	Date
Jeffrey Morgan	f8fedbda20	Update llama.cpp submodule commit to `d94c6e0c` (#5805)	2024-07-22 12:42:00 -04:00
Jeffrey Morgan	5534f2cc6a	llm: consider `head_dim` in llama arch (#5817)	2024-07-20 21:48:12 -04:00
Jeffrey Morgan	1475eab95f	add patch for tekken (#5807)	2024-07-20 13:41:21 -04:00
Jeffrey Morgan	571dc61955	Update llama.cpp submodule to `a8db2a9c` (#5530)	2024-07-07 13:03:09 -04:00
Jeffrey Morgan	8f8e736b13	update llama.cpp submodule to `d7fd29f` (#5475)	2024-07-05 13:25:58 -04:00
Jeffrey Morgan	e9188e971a	Fix assert on small embedding inputs (#5491) * Fix assert on small embedding inputs * Update llm/patches/09-pooling.diff	2024-07-05 11:20:57 -04:00
Daniel Hiltgen	6298f49816	Fix clip model loading with unicode paths On windows, if the model dir contained unicode characters clip models would fail to load. This fixes the file name handling in clip.cpp to support utf16 on windows.	2024-07-03 12:46:36 -07:00
Jeffrey Morgan	4d311eb731	llm: architecture patch (#5316)	2024-06-26 21:38:12 -07:00
Jeffrey Morgan	152fc202f5	llm: update llama.cpp commit to `7c26775` (#4896) * llm: update llama.cpp submodule to `7c26775` * disable `LLAMA_BLAS` for now * `-DLLAMA_OPENMP=off`	2024-06-17 15:56:16 -04:00
Jeffrey Morgan	ce0dc33cb8	llm: patch to fix qwen 2 temporarily on nvidia (#4897)	2024-06-06 23:14:33 -07:00
Jeffrey Morgan	22f5c12ced	Update llama.cpp submodule to `5921b8f0` (#4731) * update llama.cpp submodule to `5921b8f089d3b7bda86aac5a66825df6a6c10603` * add patch	2024-05-30 16:20:22 -07:00
Michael Yang	714adb8bd1	bump (#4597)	2024-05-23 14:16:26 -07:00
Daniel Hiltgen	b37b496a12	Wire up load progress This doesn't expose a UX yet, but wires the initial server portion of progress reporting during load	2024-05-23 13:36:48 -07:00
Jeffrey Morgan	583c1f472c	update llama.cpp submodule to `614d3b9` (#4414)	2024-05-16 13:53:09 -07:00
Jeffrey Morgan	1b0e6c9c0e	Fix llava models not working after first request (#4164) * fix llava models not working after first request * individual requests only for llava models	2024-05-05 20:50:31 -07:00
Daniel Hiltgen	85801317d1	Fix clip log import	2024-04-26 09:43:46 -07:00
jmorganca	ddf5c09a9b	use matrix multiplication kernels in more cases	2024-04-25 13:58:54 -07:00
Daniel Hiltgen	0035e31af8	Bump to b2581	2024-04-02 11:53:07 -07:00
Daniel Hiltgen	43799532c1	Bump llama.cpp to b2474 The release just before ggml-cuda.cu refactoring	2024-03-23 09:54:56 +01:00
Michael Yang	291c663865	fix: clip memory leak	2024-03-14 13:12:42 -07:00
Jeffrey Morgan	e72c567cfd	restore locale patch (#3091)	2024-03-12 22:08:13 -07:00
Bruce MacDonald	b80661e8c7	relay load model errors to the client (#3065)	2024-03-11 16:48:27 -04:00
Jeffrey Morgan	369eda65f5	update llama.cpp submodule to `ceca1ae` (#3064)	2024-03-11 12:57:48 -07:00
Jeffrey Morgan	41b00b9856	fix `03-locale.diff`	2024-03-10 16:21:05 -07:00
Jeffrey Morgan	908005d90b	patch: use default locale in wpm tokenizer (#3034)	2024-03-09 21:12:12 -08:00
Jeffrey Morgan	1ffb1e2874	update llama.cpp submodule to `77d1ac7` (#3030)	2024-03-09 15:55:34 -08:00
Jeffrey Morgan	0e4669b04f	update llama.cpp submodule to `6cdabe6` (#2999)	2024-03-08 00:26:20 -08:00
Jeffrey Morgan	21347e1ed6	update llama.cpp submodule to `c29af7e` (#2868)	2024-03-01 15:26:04 -08:00
Jeffrey Morgan	4613a080e7	update llama.cpp submodule to `66c1968f7` (#2618)	2024-02-20 17:42:31 -05:00
Daniel Hiltgen	fc39a6cd7a	Fix cuda leaks This should resolve the problem where we don't fully unload from the GPU when we go idle.	2024-02-18 18:37:20 -08:00
Jeffrey Morgan	26b13fc33c	patch: always add token to cache_tokens (#2459)	2024-02-12 08:10:16 -08:00
Daniel Hiltgen	de76b95dd4	Bump llama.cpp to b2081	2024-02-06 12:06:43 -08:00
Daniel Hiltgen	72b12c3be7	Bump llama.cpp to b1999 This requires an upstream change to support graceful termination, carried as a patch.	2024-01-30 16:52:12 -08:00
Jeffrey Morgan	a64570dcae	Fix clearing kv cache between requests with the same prompt (#2186) * Fix clearing kv cache between requests with the same prompt * fix powershell script	2024-01-25 13:46:20 -08:00

34 commits