Andrei Betlen
310fbf4e49
Update llama.cpp
2024-02-05 22:07:14 -05:00
Andrei Betlen
59760c85ed
fix: Use llama_log_callback to avoid suppress_stdout_stderr
2024-02-05 21:52:12 -05:00
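A minimal sketch of the approach this fix takes, assuming the llama_log_set binding and callback signature exposed by the ctypes bindings:

```python
import ctypes
import llama_cpp

# Route llama.cpp logs through a Python callback instead of redirecting
# stdout/stderr at the file-descriptor level (the old suppression approach).
@ctypes.CFUNCTYPE(None, ctypes.c_int, ctypes.c_char_p, ctypes.c_void_p)
def noop_log_callback(level, text, user_data):
    pass  # drop all llama.cpp log output

llama_cpp.llama_log_set(noop_log_callback, ctypes.c_void_p(0))
```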
Andrei Betlen
3553b14670
Update llama.cpp
2024-02-05 13:26:50 -05:00
Andrei
7467f129e5
Revert "Fix: fileno error google colab ( #729 ) ( #1156 )" ( #1157 )
This reverts commit bebfba0f08.
2024-02-02 12:18:55 -05:00
Dulsara
bebfba0f08
Fix: fileno error google colab (#729) (#1156)
Instead of using devnull, this makes a dummy class with a 'write()' method that does nothing.
2024-02-02 12:05:46 -05:00
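A minimal sketch of the pattern that commit describes (the class name is illustrative, not the repo's actual one):

```python
import sys

# A stand-in stream whose write() discards output: no real file
# descriptor is involved, so nothing ever calls a missing fileno().
class NullWriter:
    def write(self, data):
        pass  # discard output

    def flush(self):
        pass

sys.stderr = NullWriter()  # example use: silence writes to stderr
```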
Andrei Betlen
8a5911bd5d
Update llama.cpp
2024-02-02 09:41:27 -05:00
Andrei Betlen
de526d0214
Update llama.cpp
2024-02-01 12:35:31 -05:00
Andrei Betlen
3322eadbf3
Bump version
2024-01-31 15:10:18 -05:00
Andrei Betlen
a8cb34eacd
Update llama.cpp
2024-01-31 15:05:51 -05:00
Andrei
fb762a6041
Add speculative decoding (#1120)
* Add draft model param to llama class, implement basic prompt lookup decoding draft model
* Use sampling context for sampling
* Use 1d array
* Use draft model for sampling
* Fix dumb mistake
* Allow for later extensions to the LlamaDraftModel API
* Cleanup
* Adaptive candidate prediction
* Update implementation to match hf transformers
* Tuning
* Fix bug where last token was not used for ngram prediction
* Remove heuristic for num_pred_tokens (no benefit)
* fix: n_candidates bug.
* Add draft_model_num_pred_tokens server setting
* Cleanup
* Update README
2024-01-31 14:08:14 -05:00
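The README update in the last bullet documents the resulting API; a usage sketch (model path illustrative):

```python
from llama_cpp import Llama
from llama_cpp.llama_speculative import LlamaPromptLookupDecoding

# Prompt lookup decoding drafts tokens by matching recent n-grams against
# earlier context, so no separate draft-model weights are required.
llm = Llama(
    model_path="path/to/model.gguf",  # illustrative path
    draft_model=LlamaPromptLookupDecoding(num_pred_tokens=10),
)
```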
Andrei Betlen
71e3e4c435
Update llama.cpp
2024-01-31 10:41:42 -05:00
Andrei Betlen
2b37d8e438
fix: Run server command. Closes #1143
2024-01-31 10:37:19 -05:00
Andrei Betlen
078cca0361
fix: Pass raise_exception and add_generation_prompt to jinja2 chat template
2024-01-31 08:42:21 -05:00
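For context, a hedged sketch of what passing those two names into a jinja2 render means in practice (the template string and message are illustrative, not the library's internals):

```python
import jinja2

# HF-style chat templates may call raise_exception(...) and branch on
# add_generation_prompt, so both must be supplied at render time.
chat_template = (
    "{% for m in messages %}<|{{ m['role'] }}|>\n{{ m['content'] }}\n{% endfor %}"
    "{% if add_generation_prompt %}<|assistant|>\n{% endif %}"
)

def raise_exception(message):
    raise ValueError(message)

prompt = jinja2.Environment().from_string(chat_template).render(
    messages=[{"role": "user", "content": "Hello"}],
    add_generation_prompt=True,
    raise_exception=raise_exception,
)
print(prompt)
```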
Andrei Betlen
411494706a
Update llama.cpp
2024-01-31 08:35:21 -05:00
Andrei Betlen
bf9e824922
Bump version
2024-01-30 12:27:27 -05:00
Andrei Betlen
247a16de66
docs: Update README
2024-01-30 12:23:07 -05:00
Andrei Betlen
13b7ced7da
Update llama.cpp
2024-01-30 12:21:41 -05:00
Andrei Betlen
011cd84ded
Update llama.cpp
2024-01-30 09:48:09 -05:00
Andrei
da003d8768
Automatically set chat format from gguf (#1110)
* Use jinja formatter to load chat format from gguf
* Fix off-by-one error in metadata loader
* Implement chat format auto-detection
2024-01-29 14:22:23 -05:00
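A sketch of the behavior this adds, assuming the template is read from the model's GGUF metadata under the tokenizer.chat_template key (path illustrative):

```python
from llama_cpp import Llama

# With no explicit chat_format, the loader inspects the GGUF metadata and
# matches the stored chat template against known formats.
llm = Llama(model_path="path/to/model.gguf")  # illustrative path
print(llm.metadata.get("tokenizer.chat_template"))
```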
Andrei Betlen
059f6b3ac8
docs: fix typos
2024-01-29 11:02:25 -05:00
Andrei Betlen
843e77e3e2
docs: Add Vulkan build instructions
2024-01-29 11:01:26 -05:00
Andrei Betlen
464af5b39f
Bump version
2024-01-29 10:46:04 -05:00
Andrei Betlen
9f7852acfa
misc: Add vulkan target
2024-01-29 10:39:23 -05:00
Andrei Betlen
85f8c4c06e
Update llama.cpp
2024-01-29 10:39:08 -05:00
Andrei Betlen
9ae5819ee4
Add chat format test.
2024-01-29 00:59:01 -05:00
Rafaelblsilva
ce38dbdf07
Add mistral instruct chat format as "mistral-instruct" (#799)
* Added mistral instruct chat format as "mistral"
* Fix stop sequence (merge issue)
* Update chat format name to `mistral-instruct`
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2024-01-29 00:34:42 -05:00
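Usage of the new format name (model path illustrative):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="path/to/mistral-7b-instruct.gguf",  # illustrative path
    chat_format="mistral-instruct",
)
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello."}],
)
```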
Andrei Betlen
52c4a84faf
Bump version
2024-01-28 19:35:37 -05:00
Andrei Betlen
31e0288a41
Update llama.cpp
2024-01-28 19:34:27 -05:00
Andrei Betlen
ccf4908bfd
Update llama.cpp
2024-01-28 12:55:32 -05:00
Andrei Betlen
8c59210062
docs: Fix typo
2024-01-27 19:37:59 -05:00
Andrei Betlen
399fa1e03b
docs: Add JSON and JSON schema mode examples to README
2024-01-27 19:36:33 -05:00
Andrei Betlen
c1d0fff8a9
Bump version
2024-01-27 18:36:56 -05:00
Andrei
d8f6914f45
Add json schema mode (#1122)
* Add json schema mode
* Add llava chat format support
2024-01-27 16:52:18 -05:00
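As documented in the README examples added later that day, JSON schema mode constrains chat completions through response_format; a sketch (model path illustrative):

```python
from llama_cpp import Llama

llm = Llama(model_path="path/to/model.gguf", chat_format="chatml")  # illustrative
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Which team won in 2020? Answer in JSON."}],
    response_format={
        "type": "json_object",
        "schema": {
            "type": "object",
            "properties": {"team_name": {"type": "string"}},
            "required": ["team_name"],
        },
    },
)
```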
Andrei Betlen
c6d3bd62e8
Update llama.cpp
2024-01-27 16:22:46 -05:00
Andrei Betlen
35918873b4
Update llama.cpp
2024-01-26 11:45:48 -05:00
Andrei Betlen
f5cc6b3053
Bump version
2024-01-25 11:28:16 -05:00
Andrei Betlen
cde7514c3d
feat(server): include llama-cpp-python version in openapi spec
2024-01-25 11:23:18 -05:00
Andrei Betlen
2588f34a22
Update llama.cpp
2024-01-25 11:22:42 -05:00
Andrei Betlen
dc5a436224
Update llama.cpp
2024-01-25 11:19:34 -05:00
Andrei Betlen
d6fb16e055
docs: Update README
2024-01-25 10:51:48 -05:00
Andrei Betlen
5b258bf840
docs: Update README with more param common examples
2024-01-24 10:51:15 -05:00
Andrei Betlen
c343baaba8
Update llama.cpp
2024-01-24 10:40:50 -05:00
Andrei Betlen
c970d41a85
fix: llama_log_set should be able to accept null pointer
2024-01-24 10:38:30 -05:00
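A hedged one-liner of the fixed behavior (upstream llama.cpp treats a null callback as "restore the default logger"):

```python
import ctypes
import llama_cpp

# After this fix, passing None no longer raises; default logging returns.
llama_cpp.llama_log_set(None, ctypes.c_void_p(0))
```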
Andrei Betlen
9677a1f2c8
fix: Check order
2024-01-23 22:28:03 -05:00
Andrei Betlen
4d6b2f7b91
fix: format
2024-01-23 22:08:27 -05:00
Phil H
fe5d6ea648
fix: GGUF metadata KV overrides, re #1011 (#1116)
* kv overrides another attempt
* add sentinel element, simplify array population
* ensure sentinel element is zeroed
2024-01-23 22:00:38 -05:00
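A hedged sketch of the sentinel pattern those bullets describe (struct, field, and constant names follow the ctypes bindings of this era; treat the details as illustrative): the C side scans the override array until it reaches an entry with an empty key, so the array is allocated one element longer and left zero-initialized at the end.

```python
import llama_cpp

# Allocate n + 1 entries; ctypes zero-fills the array, so the final
# element's empty key terminates the C-side scan.
n = 1
overrides = (llama_cpp.llama_model_kv_override * (n + 1))()
overrides[0].key = b"tokenizer.ggml.add_bos_token"
overrides[0].tag = llama_cpp.LLAMA_KV_OVERRIDE_BOOL
overrides[0].value.bool_value = False
# overrides[1] stays zeroed: the sentinel element the commit ensures.
```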
Andrei Betlen
7e63928bc9
Update llama.cpp
2024-01-23 18:42:39 -05:00
Andrei Betlen
fcdf337d84
Update llama.cpp
2024-01-22 11:25:11 -05:00
Andrei Betlen
5b982d0f8c
fix: use both eos and bos tokens as stop sequences for hf-tokenizer-config chat format.
2024-01-22 08:32:48 -05:00
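Illustrative of that fix (token strings are examples; in practice they come from the model's tokenizer_config.json):

```python
# Both special tokens now terminate generation for this chat format.
tokenizer_config = {"eos_token": "</s>", "bos_token": "<s>"}  # example values
stop = [tokenizer_config["eos_token"], tokenizer_config["bos_token"]]
```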
Andrei Betlen
2ce0b8aa2c
Bump version
2024-01-21 20:30:24 -05:00