baalajimaestro/llama.cpp

Author	SHA1	Message	Date
Andrei	fb762a6041	Add speculative decoding (#1120) * Add draft model param to llama class, implement basic prompt lookup decoding draft model * Use samplingcontext for sampling * Use 1d array * Use draft model for sampling * Fix dumb mistake * Allow for later extensions to the LlamaDraftModel api * Cleanup * Adaptive candidate prediction * Update implementation to match hf transformers * Tuning * Fix bug where last token was not used for ngram prediction * Remove heuristic for num_pred_tokens (no benefit) * fix: n_candidates bug. * Add draft_model_num_pred_tokens server setting * Cleanup * Update README	2024-01-31 14:08:14 -05:00
Andrei Betlen	71e3e4c435	Update llama.cpp	2024-01-31 10:41:42 -05:00
Andrei Betlen	2b37d8e438	fix: Run server command. Closes #1143	2024-01-31 10:37:19 -05:00
Andrei Betlen	078cca0361	fix: Pass raise_exception and add_generation_prompt to jinja2 chat template	2024-01-31 08:42:21 -05:00
Andrei Betlen	411494706a	Update llama.cpp	2024-01-31 08:35:21 -05:00
Andrei Betlen	bf9e824922	Bump version	2024-01-30 12:27:27 -05:00
Andrei Betlen	247a16de66	docs: Update README	2024-01-30 12:23:07 -05:00
Andrei Betlen	13b7ced7da	Update llama.cpp	2024-01-30 12:21:41 -05:00
Andrei Betlen	011cd84ded	Update llama.cpp	2024-01-30 09:48:09 -05:00
Andrei	da003d8768	Automatically set chat format from gguf (#1110) * Use jinja formatter to load chat format from gguf * Fix off-by-one error in metadata loader * Implement chat format auto-detection	2024-01-29 14:22:23 -05:00
Andrei Betlen	059f6b3ac8	docs: fix typos	2024-01-29 11:02:25 -05:00
Andrei Betlen	843e77e3e2	docs: Add Vulkan build instructions	2024-01-29 11:01:26 -05:00
Andrei Betlen	464af5b39f	Bump version	2024-01-29 10:46:04 -05:00
Andrei Betlen	9f7852acfa	misc: Add vulkan target	2024-01-29 10:39:23 -05:00
Andrei Betlen	85f8c4c06e	Update llama.cpp	2024-01-29 10:39:08 -05:00
Andrei Betlen	9ae5819ee4	Add chat format test.	2024-01-29 00:59:01 -05:00
Rafaelblsilva	ce38dbdf07	Add mistral instruct chat format as "mistral-instruct" (#799) * Added mistral instruct chat format as "mistral" * Fix stop sequence (merge issue) * Update chat format name to `mistral-instruct` --------- Co-authored-by: Andrei <abetlen@gmail.com>	2024-01-29 00:34:42 -05:00
Andrei Betlen	52c4a84faf	Bump version	2024-01-28 19:35:37 -05:00
Andrei Betlen	31e0288a41	Update llama.cpp	2024-01-28 19:34:27 -05:00
Andrei Betlen	ccf4908bfd	Update llama.cpp	2024-01-28 12:55:32 -05:00
Andrei Betlen	8c59210062	docs: Fix typo	2024-01-27 19:37:59 -05:00
Andrei Betlen	399fa1e03b	docs: Add JSON and JSON schema mode examples to README	2024-01-27 19:36:33 -05:00
Andrei Betlen	c1d0fff8a9	Bump version	2024-01-27 18:36:56 -05:00
Andrei	d8f6914f45	Add json schema mode (#1122) * Add json schema mode * Add llava chat format support	2024-01-27 16:52:18 -05:00
Andrei Betlen	c6d3bd62e8	Update llama.cpp	2024-01-27 16:22:46 -05:00
Andrei Betlen	35918873b4	Update llama.cpp	2024-01-26 11:45:48 -05:00
Andrei Betlen	f5cc6b3053	Bump version	2024-01-25 11:28:16 -05:00
Andrei Betlen	cde7514c3d	feat(server): include llama-cpp-python version in openapi spec	2024-01-25 11:23:18 -05:00
Andrei Betlen	2588f34a22	Update llama.cpp	2024-01-25 11:22:42 -05:00
Andrei Betlen	dc5a436224	Update llama.cpp	2024-01-25 11:19:34 -05:00
Andrei Betlen	d6fb16e055	docs: Update README	2024-01-25 10:51:48 -05:00
Andrei Betlen	5b258bf840	docs: Update README with more param common examples	2024-01-24 10:51:15 -05:00
Andrei Betlen	c343baaba8	Update llama.cpp	2024-01-24 10:40:50 -05:00
Andrei Betlen	c970d41a85	fix: llama_log_set should be able to accept null pointer	2024-01-24 10:38:30 -05:00
Andrei Betlen	9677a1f2c8	fix: Check order	2024-01-23 22:28:03 -05:00
Andrei Betlen	4d6b2f7b91	fix: format	2024-01-23 22:08:27 -05:00
Phil H	fe5d6ea648	fix: GGUF metadata KV overrides, re #1011 (#1116) * kv overrides another attempt * add sentinel element, simplify array population * ensure sentinel element is zeroed	2024-01-23 22:00:38 -05:00
Andrei Betlen	7e63928bc9	Update llama.cpp	2024-01-23 18:42:39 -05:00
Andrei Betlen	fcdf337d84	Update llama.cpp	2024-01-22 11:25:11 -05:00
Andrei Betlen	5b982d0f8c	fix: use both eos and bos tokens as stop sequences for hf-tokenizer-config chat format.	2024-01-22 08:32:48 -05:00
Andrei Betlen	2ce0b8aa2c	Bump version	2024-01-21 20:30:24 -05:00
Andrei Betlen	d3f5528ca8	fix: from_json_schema oneof/anyof bug. Closes #1097	2024-01-21 19:06:53 -05:00
Andrei Betlen	8eefdbca03	Update llama.cpp	2024-01-21 19:01:27 -05:00
Andrei Betlen	88fbccaaa3	docs: Add macosx wrong arch fix to README	2024-01-21 18:38:44 -05:00
Andrei Betlen	24f39454e9	fix: pass chat handler not chat formatter for huggingface autotokenizer and tokenizer_config formats.	2024-01-21 18:38:04 -05:00
Andrei Betlen	7f3209b1eb	feat: Add add_generation_prompt option for jinja2chatformatter.	2024-01-21 18:37:24 -05:00
Andrei Betlen	ac2e96d4b4	Update llama.cpp	2024-01-19 15:33:43 -05:00
Andrei Betlen	be09318c26	feat: Add Jinja2ChatFormatter	2024-01-19 15:04:42 -05:00
Andrei Betlen	5a34c57e54	feat: Expose gguf model metadata in metadata property	2024-01-19 10:46:03 -05:00
Andrei Betlen	833a7f1a86	Bump version	2024-01-19 09:03:35 -05:00


1429 commits