Andrei
fb762a6041
Add speculative decoding ( #1120 )
...
* Add draft model param to llama class, implement basic prompt lookup decoding draft model
* Use samplingcontext for sampling
* Use 1d array
* Use draft model for sampling
* Fix dumb mistake
* Allow for later extensions to the LlamaDraftModel api
* Cleanup
* Adaptive candidate prediction
* Update implementation to match hf transformers
* Tuning
* Fix bug where last token was not used for ngram prediction
* Remove heuristic for num_pred_tokens (no benefit)
* fix: n_candidates bug.
* Add draft_model_num_pred_tokens server setting
* Cleanup
* Update README
2024-01-31 14:08:14 -05:00
Andrei Betlen
71e3e4c435
Update llama.cpp
2024-01-31 10:41:42 -05:00
Andrei Betlen
2b37d8e438
fix: Run server command. Closes #1143
2024-01-31 10:37:19 -05:00
Andrei Betlen
078cca0361
fix: Pass raise_exception and add_generation_prompt to jinja2 chat template
2024-01-31 08:42:21 -05:00
Andrei Betlen
411494706a
Update llama.cpp
2024-01-31 08:35:21 -05:00
Andrei Betlen
bf9e824922
Bump version
2024-01-30 12:27:27 -05:00
Andrei Betlen
247a16de66
docs: Update README
2024-01-30 12:23:07 -05:00
Andrei Betlen
13b7ced7da
Update llama.cpp
2024-01-30 12:21:41 -05:00
Andrei Betlen
011cd84ded
Update llama.cpp
2024-01-30 09:48:09 -05:00
Andrei
da003d8768
Automatically set chat format from gguf ( #1110 )
...
* Use jinja formatter to load chat format from gguf
* Fix off-by-one error in metadata loader
* Implement chat format auto-detection
2024-01-29 14:22:23 -05:00
Andrei Betlen
059f6b3ac8
docs: fix typos
2024-01-29 11:02:25 -05:00
Andrei Betlen
843e77e3e2
docs: Add Vulkan build instructions
2024-01-29 11:01:26 -05:00
Andrei Betlen
464af5b39f
Bump version
2024-01-29 10:46:04 -05:00
Andrei Betlen
9f7852acfa
misc: Add vulkan target
2024-01-29 10:39:23 -05:00
Andrei Betlen
85f8c4c06e
Update llama.cpp
2024-01-29 10:39:08 -05:00
Andrei Betlen
9ae5819ee4
Add chat format test.
2024-01-29 00:59:01 -05:00
Rafaelblsilva
ce38dbdf07
Add mistral instruct chat format as "mistral-instruct" ( #799 )
...
* Added mistral instruct chat format as "mistral"
* Fix stop sequence (merge issue)
* Update chat format name to `mistral-instruct`
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2024-01-29 00:34:42 -05:00
Andrei Betlen
52c4a84faf
Bump version
2024-01-28 19:35:37 -05:00
Andrei Betlen
31e0288a41
Update llama.cpp
2024-01-28 19:34:27 -05:00
Andrei Betlen
ccf4908bfd
Update llama.cpp
2024-01-28 12:55:32 -05:00
Andrei Betlen
8c59210062
docs: Fix typo
2024-01-27 19:37:59 -05:00
Andrei Betlen
399fa1e03b
docs: Add JSON and JSON schema mode examples to README
2024-01-27 19:36:33 -05:00
Andrei Betlen
c1d0fff8a9
Bump version
2024-01-27 18:36:56 -05:00
Andrei
d8f6914f45
Add json schema mode ( #1122 )
...
* Add json schema mode
* Add llava chat format support
2024-01-27 16:52:18 -05:00
Andrei Betlen
c6d3bd62e8
Update llama.cpp
2024-01-27 16:22:46 -05:00
Andrei Betlen
35918873b4
Update llama.cpp
2024-01-26 11:45:48 -05:00
Andrei Betlen
f5cc6b3053
Bump version
2024-01-25 11:28:16 -05:00
Andrei Betlen
cde7514c3d
feat(server): include llama-cpp-python version in openapi spec
2024-01-25 11:23:18 -05:00
Andrei Betlen
2588f34a22
Update llama.cpp
2024-01-25 11:22:42 -05:00
Andrei Betlen
dc5a436224
Update llama.cpp
2024-01-25 11:19:34 -05:00
Andrei Betlen
d6fb16e055
docs: Update README
2024-01-25 10:51:48 -05:00
Andrei Betlen
5b258bf840
docs: Update README with more param common examples
2024-01-24 10:51:15 -05:00
Andrei Betlen
c343baaba8
Update llama.cpp
2024-01-24 10:40:50 -05:00
Andrei Betlen
c970d41a85
fix: llama_log_set should be able to accept null pointer
2024-01-24 10:38:30 -05:00
Andrei Betlen
9677a1f2c8
fix: Check order
2024-01-23 22:28:03 -05:00
Andrei Betlen
4d6b2f7b91
fix: format
2024-01-23 22:08:27 -05:00
Phil H
fe5d6ea648
fix: GGUF metadata KV overrides, re #1011 ( #1116 )
...
* kv overrides another attempt
* add sentinel element, simplify array population
* ensure sentinel element is zeroed
2024-01-23 22:00:38 -05:00
Andrei Betlen
7e63928bc9
Update llama.cpp
2024-01-23 18:42:39 -05:00
Andrei Betlen
fcdf337d84
Update llama.cpp
2024-01-22 11:25:11 -05:00
Andrei Betlen
5b982d0f8c
fix: use both eos and bos tokens as stop sequences for hf-tokenizer-config chat format.
2024-01-22 08:32:48 -05:00
Andrei Betlen
2ce0b8aa2c
Bump version
2024-01-21 20:30:24 -05:00
Andrei Betlen
d3f5528ca8
fix: from_json_schema oneof/anyof bug. Closes #1097
2024-01-21 19:06:53 -05:00
Andrei Betlen
8eefdbca03
Update llama.cpp
2024-01-21 19:01:27 -05:00
Andrei Betlen
88fbccaaa3
docs: Add macosx wrong arch fix to README
2024-01-21 18:38:44 -05:00
Andrei Betlen
24f39454e9
fix: pass chat handler not chat formatter for huggingface autotokenizer and tokenizer_config formats.
2024-01-21 18:38:04 -05:00
Andrei Betlen
7f3209b1eb
feat: Add add_generation_prompt option for jinja2chatformatter.
2024-01-21 18:37:24 -05:00
Andrei Betlen
ac2e96d4b4
Update llama.cpp
2024-01-19 15:33:43 -05:00
Andrei Betlen
be09318c26
feat: Add Jinja2ChatFormatter
2024-01-19 15:04:42 -05:00
Andrei Betlen
5a34c57e54
feat: Expose gguf model metadata in metadata property
2024-01-19 10:46:03 -05:00
Andrei Betlen
833a7f1a86
Bump version
2024-01-19 09:03:35 -05:00