Andrei Betlen
310fbf4e49
Update llama.cpp
2024-02-05 22:07:14 -05:00
Andrei Betlen
59760c85ed
fix: Use llama_log_callback to avoid suppress_stdout_stderr
2024-02-05 21:52:12 -05:00
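A minimal sketch of the approach this fix takes, assuming the llama_log_set binding and callback signature exposed by the ctypes bindings:

```python
import ctypes
import llama_cpp

# Route llama.cpp logs through a Python callback instead of redirecting
# stdout/stderr at the file-descriptor level (the old suppression approach).
@ctypes.CFUNCTYPE(None, ctypes.c_int, ctypes.c_char_p, ctypes.c_void_p)
def noop_log_callback(level, text, user_data):
    pass  # drop all llama.cpp log output

llama_cpp.llama_log_set(noop_log_callback, ctypes.c_void_p(0))
```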
Andrei Betlen
3553b14670
Update llama.cpp
2024-02-05 13:26:50 -05:00
Andrei
7467f129e5
Revert "Fix: fileno error google colab ( #729 ) ( #1156 )" ( #1157 )
This reverts commit bebfba0f08.
2024-02-02 12:18:55 -05:00
Dulsara
bebfba0f08
Fix: fileno error google colab (#729) (#1156)
Instead of using devnull, this makes a dummy class with a 'write()' method that does nothing.
2024-02-02 12:05:46 -05:00
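A minimal sketch of the pattern that commit describes (the class name is illustrative, not the repo's actual one):

```python
import sys

# A stand-in stream whose write() discards output: no real file
# descriptor is involved, so nothing ever calls a missing fileno().
class NullWriter:
    def write(self, data):
        pass  # discard output

    def flush(self):
        pass

sys.stderr = NullWriter()  # example use: silence writes to stderr
```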
Andrei Betlen
8a5911bd5d
Update llama.cpp
2024-02-02 09:41:27 -05:00
Andrei Betlen
de526d0214
Update llama.cpp
2024-02-01 12:35:31 -05:00
Andrei Betlen
3322eadbf3
Bump version
2024-01-31 15:10:18 -05:00
Andrei Betlen
a8cb34eacd
Update llama.cpp
2024-01-31 15:05:51 -05:00
Andrei
fb762a6041
Add speculative decoding (#1120)
* Add draft model param to llama class, implement basic prompt lookup decoding draft model
* Use sampling context for sampling
* Use 1d array
* Use draft model for sampling
* Fix dumb mistake
* Allow for later extensions to the LlamaDraftModel API
* Cleanup
* Adaptive candidate prediction
* Update implementation to match hf transformers
* Tuning
* Fix bug where last token was not used for ngram prediction
* Remove heuristic for num_pred_tokens (no benefit)
* fix: n_candidates bug.
* Add draft_model_num_pred_tokens server setting
* Cleanup
* Update README
2024-01-31 14:08:14 -05:00
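The README update in the last bullet documents the resulting API; a usage sketch (model path illustrative):

```python
from llama_cpp import Llama
from llama_cpp.llama_speculative import LlamaPromptLookupDecoding

# Prompt lookup decoding drafts tokens by matching recent n-grams against
# earlier context, so no separate draft-model weights are required.
llm = Llama(
    model_path="path/to/model.gguf",  # illustrative path
    draft_model=LlamaPromptLookupDecoding(num_pred_tokens=10),
)
```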
Andrei Betlen
71e3e4c435
Update llama.cpp
2024-01-31 10:41:42 -05:00
Andrei Betlen
2b37d8e438
fix: Run server command. Closes #1143
2024-01-31 10:37:19 -05:00
Andrei Betlen
078cca0361
fix: Pass raise_exception and add_generation_prompt to jinja2 chat template
2024-01-31 08:42:21 -05:00
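For context, a hedged sketch of what passing those two names into a jinja2 render means in practice (the template string and message are illustrative, not the library's internals):

```python
import jinja2

# HF-style chat templates may call raise_exception(...) and branch on
# add_generation_prompt, so both must be supplied at render time.
chat_template = (
    "{% for m in messages %}<|{{ m['role'] }}|>\n{{ m['content'] }}\n{% endfor %}"
    "{% if add_generation_prompt %}<|assistant|>\n{% endif %}"
)

def raise_exception(message):
    raise ValueError(message)

prompt = jinja2.Environment().from_string(chat_template).render(
    messages=[{"role": "user", "content": "Hello"}],
    add_generation_prompt=True,
    raise_exception=raise_exception,
)
print(prompt)
```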
Andrei Betlen
411494706a
Update llama.cpp
2024-01-31 08:35:21 -05:00
Andrei Betlen
bf9e824922
Bump version
2024-01-30 12:27:27 -05:00
Andrei Betlen
247a16de66
docs: Update README
2024-01-30 12:23:07 -05:00
Andrei Betlen
13b7ced7da
Update llama.cpp
2024-01-30 12:21:41 -05:00
Andrei Betlen
011cd84ded
Update llama.cpp
2024-01-30 09:48:09 -05:00
Andrei
da003d8768
Automatically set chat format from gguf (#1110)
* Use jinja formatter to load chat format from gguf
* Fix off-by-one error in metadata loader
* Implement chat format auto-detection
2024-01-29 14:22:23 -05:00
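A sketch of the behavior this adds, assuming the template is read from the model's GGUF metadata under the tokenizer.chat_template key (path illustrative):

```python
from llama_cpp import Llama

# With no explicit chat_format, the loader inspects the GGUF metadata and
# matches the stored chat template against known formats.
llm = Llama(model_path="path/to/model.gguf")  # illustrative path
print(llm.metadata.get("tokenizer.chat_template"))
```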
Andrei Betlen
059f6b3ac8
docs: fix typos
2024-01-29 11:02:25 -05:00
Andrei Betlen
843e77e3e2
docs: Add Vulkan build instructions
2024-01-29 11:01:26 -05:00
Andrei Betlen
464af5b39f
Bump version
2024-01-29 10:46:04 -05:00
Andrei Betlen
9f7852acfa
misc: Add vulkan target
2024-01-29 10:39:23 -05:00
Andrei Betlen
85f8c4c06e
Update llama.cpp
2024-01-29 10:39:08 -05:00
Andrei Betlen
9ae5819ee4
Add chat format test.
2024-01-29 00:59:01 -05:00
Rafaelblsilva
ce38dbdf07
Add mistral instruct chat format as "mistral-instruct" (#799)
* Added mistral instruct chat format as "mistral"
* Fix stop sequence (merge issue)
* Update chat format name to `mistral-instruct`
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2024-01-29 00:34:42 -05:00
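Usage of the new format name (model path illustrative):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="path/to/mistral-7b-instruct.gguf",  # illustrative path
    chat_format="mistral-instruct",
)
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello."}],
)
```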
Andrei Betlen
52c4a84faf
Bump version
2024-01-28 19:35:37 -05:00
Andrei Betlen
31e0288a41
Update llama.cpp
2024-01-28 19:34:27 -05:00
Andrei Betlen
ccf4908bfd
Update llama.cpp
2024-01-28 12:55:32 -05:00
Andrei Betlen
8c59210062
docs: Fix typo
2024-01-27 19:37:59 -05:00
Andrei Betlen
399fa1e03b
docs: Add JSON and JSON schema mode examples to README
2024-01-27 19:36:33 -05:00
Andrei Betlen
c1d0fff8a9
Bump version
2024-01-27 18:36:56 -05:00
Andrei
d8f6914f45
Add json schema mode (#1122)
* Add json schema mode
* Add llava chat format support
2024-01-27 16:52:18 -05:00
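As documented in the README examples added later that day, JSON schema mode constrains chat completions through response_format; a sketch (model path illustrative):

```python
from llama_cpp import Llama

llm = Llama(model_path="path/to/model.gguf", chat_format="chatml")  # illustrative
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Which team won in 2020? Answer in JSON."}],
    response_format={
        "type": "json_object",
        "schema": {
            "type": "object",
            "properties": {"team_name": {"type": "string"}},
            "required": ["team_name"],
        },
    },
)
```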
Andrei Betlen
c6d3bd62e8
Update llama.cpp
2024-01-27 16:22:46 -05:00
Andrei Betlen
35918873b4
Update llama.cpp
2024-01-26 11:45:48 -05:00
Andrei Betlen
f5cc6b3053
Bump version
2024-01-25 11:28:16 -05:00
Andrei Betlen
cde7514c3d
feat(server): include llama-cpp-python version in openapi spec
2024-01-25 11:23:18 -05:00
Andrei Betlen
2588f34a22
Update llama.cpp
2024-01-25 11:22:42 -05:00
Andrei Betlen
dc5a436224
Update llama.cpp
2024-01-25 11:19:34 -05:00
Andrei Betlen
d6fb16e055
docs: Update README
2024-01-25 10:51:48 -05:00
Andrei Betlen
5b258bf840
docs: Update README with more param common examples
2024-01-24 10:51:15 -05:00
Andrei Betlen
c343baaba8
Update llama.cpp
2024-01-24 10:40:50 -05:00
Andrei Betlen
c970d41a85
fix: llama_log_set should be able to accept null pointer
2024-01-24 10:38:30 -05:00
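A hedged one-liner of the fixed behavior (upstream llama.cpp treats a null callback as "restore the default logger"):

```python
import ctypes
import llama_cpp

# After this fix, passing None no longer raises; default logging returns.
llama_cpp.llama_log_set(None, ctypes.c_void_p(0))
```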
Andrei Betlen
9677a1f2c8
fix: Check order
2024-01-23 22:28:03 -05:00
Andrei Betlen
4d6b2f7b91
fix: format
2024-01-23 22:08:27 -05:00
Phil H
fe5d6ea648
fix: GGUF metadata KV overrides, re #1011 (#1116)
* kv overrides another attempt
* add sentinel element, simplify array population
* ensure sentinel element is zeroed
2024-01-23 22:00:38 -05:00
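A hedged sketch of the sentinel pattern those bullets describe (struct, field, and constant names follow the ctypes bindings of this era; treat the details as illustrative): the C side scans the override array until it reaches an entry with an empty key, so the array is allocated one element longer and left zero-initialized at the end.

```python
import llama_cpp

# Allocate n + 1 entries; ctypes zero-fills the array, so the final
# element's empty key terminates the C-side scan.
n = 1
overrides = (llama_cpp.llama_model_kv_override * (n + 1))()
overrides[0].key = b"tokenizer.ggml.add_bos_token"
overrides[0].tag = llama_cpp.LLAMA_KV_OVERRIDE_BOOL
overrides[0].value.bool_value = False
# overrides[1] stays zeroed: the sentinel element the commit ensures.
```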
Andrei Betlen
7e63928bc9
Update llama.cpp
2024-01-23 18:42:39 -05:00
Andrei Betlen
fcdf337d84
Update llama.cpp
2024-01-22 11:25:11 -05:00
Andrei Betlen
5b982d0f8c
fix: use both eos and bos tokens as stop sequences for hf-tokenizer-config chat format.
2024-01-22 08:32:48 -05:00
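Illustrative of that fix (token strings are examples; in practice they come from the model's tokenizer_config.json):

```python
# Both special tokens now terminate generation for this chat format.
tokenizer_config = {"eos_token": "</s>", "bos_token": "<s>"}  # example values
stop = [tokenizer_config["eos_token"], tokenizer_config["bos_token"]]
```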
Andrei Betlen
2ce0b8aa2c
Bump version
2024-01-21 20:30:24 -05:00