Akarshan Biswas
918ff27e50
docs: Set the correct command for compiling with SYCL support ( #1172 )
2024-02-11 13:55:15 -05:00
Douglas Hanley
19b55ad3e5
feat: use gpu backend for clip if available ( #1175 )
2024-02-11 13:53:59 -05:00
Andrei Betlen
63b0c37836
Update llama.cpp
2024-02-09 13:36:58 -05:00
Andrei Betlen
4abb8c9386
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
2024-02-09 13:32:31 -05:00
Andrei Betlen
e16f06e6eb
fix: revert _create_completions.
2024-02-09 02:02:13 -05:00
Andrei Betlen
dfc1b17341
Update llama.cpp
2024-02-08 23:38:12 -05:00
Andrei Betlen
5b4ad6c80b
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-02-08 23:34:45 -05:00
Andrei Betlen
85d3374b4d
fix: broken import
2024-02-08 01:13:28 -05:00
Andrei Betlen
b5fca911b5
feat: Move tokenizer to own module
2024-02-08 01:08:18 -05:00
Andrei Betlen
2ef7ba3aed
misc: rename grammar test
2024-02-08 01:07:44 -05:00
Jeffrey Fong
901827013b
feat: Integrate functionary v1.4 and v2 models + add custom tokenizer support to Llama class ( #1078 )
* convert functionary-v1 chat handler to use hf autotokenizer
* add hf_tokenizer + integrate functionary-v1.4 prompt template
* integrate functionary v2 prompt template
* update readme
* set up parallel function calling wip
* set up parallel function calling
* Update README.md
* Update README.md
* refactor tokenizers
* include old functionary handler for backward compatibility
* add hf_tokenizer_path in server ModelSettings
* convert functionary-v1 chat handler to use hf autotokenizer
* add hf_tokenizer + integrate functionary-v1.4 prompt template
* integrate functionary v2 prompt template
* update readme
* set up parallel function calling wip
* resolve merge conflict
* Update README.md
* Update README.md
* refactor tokenizers
* include old functionary handler for backward compatibility
* add hf_tokenizer_path in server ModelSettings
* Cleanup PR, fix breaking changes
* Use hf_pretrained_model_name_or_path for tokenizer
* fix hf tokenizer in streaming
* update README
* refactor offset mapping
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2024-02-07 20:07:03 -05:00
Andrei Betlen
ce12775490
Update llama.cpp
2024-02-06 18:50:56 -05:00
Andrei Betlen
34f31040f6
Bump version
2024-02-06 12:47:59 -05:00
Andrei Betlen
5e3e67af47
Update llama.cpp
2024-02-06 12:44:07 -05:00
Andrei Betlen
310fbf4e49
Update llama.cpp
2024-02-05 22:07:14 -05:00
Andrei Betlen
59760c85ed
fix: Use llama_log_callback to avoid suppress_stdout_stderr
2024-02-05 21:52:12 -05:00
Andrei Betlen
3553b14670
Update llama.cpp
2024-02-05 13:26:50 -05:00
Andrei
7467f129e5
Revert "Fix: fileno error google colab ( #729 ) ( #1156 )" ( #1157 )
This reverts commit bebfba0f08.
2024-02-02 12:18:55 -05:00
Dulsara
bebfba0f08
Fix: fileno error google colab ( #729 ) ( #1156 )
Instead of using devnull, use a dummy class with a 'write()' method that does nothing.
2024-02-02 12:05:46 -05:00
Andrei Betlen
8a5911bd5d
Update llama.cpp
2024-02-02 09:41:27 -05:00
Andrei Betlen
de526d0214
Update llama.cpp
2024-02-01 12:35:31 -05:00
Andrei Betlen
3322eadbf3
Bump version
2024-01-31 15:10:18 -05:00
Andrei Betlen
a8cb34eacd
Update llama.cpp
2024-01-31 15:05:51 -05:00
Andrei
fb762a6041
Add speculative decoding ( #1120 )
* Add draft model param to llama class, implement basic prompt lookup decoding draft model
* Use samplingcontext for sampling
* Use 1d array
* Use draft model for sampling
* Fix dumb mistake
* Allow for later extensions to the LlamaDraftModel api
* Cleanup
* Adaptive candidate prediction
* Update implementation to match hf transformers
* Tuning
* Fix bug where last token was not used for ngram prediction
* Remove heuristic for num_pred_tokens (no benefit)
* fix: n_candidates bug.
* Add draft_model_num_pred_tokens server setting
* Cleanup
* Update README
2024-01-31 14:08:14 -05:00
Andrei Betlen
71e3e4c435
Update llama.cpp
2024-01-31 10:41:42 -05:00
Andrei Betlen
2b37d8e438
fix: Run server command. Closes #1143
2024-01-31 10:37:19 -05:00
Andrei Betlen
078cca0361
fix: Pass raise_exception and add_generation_prompt to jinja2 chat template
2024-01-31 08:42:21 -05:00
Andrei Betlen
411494706a
Update llama.cpp
2024-01-31 08:35:21 -05:00
Andrei Betlen
bf9e824922
Bump version
2024-01-30 12:27:27 -05:00
Andrei Betlen
247a16de66
docs: Update README
2024-01-30 12:23:07 -05:00
Andrei Betlen
13b7ced7da
Update llama.cpp
2024-01-30 12:21:41 -05:00
Andrei Betlen
011cd84ded
Update llama.cpp
2024-01-30 09:48:09 -05:00
Andrei
da003d8768
Automatically set chat format from gguf ( #1110 )
* Use jinja formatter to load chat format from gguf
* Fix off-by-one error in metadata loader
* Implement chat format auto-detection
2024-01-29 14:22:23 -05:00
Andrei Betlen
059f6b3ac8
docs: fix typos
2024-01-29 11:02:25 -05:00
Andrei Betlen
843e77e3e2
docs: Add Vulkan build instructions
2024-01-29 11:01:26 -05:00
Andrei Betlen
464af5b39f
Bump version
2024-01-29 10:46:04 -05:00
Andrei Betlen
9f7852acfa
misc: Add vulkan target
2024-01-29 10:39:23 -05:00
Andrei Betlen
85f8c4c06e
Update llama.cpp
2024-01-29 10:39:08 -05:00
Andrei Betlen
9ae5819ee4
Add chat format test.
2024-01-29 00:59:01 -05:00
Rafaelblsilva
ce38dbdf07
Add mistral instruct chat format as "mistral-instruct" ( #799 )
* Added mistral instruct chat format as "mistral"
* Fix stop sequence (merge issue)
* Update chat format name to `mistral-instruct`
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2024-01-29 00:34:42 -05:00
Andrei Betlen
52c4a84faf
Bump version
2024-01-28 19:35:37 -05:00
Andrei Betlen
31e0288a41
Update llama.cpp
2024-01-28 19:34:27 -05:00
Andrei Betlen
ccf4908bfd
Update llama.cpp
2024-01-28 12:55:32 -05:00
Andrei Betlen
8c59210062
docs: Fix typo
2024-01-27 19:37:59 -05:00
Andrei Betlen
399fa1e03b
docs: Add JSON and JSON schema mode examples to README
2024-01-27 19:36:33 -05:00
Andrei Betlen
c1d0fff8a9
Bump version
2024-01-27 18:36:56 -05:00
Andrei
d8f6914f45
Add json schema mode ( #1122 )
* Add json schema mode
* Add llava chat format support
2024-01-27 16:52:18 -05:00
Andrei Betlen
c6d3bd62e8
Update llama.cpp
2024-01-27 16:22:46 -05:00
Andrei Betlen
35918873b4
Update llama.cpp
2024-01-26 11:45:48 -05:00
Andrei Betlen
f5cc6b3053
Bump version
2024-01-25 11:28:16 -05:00