Commit graph

1451 commits

Author SHA1 Message Date
Douglas Hanley
19b55ad3e5 feat: use gpu backend for clip if available (#1175) 2024-02-11 13:53:59 -05:00
Andrei Betlen
63b0c37836 Update llama.cpp 2024-02-09 13:36:58 -05:00
Andrei Betlen
4abb8c9386 Merge branch 'main' of github.com:abetlen/llama_cpp_python into main 2024-02-09 13:32:31 -05:00
Andrei Betlen
e16f06e6eb fix: revert _create_completions. 2024-02-09 02:02:13 -05:00
Andrei Betlen
dfc1b17341 Update llama.cpp 2024-02-08 23:38:12 -05:00
Andrei Betlen
5b4ad6c80b Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main 2024-02-08 23:34:45 -05:00
Andrei Betlen
85d3374b4d fix: broken import 2024-02-08 01:13:28 -05:00
Andrei Betlen
b5fca911b5 feat: Move tokenizer to own module 2024-02-08 01:08:18 -05:00
Andrei Betlen
2ef7ba3aed misc: rename grammar test 2024-02-08 01:07:44 -05:00
Jeffrey Fong
901827013b
feat: Integrate functionary v1.4 and v2 models + add custom tokenizer support to Llama class (#1078)
* convert functionary-v1 chat handler to use hf autotokenizer

* add hf_tokenizer + integrate functionary-v1.4 prompt template

* integrate functionary v2 prompt template

* update readme

* set up parallel function calling wip

* set up parallel function calling

* Update README.md

* Update README.md

* refactor tokenizers

* include old functionary handler for backward compatibility

* add hf_tokenizer_path in server ModelSettings

* convert functionary-v1 chat handler to use hf autotokenizer

* add hf_tokenizer + integrate functionary-v1.4 prompt template

* integrate functionary v2 prompt template

* update readme

* set up parallel function calling wip

* resolve merge conflict

* Update README.md

* Update README.md

* refactor tokenizers

* include old functionary handler for backward compatibility

* add hf_tokenizer_path in server ModelSettings

* Cleanup PR, fix breaking changes

* Use hf_pretrained_model_name_or_path for tokenizer

* fix hf tokenizer in streaming

* update README

* refactor offset mapping

---------

Co-authored-by: Andrei <abetlen@gmail.com>
2024-02-07 20:07:03 -05:00
Andrei Betlen
ce12775490 Update llama.cpp 2024-02-06 18:50:56 -05:00
Andrei Betlen
34f31040f6 Bump version 2024-02-06 12:47:59 -05:00
Andrei Betlen
5e3e67af47 Update llama.cpp 2024-02-06 12:44:07 -05:00
Andrei Betlen
310fbf4e49 Update llama.cpp 2024-02-05 22:07:14 -05:00
Andrei Betlen
59760c85ed fix: Use llama_log_callback to avoid suppress_stdout_stderr 2024-02-05 21:52:12 -05:00
Andrei Betlen
3553b14670 Update llama.cpp 2024-02-05 13:26:50 -05:00
Andrei
7467f129e5
Revert "Fix: fileno error google colab (#729) (#1156)" (#1157)
This reverts commit bebfba0f08.
2024-02-02 12:18:55 -05:00
Dulsara
bebfba0f08
Fix: fileno error google colab (#729) (#1156)
Instead of using devnull, define a dummy class with a 'write()' method that does nothing.
2024-02-02 12:05:46 -05:00
Andrei Betlen
8a5911bd5d Update llama.cpp 2024-02-02 09:41:27 -05:00
Andrei Betlen
de526d0214 Update llama.cpp 2024-02-01 12:35:31 -05:00
Andrei Betlen
3322eadbf3 Bump version 2024-01-31 15:10:18 -05:00
Andrei Betlen
a8cb34eacd Update llama.cpp 2024-01-31 15:05:51 -05:00
Andrei
fb762a6041
Add speculative decoding (#1120)
* Add draft model param to llama class, implement basic prompt lookup decoding draft model

* Use samplingcontext for sampling

* Use 1d array

* Use draft model for sampling

* Fix dumb mistake

* Allow for later extensions to the LlamaDraftModel api

* Cleanup

* Adaptive candidate prediction

* Update implementation to match hf transformers

* Tuning

* Fix bug where last token was not used for ngram prediction

* Remove heuristic for num_pred_tokens (no benefit)

* fix: n_candidates bug.

* Add draft_model_num_pred_tokens server setting

* Cleanup

* Update README
2024-01-31 14:08:14 -05:00
Andrei Betlen
71e3e4c435 Update llama.cpp 2024-01-31 10:41:42 -05:00
Andrei Betlen
2b37d8e438 fix: Run server command. Closes #1143 2024-01-31 10:37:19 -05:00
Andrei Betlen
078cca0361 fix: Pass raise_exception and add_generation_prompt to jinja2 chat template 2024-01-31 08:42:21 -05:00
Andrei Betlen
411494706a Update llama.cpp 2024-01-31 08:35:21 -05:00
Andrei Betlen
bf9e824922 Bump version 2024-01-30 12:27:27 -05:00
Andrei Betlen
247a16de66 docs: Update README 2024-01-30 12:23:07 -05:00
Andrei Betlen
13b7ced7da Update llama.cpp 2024-01-30 12:21:41 -05:00
Andrei Betlen
011cd84ded Update llama.cpp 2024-01-30 09:48:09 -05:00
Andrei
da003d8768
Automatically set chat format from gguf (#1110)
* Use jinja formatter to load chat format from gguf

* Fix off-by-one error in metadata loader

* Implement chat format auto-detection
2024-01-29 14:22:23 -05:00
Andrei Betlen
059f6b3ac8 docs: fix typos 2024-01-29 11:02:25 -05:00
Andrei Betlen
843e77e3e2 docs: Add Vulkan build instructions 2024-01-29 11:01:26 -05:00
Andrei Betlen
464af5b39f Bump version 2024-01-29 10:46:04 -05:00
Andrei Betlen
9f7852acfa misc: Add vulkan target 2024-01-29 10:39:23 -05:00
Andrei Betlen
85f8c4c06e Update llama.cpp 2024-01-29 10:39:08 -05:00
Andrei Betlen
9ae5819ee4 Add chat format test. 2024-01-29 00:59:01 -05:00
Rafaelblsilva
ce38dbdf07
Add mistral instruct chat format as "mistral-instruct" (#799)
* Added mistral instruct chat format as "mistral"

* Fix stop sequence (merge issue)

* Update chat format name to `mistral-instruct`

---------

Co-authored-by: Andrei <abetlen@gmail.com>
2024-01-29 00:34:42 -05:00
Andrei Betlen
52c4a84faf Bump version 2024-01-28 19:35:37 -05:00
Andrei Betlen
31e0288a41 Update llama.cpp 2024-01-28 19:34:27 -05:00
Andrei Betlen
ccf4908bfd Update llama.cpp 2024-01-28 12:55:32 -05:00
Andrei Betlen
8c59210062 docs: Fix typo 2024-01-27 19:37:59 -05:00
Andrei Betlen
399fa1e03b docs: Add JSON and JSON schema mode examples to README 2024-01-27 19:36:33 -05:00
Andrei Betlen
c1d0fff8a9 Bump version 2024-01-27 18:36:56 -05:00
Andrei
d8f6914f45
Add json schema mode (#1122)
* Add json schema mode

* Add llava chat format support
2024-01-27 16:52:18 -05:00
Andrei Betlen
c6d3bd62e8 Update llama.cpp 2024-01-27 16:22:46 -05:00
Andrei Betlen
35918873b4 Update llama.cpp 2024-01-26 11:45:48 -05:00
Andrei Betlen
f5cc6b3053 Bump version 2024-01-25 11:28:16 -05:00
Andrei Betlen
cde7514c3d feat(server): include llama-cpp-python version in openapi spec 2024-01-25 11:23:18 -05:00