Andrei Betlen
cb791716b4
fix: Always set logits_all = True when using speculative decoding
2024-02-12 16:19:05 -05:00
Andrei
153a0049d9
feat: Generic chatml Function Calling ( #957 )
...
* Add demo notebook
* Add initial chat handler
* Update OpenAI types
* Add generic chatml function calling (wip)
* Update chatml generic function calling.
* Progress on auto-tool calls
* fix streaming functions
* Remove print statements
* fix: Suppress output from llama.cpp init and grammar creation
* Add OpenAI v1 python api compatible chat completion function
* Support non-streaming multi-tool calls
* Format
* Include function_call in response.
2024-02-12 15:56:07 -05:00
Andrei Betlen
69413ce08e
Update llama.cpp
2024-02-11 19:00:17 -05:00
Connor
a05d90446f
fix: Circular dependancy preventing early Llama object free ( #1176 )
...
commit 901827013b
introduced a cyclic dependency
within Llama objects. That change causes old models to linger in memory longer
than necessary, thereby creating memory bloat in most applications attempting
to switch between models at runtime. This patch simply removes the problematic
line, allowing models to deallocate without relying on GC. One might also
consider combining `weakref.ref` with a `@property` if the `llama` attribute is
absolutely necessary to expose in the tokenizer class.
2024-02-11 13:57:57 -05:00
Andrei Betlen
4abb8c9386
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
2024-02-09 13:32:31 -05:00
Andrei Betlen
e16f06e6eb
fix: revert _create_completions.
2024-02-09 02:02:13 -05:00
Andrei Betlen
85d3374b4d
fix: broken import
2024-02-08 01:13:28 -05:00
Andrei Betlen
b5fca911b5
feat: Move tokenizer to own module
2024-02-08 01:08:18 -05:00
Jeffrey Fong
901827013b
feat: Integrate functionary v1.4 and v2 models + add custom tokenizer support to Llama class ( #1078 )
...
* convert functionary-v1 chat handler to use hf autotokenizer
* add hf_tokenizer + inteegrate functionary-v1.4 prompt template
* integrate functionary v2 prompt template
* update readme
* set up parallel function calling wip
* set up parallel function calling
* Update README.md
* Update README.md
* refactor tokenizers
* include old functionary handler for backward compatibility
* add hf_tokenizer_path in server ModelSettings
* convert functionary-v1 chat handler to use hf autotokenizer
* add hf_tokenizer + inteegrate functionary-v1.4 prompt template
* integrate functionary v2 prompt template
* update readme
* set up parallel function calling wip
* resolve merge conflict
* Update README.md
* Update README.md
* refactor tokenizers
* include old functionary handler for backward compatibility
* add hf_tokenizer_path in server ModelSettings
* Cleanup PR, fix breaking changes
* Use hf_pretrained_model_name_or_path for tokenizer
* fix hf tokenizer in streaming
* update README
* refactor offset mapping
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2024-02-07 20:07:03 -05:00
Andrei Betlen
34f31040f6
Bump version
2024-02-06 12:47:59 -05:00
Andrei Betlen
59760c85ed
fix: Use llama_log_callback to avoid suppress_stdout_stderr
2024-02-05 21:52:12 -05:00
Andrei Betlen
3553b14670
Update llama.cpp
2024-02-05 13:26:50 -05:00
Andrei
7467f129e5
Revert "Fix: fileno error google colab ( #729 ) ( #1156 )" ( #1157 )
...
This reverts commit bebfba0f08
.
2024-02-02 12:18:55 -05:00
Dulsara
bebfba0f08
Fix: fileno error google colab ( #729 ) ( #1156 )
...
Instead of using a devnull just made a dummy class with a 'write()' method that does nothing.
2024-02-02 12:05:46 -05:00
Andrei Betlen
3322eadbf3
Bump version
2024-01-31 15:10:18 -05:00
Andrei
fb762a6041
Add speculative decoding ( #1120 )
...
* Add draft model param to llama class, implement basic prompt lookup decoding draft model
* Use samplingcontext for sampling
* Use 1d array
* Use draft model for sampling
* Fix dumb mistake
* Allow for later extensions to the LlamaDraftModel api
* Cleanup
* Adaptive candidate prediction
* Update implementation to match hf transformers
* Tuning
* Fix bug where last token was not used for ngram prediction
* Remove heuristic for num_pred_tokens (no benefit)
* fix: n_candidates bug.
* Add draft_model_num_pred_tokens server setting
* Cleanup
* Update README
2024-01-31 14:08:14 -05:00
Andrei Betlen
71e3e4c435
Update llama.cpp
2024-01-31 10:41:42 -05:00
Andrei Betlen
078cca0361
fix: Pass raise_exception and add_generation_prompt to jinja2 chat template
2024-01-31 08:42:21 -05:00
Andrei Betlen
bf9e824922
Bump version
2024-01-30 12:27:27 -05:00
Andrei Betlen
011cd84ded
Update llama.cpp
2024-01-30 09:48:09 -05:00
Andrei
da003d8768
Automatically set chat format from gguf ( #1110 )
...
* Use jinja formatter to load chat format from gguf
* Fix off-by-one error in metadata loader
* Implement chat format auto-detection
2024-01-29 14:22:23 -05:00
Andrei Betlen
464af5b39f
Bump version
2024-01-29 10:46:04 -05:00
Andrei Betlen
9ae5819ee4
Add chat format test.
2024-01-29 00:59:01 -05:00
Rafaelblsilva
ce38dbdf07
Add mistral instruct chat format as "mistral-instruct" ( #799 )
...
* Added mistral instruct chat format as "mistral"
* Fix stop sequence (merge issue)
* Update chat format name to `mistral-instruct`
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2024-01-29 00:34:42 -05:00
Andrei Betlen
52c4a84faf
Bump version
2024-01-28 19:35:37 -05:00
Andrei Betlen
c1d0fff8a9
Bump version
2024-01-27 18:36:56 -05:00
Andrei
d8f6914f45
Add json schema mode ( #1122 )
...
* Add json schema mode
* Add llava chat format support
2024-01-27 16:52:18 -05:00
Andrei Betlen
35918873b4
Update llama.cpp
2024-01-26 11:45:48 -05:00
Andrei Betlen
f5cc6b3053
Bump version
2024-01-25 11:28:16 -05:00
Andrei Betlen
cde7514c3d
feat(server): include llama-cpp-python version in openapi spec
2024-01-25 11:23:18 -05:00
Andrei Betlen
c970d41a85
fix: llama_log_set should be able to accept null pointer
2024-01-24 10:38:30 -05:00
Andrei Betlen
9677a1f2c8
fix: Check order
2024-01-23 22:28:03 -05:00
Andrei Betlen
4d6b2f7b91
fix: format
2024-01-23 22:08:27 -05:00
Phil H
fe5d6ea648
fix: GGUF metadata KV overrides, re #1011 ( #1116 )
...
* kv overrides another attempt
* add sentinel element, simplify array population
* ensure sentinel element is zeroed
2024-01-23 22:00:38 -05:00
Andrei Betlen
fcdf337d84
Update llama.cpp
2024-01-22 11:25:11 -05:00
Andrei Betlen
5b982d0f8c
fix: use both eos and bos tokens as stop sequences for hf-tokenizer-config chat format.
2024-01-22 08:32:48 -05:00
Andrei Betlen
2ce0b8aa2c
Bump version
2024-01-21 20:30:24 -05:00
Andrei Betlen
d3f5528ca8
fix: from_json_schema oneof/anyof bug. Closes #1097
2024-01-21 19:06:53 -05:00
Andrei Betlen
24f39454e9
fix: pass chat handler not chat formatter for huggingface autotokenizer and tokenizer_config formats.
2024-01-21 18:38:04 -05:00
Andrei Betlen
7f3209b1eb
feat: Add add_generation_prompt option for jinja2chatformatter.
2024-01-21 18:37:24 -05:00
Andrei Betlen
be09318c26
feat: Add Jinja2ChatFormatter
2024-01-19 15:04:42 -05:00
Andrei Betlen
5a34c57e54
feat: Expose gguf model metadata in metadata property
2024-01-19 10:46:03 -05:00
Andrei Betlen
833a7f1a86
Bump version
2024-01-19 09:03:35 -05:00
Andrei Betlen
3babe3512c
Fix mirostat sampling
2024-01-19 08:31:59 -05:00
Andrei Betlen
141293a75b
Fix python3.8 support
2024-01-19 08:17:49 -05:00
Andrei Betlen
656f3d8968
Bump version
2024-01-18 21:30:36 -05:00
Andrei Betlen
89cce50f8c
Update llama.cpp
2024-01-18 21:21:49 -05:00
Andrei Betlen
b8fc1c7d83
feat: Add ability to load chat format from huggingface autotokenizer or tokenizer_config.json files.
2024-01-18 21:21:37 -05:00
Andrei Betlen
48c3b77e6f
Offload KQV by default
2024-01-18 11:08:57 -05:00
Austin
6bfe98bd80
Integration of Jinja2 Templating ( #875 )
...
* feat: Add support for jinja templating
Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>
* fix: Refactor chat formatter and update interface for jinja templates
- Simplify the `llama2_template` in `llama_jinja_format.py` by removing unnecessary line breaks for readability without affecting functionality.
- Update `ChatFormatterInterface` constructor to accept a more generic `Optional[object]` type for the template parameter, enhancing flexibility.
- Introduce a `template` property to `ChatFormatterInterface` for standardized access to the template string.
- Replace `MetaSingleton` metaclass with `Singleton` for the `ChatFormatterFactory` to streamline the singleton implementation.
These changes enhance code readability, maintain usability, and ensure consistency in the chat formatter's design pattern usage.
* Add outline for Jinja2 templating integration documentation
Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>
* Add jinja2 as a dependency with version range for Hugging Face transformers compatibility
Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>
* Update jinja2 version constraint for mkdocs-material compatibility
Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>
* Fix attribute name in AutoChatFormatter
- Changed attribute name from `self._renderer` to `self._environment`
---------
Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>
2024-01-17 09:47:52 -05:00
Andrei Betlen
7b46bb5a78
Re-order classes in llama.py
2024-01-17 09:16:13 -05:00
Andrei Betlen
cc4630e66f
Move helper classes to _internals submodule
2024-01-17 09:14:00 -05:00
Andrei Betlen
3b92419132
Move cache classes to llama_cache submodule.
2024-01-17 09:09:12 -05:00
Kyle Mistele
9c36688b33
fix(cli): allow passing n_ctx=0 to openAI API server args to use model n_ctx_train field per #1015 ( #1093 )
2024-01-16 18:54:06 -05:00
anil
cfb7da98ed
Support Accept text/event-stream in chat and completion endpoints, resolves #1083 ( #1088 )
...
Co-authored-by: Anil Pathak <anil@heyday.com>
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-01-16 12:52:52 -05:00
Andrei Betlen
4b11fa83c0
Bump version
2024-01-15 12:54:51 -05:00
Andrei Betlen
84615adbc6
Add split_mode option. Closes #1085
2024-01-15 12:49:20 -05:00
Phil H
76aafa6149
Implement GGUF metadata KV overrides ( #1011 )
...
* Implement GGUF metadata overrides
* whitespace fix
* Fix kv overrides.
* Fix pointer and pickle
* Match llama.cpp kv_overrides cli argument
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2024-01-15 12:29:29 -05:00
yieldthought
7eff42c239
Avoid "LookupError: unknown encoding: ascii" when open() called in a destructor ( #1012 )
...
The existing code often causes "LookupError: unknown encoding: ascii" when open() called in a destructor. Saving open in self.open is not enough to avoid this. Instead, we can avoid reopening /dev/null every time by doing it once when the module is loaded.
2024-01-15 10:52:10 -05:00
Mark Neumann
c689ccc728
Fix Pydantic model parsing ( #1087 )
2024-01-15 10:45:57 -05:00
Andrei Betlen
5502ac8876
Update llama.cpp
2024-01-15 10:12:10 -05:00
Andrei Betlen
359ae73643
Update llama.cpp
2024-01-14 08:17:22 -05:00
Andrei Betlen
7c898d5684
Update llama.cpp
2024-01-13 22:37:49 -05:00
Andrei Betlen
bb610b9428
Update llama.cpp
2024-01-11 22:51:12 -05:00
Andrei Betlen
f0159663d9
Bump version
2024-01-10 02:51:17 -05:00
Stephen Hankinson
df3be58d6c
Add ability to pass in penalize_nl param ( #1068 )
2024-01-10 02:46:27 -05:00
Joseph Turian
2ddce7294e
print_grammar to stderr ( #1052 )
2024-01-10 02:46:03 -05:00
Andrei Betlen
1ae05c102b
Update llama.cpp
2024-01-08 14:51:29 -05:00
Andrei Betlen
75d0527fd7
Bump version
2024-01-04 18:30:12 -05:00
Fedor Moiseev
907b9e9d42
Add Saiga chat format. ( #1050 )
2024-01-04 18:12:58 -05:00
xaviviro
cf743ec5d3
Added ChatGLM chat format ( #1059 )
...
Co-authored-by: Xavier Vinaixa Rosello <xaviviro@MacBook-Pro-de-Xavier.local>
2024-01-04 18:12:02 -05:00
Andrei Betlen
eb9c7d4ed8
Update llama.cpp
2024-01-03 22:04:04 -05:00
Andrei Betlen
011c3630f5
Bump version
2023-12-27 17:35:02 -05:00
Andrei Betlen
92284f32cb
Add HIP_PATH to dll search directories for windows users.
2023-12-22 15:29:56 -05:00
Andrei Betlen
2b0d3f36fa
set llama_max_devices using library function
2023-12-22 15:19:28 -05:00
Andrei Betlen
d9a1d90fd7
Fix typo
2023-12-22 15:12:27 -05:00
Andrei Betlen
37556bf9c4
Bump version
2023-12-22 14:55:58 -05:00
Andrei Betlen
6d8bc090f9
fix: inccorect bindings for kv override. Based on #1011
2023-12-22 14:52:20 -05:00
Andrei Betlen
522aecb868
docs: add server config docs
2023-12-22 14:37:24 -05:00
Andrei Betlen
6473796343
Update llama.cpp
2023-12-22 14:10:34 -05:00
swg
4b01a873ef
server: Support none defaulting to infinity for completions ( #111 )
...
* Support defaulting to infinity or -1 for chat completions
* Check if completion_tokens is none in error handler.
* fix: max_tokens in create completion should match openai spec
* Fix __call__
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2023-12-22 14:05:13 -05:00
Dave
12b7f2f4e9
[Feat] Multi model support ( #931 )
...
* Update Llama class to handle chat_format & caching
* Add settings.py
* Add util.py & update __main__.py
* multimodel
* update settings.py
* cleanup
* delete util.py
* Fix /v1/models endpoint
* MultiLlama now iterable, app check-alive on "/"
* instant model init if file is given
* backward compability
* revert model param mandatory
* fix error
* handle individual model config json
* refactor
* revert chathandler/clip_model changes
* handle chat_handler in MulitLlama()
* split settings into server/llama
* reduce global vars
* Update LlamaProxy to handle config files
* Add free method to LlamaProxy
* update arg parsers & install server alias
* refactor cache settings
* change server executable name
* better var name
* whitespace
* Revert "whitespace"
This reverts commit bc5cf51c64a95bfc9926e1bc58166059711a1cd8.
* remove exe_name
* Fix merge bugs
* Fix type annotations
* Fix type annotations
* Fix uvicorn app factory
* Fix settings
* Refactor server
* Remove formatting fix
* Format
* Use default model if not found in model settings
* Fix
* Cleanup
* Fix
* Fix
* Remove unnused CommandLineSettings
* Cleanup
* Support default name for copilot-codex models
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2023-12-22 05:51:25 -05:00
Andrei Betlen
4a85442c35
Update llama.cpp
2023-12-22 00:12:37 -05:00
twaka
2f03fb0231
fix text_offset of multi-token characters ( #1037 )
...
* fix text_offsets for bytes tokens
* fix
2023-12-22 00:03:29 -05:00
docmeth02
33cc623346
Implement openai api compatible authentication ( #1010 )
2023-12-21 13:44:49 -05:00
Andrei Betlen
a05b4da80a
fix: float32 is not JSON serializable when streaming logits.
2023-12-18 18:40:36 -05:00
Andrei Betlen
7df6c32544
Fix type annotations
2023-12-18 18:14:53 -05:00
Andrei Betlen
b703aad79e
Fix type annotation
2023-12-18 18:13:37 -05:00
Andrei Betlen
d0aedfcff6
Fix type annotation
2023-12-18 18:12:49 -05:00
Eduard Christian Dumitrescu
2993936b10
Fix ctypes definitions of llama_kv_cache_view_update
and llama_kv_cache_view_free
. ( #1028 )
2023-12-18 18:11:26 -05:00
Andrei Betlen
5e863d8a3b
Bump version
2023-12-18 16:09:18 -05:00
Andrei Betlen
095c650006
Add offload_kqv option to llama and server
2023-12-18 15:36:09 -05:00
Andrei Betlen
472b344ae3
Remove unnused import
2023-12-18 15:32:40 -05:00
kddubey
6b2e0e05b4
perf: Don't convert logprobs arrays to lists ( #1021 )
2023-12-18 14:28:12 -05:00
Brandon Roberts
62944df142
Bugfix: Remove f16_kv, add offload_kqv field ( #1019 )
...
F16_KV appears to have been removed here: af99c6fbfc
This addresses two issues:
- #995 which just requests to add the KV cache offloading param
- #1006 a NULL ptr exception when using the embeddings (introduced by
leaving f16_kv in the fields struct)
2023-12-18 14:27:11 -05:00
Daniele Morotti
f1c631dc53
Bug fixed with n_ctx=0 ( #1015 )
...
If the n_ctx is set to 0 the code should use the maximum context length of the selected model, but it didn't work. There was a problem with the initialization of this parameter and a related problem with 'n_batch'.
2023-12-16 18:59:50 -05:00
kddubey
5a8944672f
Fix logits_to_logprobs for 2-D and 3-D logits ( #1002 )
...
* Fix logits_to_logprobs for 2-D and 3-D logits
* Set dtype to single
* Test size
2023-12-16 18:59:26 -05:00
Andrei Betlen
534b1ea9b5
Update llama.cpp
2023-12-16 18:57:43 -05:00
Andrei Betlen
cbce061ffd
Bump version
2023-12-13 21:52:29 -05:00
yhfgyyf
8b4db732bd
Add qwen chat format ( #1005 )
2023-12-13 21:43:43 -05:00
Andrei Betlen
690c563b60
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
2023-12-13 21:43:19 -05:00
Andrei Betlen
c0fc0a1e82
Update llama.cpp
2023-12-13 21:43:16 -05:00
Radoslav Gerganov
8e44a32075
Add support for running the server with SSL ( #994 )
2023-12-11 20:47:11 -05:00
Tanner Hobson
ef22e478db
Replace logits_to_logprobs implementation with numpy equivalent to llama.cpp ( #991 )
...
See #990 . This change makes the logits_to_logprobs function equivalent to the version in the llama.cpp repository. It uses numpy so it's much faster than the previous version.
2023-12-11 20:46:27 -05:00
zocainViken
ac35f68e4d
Fix UnsupportedOperation: fileno in suppress_stdout_stderr ( #961 )
...
* bug fixing
* llava from readme got this error: UnsupportedOperation: fileno quick fix by checking hasattr
* multi modal params fix: add logits = True -> to make llava work
* multi modal params fix: add logits = True -> to make llava work
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2023-12-11 20:44:51 -05:00
chiensen
b938cccf05
Add Pygmalion chat format ( #986 )
2023-12-11 20:44:04 -05:00
Andrei Betlen
c1e73e73a3
Bump version
2023-12-11 10:26:42 -05:00
Andrei Betlen
ec26f364cc
Remove f16_kv
2023-12-11 10:25:37 -05:00
Andrei Betlen
f1edc66b21
Update llama.cpp
2023-12-11 10:21:35 -05:00
kddubey
b069d06346
Fix #891 ( #952 )
2023-11-29 05:39:52 -05:00
Andrei Betlen
ad963a0961
Bump version
2023-11-28 04:58:20 -05:00
Andrei Betlen
e3941d9c67
Make building llava optional
2023-11-28 04:55:21 -05:00
Andrei Betlen
7f3704b896
Bump version
2023-11-27 19:14:25 -05:00
Andrei Betlen
396dbf0b2b
docs: Improve low-level docstrings
2023-11-27 19:03:02 -05:00
Andrei Betlen
a928893d03
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
2023-11-26 15:57:13 -05:00
Andrei Betlen
6308f21d5e
docs: Update Llama docs
2023-11-26 15:56:40 -05:00
Gardner Bickford
c2d63a7148
fix: Typo in the Open Orca chat format #874 ( #947 )
2023-11-26 15:39:18 -05:00
Andrei Betlen
f03a38e62a
Update llama.cpp
2023-11-26 15:38:22 -05:00
Andrei Betlen
1a7bf2037b
docs: Update openapi endpoint names
2023-11-24 03:39:29 -05:00
Andrei Betlen
4026166e68
docs: Update completion and chat_completion parameter docstrings
2023-11-24 03:24:19 -05:00
Andrei Betlen
8c3aa7858b
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
2023-11-24 00:15:36 -05:00
Andrei Betlen
de2e2bc083
misc fix verbose printing in functionary model
2023-11-23 20:14:23 -05:00
Andrei Betlen
36048d46af
Update llama.cpp
2023-11-23 16:26:00 -05:00
mrfakename
d68fc07b1b
Add Zephyr format ( #937 )
2023-11-23 01:20:08 -05:00
caiyesd
4184835078
Add chat format to support baichuan ( #938 )
...
Signed-off-by: caiyesd <caiyesd@gmail.com>
2023-11-23 01:19:50 -05:00
Andrei Betlen
c647f01609
Add from_json_schema to LlamaGrammar
2023-11-23 00:27:00 -05:00
Andrei Betlen
be1f64d569
docs: Add docstrings from llama.cpp
2023-11-23 00:26:26 -05:00
Andrei Betlen
b6bb7ac76a
docs: Add Llama class example
2023-11-22 23:10:04 -05:00
caiyesd
b8f29f4bf0
Add baichuan-2 chat format ( #936 )
...
Signed-off-by: caiyesd <caiyesd@gmail.com>
2023-11-22 06:08:06 -05:00
Andrei Betlen
8b6ca22846
Fix type warnings for json schema grammar converter
2023-11-21 13:32:00 -05:00
Andrei Betlen
230fc8b535
Bump version
2023-11-21 05:04:55 -05:00
Andrei Betlen
128dc4731f
Fix #569
2023-11-21 04:39:05 -05:00
Andrei Betlen
7a3f87846b
Format
2023-11-21 04:02:20 -05:00
Andrei Betlen
422ebc89ce
Fix: Add logit_bias to all completion api methods
2023-11-21 04:01:36 -05:00
Andrei Betlen
07e47f55ba
Add support for logit_bias outside of server api. Closes #827
2023-11-21 03:59:46 -05:00
Maarten ter Huurne
c21edb6908
Do not set grammar
to None
for new LlamaGrammar
objects ( #834 )
...
* Do not set `grammar` to `None` for new `LlamaGrammar` objects
The `grammar` attribute is written by `init()`, but that method always
returns `None`, so `__init__()` would then discard the previously
written object.
* Add minimal test for grammar parsing
2023-11-21 00:23:18 -05:00
mrfakename
ef65fc5ff4
Add MistralLite, Intel, and OpenChat prompt formats ( #927 )
...
* Add MistralLite format
* Update llama_chat_format.py
* Update llama_chat_format.py
2023-11-21 00:19:25 -05:00
TK-Master
b8438f70b5
Added support for min_p ( #921 )
...
* Added support for min_p
My small contribution to this great project.
Ref: https://github.com/ggerganov/llama.cpp/pull/3841
Closes: https://github.com/abetlen/llama-cpp-python/issues/911
* Fix for negative temp (sample_softmax)
2023-11-20 23:21:33 -05:00
Andrei Betlen
a34d480141
Fix #929
2023-11-20 22:50:59 -05:00
Andrei Betlen
2c2afa320f
Update llama.cpp
2023-11-20 14:11:33 -05:00
Andrei Betlen
f2901d840e
Bump version
2023-11-14 14:10:00 -05:00
Andrei Betlen
01846a76b9
Bump version
2023-11-10 16:36:12 -05:00
Andrei Betlen
b7e60b66f4
Bump version
2023-11-10 06:21:24 -05:00
Andrei Betlen
6f0b0b1b84
Fix sampling bug when logits_all=False
2023-11-10 05:15:41 -05:00
Andrei Betlen
d9b38e3e3a
Potential bugfix for eval
2023-11-10 04:41:19 -05:00
Andrei Betlen
b84d76a844
Fix: add default stop sequence to chatml chat format
2023-11-10 04:24:48 -05:00
Andrei Betlen
1b376c62b7
Update functionary for new OpenAI API
2023-11-10 02:51:58 -05:00
Andrei Betlen
17da8fb446
Add missing tool_calls finish_reason
2023-11-10 02:51:06 -05:00
Andrei Betlen
770df34436
Add $ref and $defs support to json schema converter
2023-11-10 02:50:46 -05:00
Andrei Betlen
faeae181b1
Fix: json_schema_to_gbnf should take string dump of json schema as input
2023-11-10 02:50:17 -05:00
Andrei Betlen
e7962d2c73
Fix: default max_tokens matches openai api (16 for completion, max length for chat completion)
2023-11-10 02:49:27 -05:00
Andrei Betlen
b62c449839
Bugfix: missing response_format for functionary and llava chat handlers
2023-11-09 00:55:23 -05:00
Andrei Betlen
fd41ed3a90
Add set_seed to Llama class
2023-11-08 11:09:41 -05:00
Andrei Betlen
ca4cb88351
Fix destructor NoneType is not callable error
2023-11-08 11:05:45 -05:00
Andrei Betlen
01cb3a0381
Bump version
2023-11-08 00:54:54 -05:00
Andrei Betlen
b30b9c338b
Add JSON mode support. Closes #881
2023-11-08 00:07:16 -05:00
Andrei Betlen
4852a6a39c
Fix built in GBNF grammar rules
2023-11-08 00:06:22 -05:00
Andrei Betlen
64f5153c35
Add seed parameter to chat handlers
2023-11-07 23:41:29 -05:00
Andrei Betlen
86aeb9f3a1
Add seed parameter support for completion and chat_completion requests. Closes #884
2023-11-07 23:37:28 -05:00
Damian Stewart
aab74f0b2b
Multimodal Support (Llava 1.5) ( #821 )
...
* llava v1.5 integration
* Point llama.cpp to fork
* Add llava shared library target
* Fix type
* Update llama.cpp
* Add llava api
* Revert changes to llama and llama_cpp
* Update llava example
* Add types for new gpt-4-vision-preview api
* Fix typo
* Update llama.cpp
* Update llama_types to match OpenAI v1 API
* Update ChatCompletionFunction type
* Reorder request parameters
* More API type fixes
* Even More Type Updates
* Add parameter for custom chat_handler to Llama class
* Fix circular import
* Convert to absolute imports
* Fix
* Fix pydantic Jsontype bug
* Accept list of prompt tokens in create_completion
* Add llava1.5 chat handler
* Add Multimodal notebook
* Clean up examples
* Add server docs
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2023-11-07 22:48:51 -05:00
Andrei Betlen
56171cf7bf
Bump version
2023-11-06 09:37:55 -05:00
Andrei Betlen
be0add1b2d
Fix type bug
2023-11-06 09:30:38 -05:00
Andrei Betlen
e214a58422
Refactor Llama class internals
2023-11-06 09:16:36 -05:00
Andrei Betlen
bbffdaebaa
Refactor autotokenizer format to reusable function
2023-11-06 09:07:27 -05:00
Joe
4ff8def4d0
#717 : Add support for Huggingface Autotokenizer ( #790 )
...
Co-authored-by: Andrei <abetlen@gmail.com>
2023-11-05 18:06:36 -05:00
earonesty
3580e2c5df
Update llama_chat_format.py ( #869 )
...
* Update llama_chat_format.py
properly formal llama2 with first-message prompt embedded
* Update llama_chat_format.py
2023-11-05 17:00:13 -05:00
Andrei Betlen
f0b30ef7dc
Update llama.cpp
2023-11-05 16:57:10 -05:00
Andrei Betlen
2ec043af76
Clean up stdout / stderr suppression
2023-11-03 13:02:15 -04:00
Andrei Betlen
4ea7027c41
Rename internal only module utils to _utils
2023-11-03 12:55:55 -04:00
Andrei Betlen
df9362eeea
Update llama.cpp
2023-11-03 11:34:50 -04:00
Andrei
3af7b21ff1
Add functionary support ( #784 )
...
* Add common grammars and json-schema-to-grammar utility function from llama.cpp
* Pass functions to format function
* Add basic functionary formatting
* Add LlamaChatHandler for more complex chat use cases
* Add function calling example notebook
* Add support for regular chat completions alongside function calling
2023-11-03 02:12:14 -04:00
Andrei
ab028cb878
Migrate inference to llama_batch and llama_decode api ( #795 )
...
* Add low-level batching notebook
* fix: tokenization of special characters: (#850 )
It should behave like llama.cpp, where most out of the box usages
treat special characters accordingly
* Update CHANGELOG
* Cleanup
* Fix runner label
* Update notebook
* Use llama_decode and batch api
* Support logits_all parameter
---------
Co-authored-by: Antoine Lizee <antoine.lizee@gmail.com>
2023-11-02 20:13:57 -04:00
Andrei Betlen
8350de9a18
Bump version
2023-11-02 15:53:01 -04:00
Andrei Betlen
011b95d7f3
Fix name 'open' is not defined exception. Closes #860
2023-11-02 15:30:55 -04:00
Andrei Betlen
fa83cc5f9c
Update llama.cpp
...
Fix build examples
Exclude examples directory
Revert cmake changes
Try actions/checkout@v4
Try to update submodules
Revert
Update llama.cpp
Fix build examples
Exclude examples directory
Revert cmake changes
Try actions/checkout@v4
Try to update submodules
Revert
2023-11-02 14:28:15 -04:00
Antoine Lizee
4d4e0f11e2
fix: tokenization of special characters: ( #850 )
...
It should behave like llama.cpp, where most out of the box usages
treat special characters accordingly
2023-11-02 14:28:14 -04:00
Andrei Betlen
6b3aa7fc8f
Bump version
2023-11-01 19:25:03 -04:00
Sujeendran Menon
7b136bb5b1
Fix for shared library not found and compile issues in Windows ( #848 )
...
* fix windows library dll name issue
* Updated README.md Windows instructions
* Update llama_cpp.py to handle different windows dll file versions
2023-11-01 18:55:57 -04:00
cebtenzzre
eefd76fe81
llama: fix exception in Llama.__del__ ( #846 )
2023-11-01 18:53:57 -04:00
David Ponce
3fc9147218
Iterate over tokens that should be biased rather than the entire vocabulary. ( #851 )
2023-11-01 18:53:47 -04:00
Marko Tasic
9c8f4dca5f
fixed Llama._create_completion suffix check, it can be either None or str instance ( #854 )
2023-11-01 18:52:50 -04:00
Daniel Thuerck
5f8f369d1b
Pass-Through grammar parameter in web server. ( #855 ) Closes #778
2023-11-01 18:51:12 -04:00
Adam Katora
25cb710281
Update llama_types.py ( #849 )
...
Minor typo fix, funcion -> function
2023-11-01 18:50:11 -04:00
Andrei Betlen
d808fd436c
Update llama.cpp
2023-10-31 21:29:35 -04:00
Andrei Betlen
53861c9e53
Update llama.cpp
2023-10-24 03:13:32 -04:00
gmcgoldr
09a8406c83
Fix streaming doesn't return finish reason ( #798 )
...
When streaming the yield that contains the finish can be skipped. This change ensures that yield isn't skipped.
2023-10-19 02:55:56 -04:00
Andrei Betlen
28c2b884e2
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
2023-10-19 02:55:31 -04:00
Andrei Betlen
ff580031d2
Update llama.cpp
2023-10-19 02:55:08 -04:00
Xiaoyu Kevin Hu
a315128d66
update value check for n_gpu_layers field ( #826 )
2023-10-18 18:25:25 -04:00
Pierre Alexandre SCHEMBRI
10304d75fc
Make use of suppress_stdout_stderr when freeing model ( #803 )
2023-10-15 13:52:43 -04:00
Ma, Guokai
a1ac199980
Fix repeat greeting ( #808 )
...
* fix repeated greeting
* remove seperator between role and message
2023-10-15 13:52:21 -04:00
Eric Liu
b50166500e
Add validation for tensor_split size exceeding LLAMA_MAX_DEVICES ( #820 )
...
* Add validation for tensor_split size exceeding LLAMA_MAX_DEVICES
* reword
2023-10-15 13:51:51 -04:00
Andrei Betlen
d6a130a052
Print traceback on server error
2023-10-10 15:56:04 -04:00
Andrei Betlen
43dfe1e2ab
Update llama.cpp
2023-10-05 16:07:49 -04:00
Andrei Betlen
a7d17b8ac9
Update llama.cpp
2023-10-03 15:23:35 -04:00
Andrei Betlen
305482bd41
Add chatml chat format
2023-09-30 21:01:34 -04:00
Andrei Betlen
5ef5280ef9
Log server exceptions to stdout
2023-09-30 19:13:36 -04:00
Andrei Betlen
fab4bccc35
Bump version
2023-09-30 16:04:46 -04:00
Andrei Betlen
d696251fbe
Fix logits_all bug
2023-09-30 16:02:35 -04:00
Andrei Betlen
6ee413d79e
Bump version
2023-09-30 13:23:09 -04:00
Andrei Betlen
42bb721d64
Fix bug in embedding
2023-09-30 13:20:22 -04:00
Andrei Betlen
5d62d55a82
Bump version
2023-09-30 00:07:06 -04:00
Andrei Betlen
386c88b68e
Bump version
2023-09-29 20:07:31 -04:00
Andrei Betlen
d9bce17794
Update server params
2023-09-29 19:59:12 -04:00
Andrei Betlen
3720c739d4
Update llama.cpp
2023-09-29 19:58:21 -04:00
Andrei
3bca7708fb
Configurable Chat Formats ( #711 )
...
* Add configurable default chat completion format.
* Remove chat_template file to avoid circular import
* Update llama_types
* Add chat format
2023-09-29 19:52:04 -04:00
Josh XT
a945404b4a
Fix rope scaling defaults ( #767 )
...
* Fix rope scale with backwards compatibility
* Fix defaults
* Fix op
* Remove backwards compatibility
* Check single val
2023-09-29 16:03:57 -04:00
Andrei Betlen
1a1c3dc418
Update llama.cpp
2023-09-28 22:42:03 -04:00
Andrei Betlen
4177ae6d34
Bump version
2023-09-25 14:38:38 -04:00
Viacheslav/Slava Tradunsky
3d5e5b1c04
Adds openai-processing-ms response header ( #748 )
2023-09-25 13:55:58 -04:00
Andrei Betlen
dbca136fea
Update llama_types and names to match openai api
2023-09-20 15:38:26 -04:00
Andrei Betlen
38e34c97f0
Update llama.cpp
2023-09-18 16:11:27 -04:00
Andrei Betlen
8d75016549
Install required runtime dlls to package directory on windows
2023-09-16 14:57:49 -04:00
Andrei Betlen
acf18fcdf0
Bump version
2023-09-15 14:22:21 -04:00
Andrei Betlen
b047b3034e
Remove confusing helpstring from server cli args. Closes #719
2023-09-15 14:09:43 -04:00
Andrei Betlen
24fec0b242
Bump version
2023-09-14 18:33:08 -04:00
Andrei Betlen
8474665625
Update base_path to fix issue resolving dll in windows isolation container.
2023-09-14 14:51:43 -04:00
Andrei Betlen
507bcc7171
Bump version
2023-09-13 23:15:23 -04:00
Andrei Betlen
0449d29b9f
Fix boolean env vars and cli arguments
2023-09-13 23:09:57 -04:00
earonesty
58a6e42cc0
Update app.py ( #705 )
2023-09-13 23:01:34 -04:00
Andrei Betlen
f4090a0bb2
Add numa support, low level api users must now explicitly call llama_backend_init at the start of their programs.
2023-09-13 23:00:43 -04:00
Andrei Betlen
c999325e8e
Fix boolean cli flags
2023-09-13 22:56:10 -04:00
Andrei Betlen
4daf77e546
Format
2023-09-13 21:23:23 -04:00
Andrei Betlen
2920c4bf7e
Update server params. Added lora_base, lora_path, low_vram, and main_gpu. Removed rms_norm_eps and n_gqa (deprecated in llama.cpp)
2023-09-13 21:23:13 -04:00
Andrei Betlen
6a20293fc2
Reorder init params to match llama.cpp order
2023-09-13 21:20:26 -04:00
Andrei Betlen
c8f9b8a734
Explicitly make all init params other than model_path into keyword only params
2023-09-13 21:19:47 -04:00
Andrei Betlen
a68f9e2791
Add kwargs to init to catch extra params
2023-09-13 21:19:02 -04:00
Andrei Betlen
9e345a47a2
remove print
2023-09-13 21:12:27 -04:00
Andrei Betlen
517f9ed80b
Convert missed llama.cpp constants into standard python types
2023-09-13 21:11:52 -04:00
Andrei Betlen
c4c440ba2d
Fix tensor_split cli option
2023-09-13 20:00:42 -04:00
Andrei Betlen
203ede4ba2
Bump version
2023-09-13 18:07:08 -04:00
Andrei Betlen
759405c84b
Fix issue with Literal and Optional cli arguments not working. Closes #702
2023-09-13 18:06:12 -04:00
Devrim
da9df78db0
Add X-Request-ID request header for mirroring custom IDs. ( #703 )
2023-09-13 16:18:31 -04:00
Andrei Betlen
8e13520796
Bump version
2023-09-13 01:47:58 -04:00
Andrei Betlen
2787663a25
Bump version
2023-09-12 21:00:01 -04:00
Andrei Betlen
6e89775759
Bump version
2023-09-12 18:57:01 -04:00
Andrei Betlen
bb4e67e7aa
Using dynamic version
2023-09-12 18:56:36 -04:00
Andrei Betlen
1910793f56
Merge branch 'main' into v0.2-wip
2023-09-12 16:43:32 -04:00
Andrei Betlen
c7901f1141
Bump version
2023-09-12 16:16:40 -04:00
janvdp
33ce931cce
merge upstream
2023-09-09 21:21:04 +02:00
Andrei Betlen
d3f63211ef
Update llama.cpp
2023-09-09 12:12:32 -04:00
janvdp
da0fdafc32
import version in __init__.py
2023-09-05 21:09:28 +02:00
janvdp
6e8e64d09a
add version file
2023-09-05 21:09:08 +02:00
Andrei Betlen
186626d58e
Update llama.cpp
2023-09-01 14:26:13 -04:00
Andrei Betlen
47de3ab104
Update llama.cpp
2023-08-29 07:36:20 -04:00
Andrei Betlen
3f76e1de52
cjk pr minor cleanup
2023-08-29 07:21:59 -04:00
Andrei
bae44ec8bf
Merge pull request #309 from MeouSker77/fix-CJK
...
Fix CJK and emoji stream output
2023-08-29 06:58:10 -04:00
Andrei Betlen
e0dcbc28a1
Update llama.cpp
2023-08-28 10:33:45 -04:00
Andrei Betlen
4887973c22
Update llama.cpp
2023-08-27 12:59:20 -04:00
Andrei Betlen
3a29d65f45
Update llama.cpp
2023-08-26 23:36:24 -04:00