Commit graph

630 commits

Author SHA1 Message Date
Andrei Betlen
141293a75b Fix python3.8 support 2024-01-19 08:17:49 -05:00
Andrei Betlen
656f3d8968 Bump version 2024-01-18 21:30:36 -05:00
Andrei Betlen
89cce50f8c Update llama.cpp 2024-01-18 21:21:49 -05:00
Andrei Betlen
b8fc1c7d83 feat: Add ability to load chat format from huggingface autotokenizer or tokenizer_config.json files. 2024-01-18 21:21:37 -05:00
Andrei Betlen
48c3b77e6f Offload KQV by default 2024-01-18 11:08:57 -05:00
Austin
6bfe98bd80
Integration of Jinja2 Templating (#875)
* feat: Add support for jinja templating

Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>

* fix: Refactor chat formatter and update interface for jinja templates

- Simplify the `llama2_template` in `llama_jinja_format.py` by removing unnecessary line breaks for readability without affecting functionality.
- Update `ChatFormatterInterface` constructor to accept a more generic `Optional[object]` type for the template parameter, enhancing flexibility.
- Introduce a `template` property to `ChatFormatterInterface` for standardized access to the template string.
- Replace `MetaSingleton` metaclass with `Singleton` for the `ChatFormatterFactory` to streamline the singleton implementation.

These changes enhance code readability, maintain usability, and ensure consistency in the chat formatter's design pattern usage.

* Add outline for Jinja2 templating integration documentation

Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>

* Add jinja2 as a dependency with version range for Hugging Face transformers compatibility

Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>

* Update jinja2 version constraint for mkdocs-material compatibility

Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>

* Fix attribute name in AutoChatFormatter

- Changed attribute name from `self._renderer` to `self._environment`

---------

Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>
2024-01-17 09:47:52 -05:00
Andrei Betlen
7b46bb5a78 Re-order classes in llama.py 2024-01-17 09:16:13 -05:00
Andrei Betlen
cc4630e66f Move helper classes to _internals submodule 2024-01-17 09:14:00 -05:00
Andrei Betlen
3b92419132 Move cache classes to llama_cache submodule. 2024-01-17 09:09:12 -05:00
Kyle Mistele
9c36688b33
fix(cli): allow passing n_ctx=0 to openAI API server args to use model n_ctx_train field per #1015 (#1093) 2024-01-16 18:54:06 -05:00
anil
cfb7da98ed
Support Accept text/event-stream in chat and completion endpoints, resolves #1083 (#1088)
Co-authored-by: Anil Pathak <anil@heyday.com>
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-01-16 12:52:52 -05:00
Andrei Betlen
4b11fa83c0 Bump version 2024-01-15 12:54:51 -05:00
Andrei Betlen
84615adbc6 Add split_mode option. Closes #1085 2024-01-15 12:49:20 -05:00
Phil H
76aafa6149
Implement GGUF metadata KV overrides (#1011)
* Implement GGUF metadata overrides

* whitespace fix

* Fix kv overrides.

* Fix pointer and pickle

* Match llama.cpp kv_overrides cli argument

---------

Co-authored-by: Andrei <abetlen@gmail.com>
2024-01-15 12:29:29 -05:00
yieldthought
7eff42c239
Avoid "LookupError: unknown encoding: ascii" when open() called in a destructor (#1012)
The existing code often causes "LookupError: unknown encoding: ascii" when open() called in a destructor. Saving open in self.open is not enough to avoid this. Instead, we can avoid reopening /dev/null every time by doing it once when the module is loaded.
2024-01-15 10:52:10 -05:00
Mark Neumann
c689ccc728
Fix Pydantic model parsing (#1087) 2024-01-15 10:45:57 -05:00
Andrei Betlen
5502ac8876 Update llama.cpp 2024-01-15 10:12:10 -05:00
Andrei Betlen
359ae73643 Update llama.cpp 2024-01-14 08:17:22 -05:00
Andrei Betlen
7c898d5684 Update llama.cpp 2024-01-13 22:37:49 -05:00
Andrei Betlen
bb610b9428 Update llama.cpp 2024-01-11 22:51:12 -05:00
Andrei Betlen
f0159663d9 Bump version 2024-01-10 02:51:17 -05:00
Stephen Hankinson
df3be58d6c
Add ability to pass in penalize_nl param (#1068) 2024-01-10 02:46:27 -05:00
Joseph Turian
2ddce7294e
print_grammar to stderr (#1052) 2024-01-10 02:46:03 -05:00
Andrei Betlen
1ae05c102b Update llama.cpp 2024-01-08 14:51:29 -05:00
Andrei Betlen
75d0527fd7 Bump version 2024-01-04 18:30:12 -05:00
Fedor Moiseev
907b9e9d42
Add Saiga chat format. (#1050) 2024-01-04 18:12:58 -05:00
xaviviro
cf743ec5d3
Added ChatGLM chat format (#1059)
Co-authored-by: Xavier Vinaixa Rosello <xaviviro@MacBook-Pro-de-Xavier.local>
2024-01-04 18:12:02 -05:00
Andrei Betlen
eb9c7d4ed8 Update llama.cpp 2024-01-03 22:04:04 -05:00
Andrei Betlen
011c3630f5 Bump version 2023-12-27 17:35:02 -05:00
Andrei Betlen
92284f32cb Add HIP_PATH to dll search directories for windows users. 2023-12-22 15:29:56 -05:00
Andrei Betlen
2b0d3f36fa set llama_max_devices using library function 2023-12-22 15:19:28 -05:00
Andrei Betlen
d9a1d90fd7 Fix typo 2023-12-22 15:12:27 -05:00
Andrei Betlen
37556bf9c4 Bump version 2023-12-22 14:55:58 -05:00
Andrei Betlen
6d8bc090f9 fix: inccorect bindings for kv override. Based on #1011 2023-12-22 14:52:20 -05:00
Andrei Betlen
522aecb868 docs: add server config docs 2023-12-22 14:37:24 -05:00
Andrei Betlen
6473796343 Update llama.cpp 2023-12-22 14:10:34 -05:00
swg
4b01a873ef
server: Support none defaulting to infinity for completions (#111)
* Support defaulting to infinity or -1 for chat completions

* Check if completion_tokens is none in error handler.

* fix: max_tokens in create completion should match openai spec

* Fix __call__

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2023-12-22 14:05:13 -05:00
Dave
12b7f2f4e9
[Feat] Multi model support (#931)
* Update Llama class to handle chat_format & caching

* Add settings.py

* Add util.py & update __main__.py

* multimodel

* update settings.py

* cleanup

* delete util.py

* Fix /v1/models endpoint

* MultiLlama now iterable, app check-alive on "/"

* instant model init if file is given

* backward compability

* revert model param mandatory

* fix error

* handle individual model config json

* refactor

* revert chathandler/clip_model changes

* handle chat_handler in MulitLlama()

* split settings into server/llama

* reduce global vars

* Update LlamaProxy to handle config files

* Add free method to LlamaProxy

* update arg parsers & install server alias

* refactor cache settings

* change server executable name

* better var name

* whitespace

* Revert "whitespace"

This reverts commit bc5cf51c64a95bfc9926e1bc58166059711a1cd8.

* remove exe_name

* Fix merge bugs

* Fix type annotations

* Fix type annotations

* Fix uvicorn app factory

* Fix settings

* Refactor server

* Remove formatting fix

* Format

* Use default model if not found in model settings

* Fix

* Cleanup

* Fix

* Fix

* Remove unnused CommandLineSettings

* Cleanup

* Support default name for copilot-codex models

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2023-12-22 05:51:25 -05:00
Andrei Betlen
4a85442c35 Update llama.cpp 2023-12-22 00:12:37 -05:00
twaka
2f03fb0231
fix text_offset of multi-token characters (#1037)
* fix text_offsets for bytes tokens

* fix
2023-12-22 00:03:29 -05:00
docmeth02
33cc623346
Implement openai api compatible authentication (#1010) 2023-12-21 13:44:49 -05:00
Andrei Betlen
a05b4da80a fix: float32 is not JSON serializable when streaming logits. 2023-12-18 18:40:36 -05:00
Andrei Betlen
7df6c32544 Fix type annotations 2023-12-18 18:14:53 -05:00
Andrei Betlen
b703aad79e Fix type annotation 2023-12-18 18:13:37 -05:00
Andrei Betlen
d0aedfcff6 Fix type annotation 2023-12-18 18:12:49 -05:00
Eduard Christian Dumitrescu
2993936b10
Fix ctypes definitions of llama_kv_cache_view_update and llama_kv_cache_view_free. (#1028) 2023-12-18 18:11:26 -05:00
Andrei Betlen
5e863d8a3b Bump version 2023-12-18 16:09:18 -05:00
Andrei Betlen
095c650006 Add offload_kqv option to llama and server 2023-12-18 15:36:09 -05:00
Andrei Betlen
472b344ae3 Remove unnused import 2023-12-18 15:32:40 -05:00
kddubey
6b2e0e05b4
perf: Don't convert logprobs arrays to lists (#1021) 2023-12-18 14:28:12 -05:00
Brandon Roberts
62944df142
Bugfix: Remove f16_kv, add offload_kqv field (#1019)
F16_KV appears to have been removed here: af99c6fbfc

This addresses two issues:

 - #995 which just requests to add the KV cache offloading param
 - #1006 a NULL ptr exception when using the embeddings (introduced by
   leaving f16_kv in the fields struct)
2023-12-18 14:27:11 -05:00
Daniele Morotti
f1c631dc53
Bug fixed with n_ctx=0 (#1015)
If the n_ctx is set to 0 the code should use the maximum context length of the selected model, but it didn't work. There was a problem with the initialization of this parameter and a related problem with 'n_batch'.
2023-12-16 18:59:50 -05:00
kddubey
5a8944672f
Fix logits_to_logprobs for 2-D and 3-D logits (#1002)
* Fix logits_to_logprobs for 2-D and 3-D logits

* Set dtype to single

* Test size
2023-12-16 18:59:26 -05:00
Andrei Betlen
534b1ea9b5 Update llama.cpp 2023-12-16 18:57:43 -05:00
Andrei Betlen
cbce061ffd Bump version 2023-12-13 21:52:29 -05:00
yhfgyyf
8b4db732bd
Add qwen chat format (#1005) 2023-12-13 21:43:43 -05:00
Andrei Betlen
690c563b60 Merge branch 'main' of github.com:abetlen/llama_cpp_python into main 2023-12-13 21:43:19 -05:00
Andrei Betlen
c0fc0a1e82 Update llama.cpp 2023-12-13 21:43:16 -05:00
Radoslav Gerganov
8e44a32075
Add support for running the server with SSL (#994) 2023-12-11 20:47:11 -05:00
Tanner Hobson
ef22e478db
Replace logits_to_logprobs implementation with numpy equivalent to llama.cpp (#991)
See #990. This change makes the logits_to_logprobs function equivalent to the version in the llama.cpp repository. It uses numpy so it's much faster than the previous version.
2023-12-11 20:46:27 -05:00
zocainViken
ac35f68e4d
Fix UnsupportedOperation: fileno in suppress_stdout_stderr (#961)
* bug fixing

* llava from readme got this error: UnsupportedOperation: fileno   quick fix by checking hasattr

* multi modal params fix: add logits = True -> to make llava work

* multi modal params fix: add logits = True -> to make llava work

---------

Co-authored-by: Andrei <abetlen@gmail.com>
2023-12-11 20:44:51 -05:00
chiensen
b938cccf05
Add Pygmalion chat format (#986) 2023-12-11 20:44:04 -05:00
Andrei Betlen
c1e73e73a3 Bump version 2023-12-11 10:26:42 -05:00
Andrei Betlen
ec26f364cc Remove f16_kv 2023-12-11 10:25:37 -05:00
Andrei Betlen
f1edc66b21 Update llama.cpp 2023-12-11 10:21:35 -05:00
kddubey
b069d06346
Fix #891 (#952) 2023-11-29 05:39:52 -05:00
Andrei Betlen
ad963a0961 Bump version 2023-11-28 04:58:20 -05:00
Andrei Betlen
e3941d9c67 Make building llava optional 2023-11-28 04:55:21 -05:00
Andrei Betlen
7f3704b896 Bump version 2023-11-27 19:14:25 -05:00
Andrei Betlen
396dbf0b2b docs: Improve low-level docstrings 2023-11-27 19:03:02 -05:00
Andrei Betlen
a928893d03 Merge branch 'main' of github.com:abetlen/llama_cpp_python into main 2023-11-26 15:57:13 -05:00
Andrei Betlen
6308f21d5e docs: Update Llama docs 2023-11-26 15:56:40 -05:00
Gardner Bickford
c2d63a7148
fix: Typo in the Open Orca chat format #874 (#947) 2023-11-26 15:39:18 -05:00
Andrei Betlen
f03a38e62a Update llama.cpp 2023-11-26 15:38:22 -05:00
Andrei Betlen
1a7bf2037b docs: Update openapi endpoint names 2023-11-24 03:39:29 -05:00
Andrei Betlen
4026166e68 docs: Update completion and chat_completion parameter docstrings 2023-11-24 03:24:19 -05:00
Andrei Betlen
8c3aa7858b Merge branch 'main' of github.com:abetlen/llama_cpp_python into main 2023-11-24 00:15:36 -05:00
Andrei Betlen
de2e2bc083 misc fix verbose printing in functionary model 2023-11-23 20:14:23 -05:00
Andrei Betlen
36048d46af Update llama.cpp 2023-11-23 16:26:00 -05:00
mrfakename
d68fc07b1b
Add Zephyr format (#937) 2023-11-23 01:20:08 -05:00
caiyesd
4184835078
Add chat format to support baichuan (#938)
Signed-off-by: caiyesd <caiyesd@gmail.com>
2023-11-23 01:19:50 -05:00
Andrei Betlen
c647f01609 Add from_json_schema to LlamaGrammar 2023-11-23 00:27:00 -05:00
Andrei Betlen
be1f64d569 docs: Add docstrings from llama.cpp 2023-11-23 00:26:26 -05:00
Andrei Betlen
b6bb7ac76a docs: Add Llama class example 2023-11-22 23:10:04 -05:00
caiyesd
b8f29f4bf0
Add baichuan-2 chat format (#936)
Signed-off-by: caiyesd <caiyesd@gmail.com>
2023-11-22 06:08:06 -05:00
Andrei Betlen
8b6ca22846 Fix type warnings for json schema grammar converter 2023-11-21 13:32:00 -05:00
Andrei Betlen
230fc8b535 Bump version 2023-11-21 05:04:55 -05:00
Andrei Betlen
128dc4731f Fix #569 2023-11-21 04:39:05 -05:00
Andrei Betlen
7a3f87846b Format 2023-11-21 04:02:20 -05:00
Andrei Betlen
422ebc89ce Fix: Add logit_bias to all completion api methods 2023-11-21 04:01:36 -05:00
Andrei Betlen
07e47f55ba Add support for logit_bias outside of server api. Closes #827 2023-11-21 03:59:46 -05:00
Maarten ter Huurne
c21edb6908
Do not set grammar to None for new LlamaGrammar objects (#834)
* Do not set `grammar` to `None` for new `LlamaGrammar` objects

The `grammar` attribute is written by `init()`, but that method always
returns `None`, so `__init__()` would then discard the previously
written object.

* Add minimal test for grammar parsing
2023-11-21 00:23:18 -05:00
mrfakename
ef65fc5ff4
Add MistralLite, Intel, and OpenChat prompt formats (#927)
* Add MistralLite format

* Update llama_chat_format.py

* Update llama_chat_format.py
2023-11-21 00:19:25 -05:00
TK-Master
b8438f70b5
Added support for min_p (#921)
* Added support for min_p

My small contribution to this great project.

Ref: https://github.com/ggerganov/llama.cpp/pull/3841

Closes: https://github.com/abetlen/llama-cpp-python/issues/911

* Fix for negative temp (sample_softmax)
2023-11-20 23:21:33 -05:00
Andrei Betlen
a34d480141 Fix #929 2023-11-20 22:50:59 -05:00
Andrei Betlen
2c2afa320f Update llama.cpp 2023-11-20 14:11:33 -05:00
Andrei Betlen
f2901d840e Bump version 2023-11-14 14:10:00 -05:00
Andrei Betlen
01846a76b9 Bump version 2023-11-10 16:36:12 -05:00
Andrei Betlen
b7e60b66f4 Bump version 2023-11-10 06:21:24 -05:00
Andrei Betlen
6f0b0b1b84 Fix sampling bug when logits_all=False 2023-11-10 05:15:41 -05:00