Commit graph

1402 commits

238d2e0290
Merge https://github.com/abetlen/llama-cpp-python 2024-01-24 16:12:35 +05:30
Andrei Betlen
9677a1f2c8 fix: Check order 2024-01-23 22:28:03 -05:00
Andrei Betlen
4d6b2f7b91 fix: format 2024-01-23 22:08:27 -05:00
Phil H
fe5d6ea648
fix: GGUF metadata KV overrides, re #1011 (#1116)
* KV overrides: another attempt

* add sentinel element, simplify array population

* ensure sentinel element is zeroed
2024-01-23 22:00:38 -05:00
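The sentinel technique in this fix mirrors how llama.cpp consumes kv_overrides: the C side walks the array until it reaches an entry whose key is empty, so the Python side appends one extra, fully zeroed element. A minimal ctypes sketch of the idea, using a simplified stand-in struct (the real llama.cpp struct also carries a type tag and a union of value fields):

```python
import ctypes
from typing import Dict

# Simplified stand-in for llama.cpp's kv-override struct; illustrative only.
class KVOverride(ctypes.Structure):
    _fields_ = [
        ("key", ctypes.c_char * 128),
        ("val_i64", ctypes.c_int64),
    ]

def build_overrides(overrides: Dict[str, int]) -> ctypes.Array:
    # Allocate one extra slot for the sentinel. ctypes zero-initializes
    # new arrays, so the last element's key stays empty ("zeroed") and
    # marks the end of the array for the C side.
    arr = (KVOverride * (len(overrides) + 1))()
    for i, (key, value) in enumerate(overrides.items()):
        arr[i].key = key.encode("utf-8")
        arr[i].val_i64 = value
    return arr

kv_overrides = build_overrides({"tokenizer.ggml.add_bos_token": 1})
```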
Andrei Betlen
7e63928bc9 Update llama.cpp 2024-01-23 18:42:39 -05:00
Andrei Betlen
fcdf337d84 Update llama.cpp 2024-01-22 11:25:11 -05:00
Andrei Betlen
5b982d0f8c fix: use both eos and bos tokens as stop sequences for hf-tokenizer-config chat format. 2024-01-22 08:32:48 -05:00
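The fix above amounts to collecting both special tokens as stop strings when the chat format comes from a tokenizer_config.json. A rough sketch, assuming the usual HF config layout where each token is either a plain string or an object with a "content" field:

```python
import json
from typing import List

def stop_tokens_from_config(path: str) -> List[str]:
    # Collect both special tokens as stop sequences.
    with open(path) as f:
        config = json.load(f)
    stops = []
    for key in ("eos_token", "bos_token"):
        token = config.get(key)
        if isinstance(token, dict):
            token = token.get("content")
        if token:
            stops.append(token)
    return stops
```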
8806f19ef9
Merge https://github.com/abetlen/llama-cpp-python 2024-01-22 18:40:46 +05:30
Andrei Betlen
2ce0b8aa2c Bump version 2024-01-21 20:30:24 -05:00
Andrei Betlen
d3f5528ca8 fix: from_json_schema oneof/anyof bug. Closes #1097 2024-01-21 19:06:53 -05:00
Andrei Betlen
8eefdbca03 Update llama.cpp 2024-01-21 19:01:27 -05:00
Andrei Betlen
88fbccaaa3 docs: Add macosx wrong arch fix to README 2024-01-21 18:38:44 -05:00
Andrei Betlen
24f39454e9 fix: pass chat handler, not chat formatter, for huggingface autotokenizer and tokenizer_config formats. 2024-01-21 18:38:04 -05:00
Andrei Betlen
7f3209b1eb feat: Add add_generation_prompt option for Jinja2ChatFormatter. 2024-01-21 18:37:24 -05:00
Andrei Betlen
ac2e96d4b4 Update llama.cpp 2024-01-19 15:33:43 -05:00
Andrei Betlen
be09318c26 feat: Add Jinja2ChatFormatter 2024-01-19 15:04:42 -05:00
Andrei Betlen
5a34c57e54 feat: Expose gguf model metadata in metadata property 2024-01-19 10:46:03 -05:00
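A usage sketch of the new property (the model path is a placeholder):

```python
from llama_cpp import Llama

llm = Llama(model_path="./models/example-7b.Q4_K_M.gguf")  # placeholder path
# metadata is a plain dict of the model's GGUF key-value pairs,
# e.g. "general.name" or "tokenizer.chat_template" when present.
for key, value in llm.metadata.items():
    print(key, "=", value)
```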
Andrei Betlen
833a7f1a86 Bump version 2024-01-19 09:03:35 -05:00
Andrei Betlen
e21c3c7a91 Update makefile 2024-01-19 08:47:56 -05:00
Andrei Betlen
0f54948482 Update llama.cpp 2024-01-19 08:41:52 -05:00
Andrei Betlen
3babe3512c Fix mirostat sampling 2024-01-19 08:31:59 -05:00
Andrei Betlen
141293a75b Fix python3.8 support 2024-01-19 08:17:49 -05:00
Andrei Betlen
656f3d8968 Bump version 2024-01-18 21:30:36 -05:00
Andrei Betlen
03ed547bfd Remove templates doc 2024-01-18 21:23:26 -05:00
Andrei Betlen
3ca86ab390 Update llama.cpp 2024-01-18 21:22:45 -05:00
Andrei Betlen
be23404ed4 Cleanup pyproject 2024-01-18 21:22:19 -05:00
Andrei Betlen
89cce50f8c Update llama.cpp 2024-01-18 21:21:49 -05:00
Andrei Betlen
b8fc1c7d83 feat: Add ability to load chat format from huggingface autotokenizer or tokenizer_config.json files. 2024-01-18 21:21:37 -05:00
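The autotokenizer route builds on the chat template shipped with a Hugging Face model. A hedged sketch of the underlying mechanism, calling transformers directly (the model id is a placeholder):

```python
from transformers import AutoTokenizer

# Any repo whose tokenizer_config.json ships a chat_template works the same way.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello!"}],
    tokenize=False,
    add_generation_prompt=True,  # append the assistant turn prefix
)
print(prompt)
```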
Andrei Betlen
48c3b77e6f Offload KQV by default 2024-01-18 11:08:57 -05:00
Austin
6bfe98bd80
Integration of Jinja2 Templating (#875)
* feat: Add support for jinja templating

Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>

* fix: Refactor chat formatter and update interface for jinja templates

- Simplify the `llama2_template` in `llama_jinja_format.py` by removing unnecessary line breaks for readability without affecting functionality.
- Update `ChatFormatterInterface` constructor to accept a more generic `Optional[object]` type for the template parameter, enhancing flexibility.
- Introduce a `template` property to `ChatFormatterInterface` for standardized access to the template string.
- Replace `MetaSingleton` metaclass with `Singleton` for the `ChatFormatterFactory` to streamline the singleton implementation.

These changes enhance code readability, maintain usability, and ensure consistency in the chat formatter's design pattern usage.

* Add outline for Jinja2 templating integration documentation

Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>

* Add jinja2 as a dependency with version range for Hugging Face transformers compatibility

Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>

* Update jinja2 version constraint for mkdocs-material compatibility

Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>

* Fix attribute name in AutoChatFormatter

- Changed attribute name from `self._renderer` to `self._environment`

---------

Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>
2024-01-17 09:47:52 -05:00
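A minimal sketch of the formatter shape this PR introduces; the template and class body are illustrative rather than the merged code, and the Environment is stored as _environment per the final fix above:

```python
import jinja2

# Illustrative llama-2 style template; the merged llama2_template differs.
LLAMA2_TEMPLATE = (
    "{% for m in messages %}"
    "{% if m['role'] == 'user' %}[INST] {{ m['content'] }} [/INST]"
    "{% else %} {{ m['content'] }}{% endif %}"
    "{% endfor %}"
)

class JinjaChatFormatter:
    def __init__(self, template: str = LLAMA2_TEMPLATE):
        self._template_string = template
        # Per the final fix in this PR, the Environment lives in
        # self._environment (it was initially named self._renderer).
        self._environment = jinja2.Environment()
        self._compiled = self._environment.from_string(template)

    @property
    def template(self) -> str:
        # Standardized access to the raw template string.
        return self._template_string

    def format(self, messages) -> str:
        return self._compiled.render(messages=messages)

formatter = JinjaChatFormatter()
print(formatter.format([{"role": "user", "content": "Hi"}]))
```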
Andrei Betlen
52adc23115 Update llama.cpp 2024-01-17 09:27:40 -05:00
Andrei Betlen
7b46bb5a78 Re-order classes in llama.py 2024-01-17 09:16:13 -05:00
Andrei Betlen
cc4630e66f Move helper classes to _internals submodule 2024-01-17 09:14:00 -05:00
Andrei Betlen
3b92419132 Move cache classes to llama_cache submodule. 2024-01-17 09:09:12 -05:00
833126bbd3
Merge https://github.com/abetlen/llama-cpp-python 2024-01-17 12:10:00 +05:30
Andrei Betlen
6981597835 Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main 2024-01-16 19:35:59 -05:00
Andrei Betlen
d5dbb3f8de Update llama.cpp 2024-01-16 19:35:57 -05:00
Jerry Liu
84380fe9a6
Add LlamaIndex integration to README (#1092) 2024-01-16 19:10:50 -05:00
Kyle Mistele
9c36688b33
fix(cli): allow passing n_ctx=0 to OpenAI API server args to use the model's n_ctx_train field, per #1015 (#1093) 2024-01-16 18:54:06 -05:00
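The mechanism behind this fix is small: n_ctx=0 defers to the context length the model was trained with (n_ctx_train, read from the GGUF metadata). A sketch with an illustrative helper name:

```python
def resolve_n_ctx(n_ctx: int, n_ctx_train: int) -> int:
    # 0 means "use the context length the model was trained with",
    # which llama.cpp reports as n_ctx_train.
    return n_ctx_train if n_ctx == 0 else n_ctx

assert resolve_n_ctx(0, 4096) == 4096      # defer to the model
assert resolve_n_ctx(2048, 4096) == 2048   # explicit value wins
```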
anil
cfb7da98ed
Support Accept: text/event-stream in chat and completion endpoints, resolves #1083 (#1088)
Co-authored-by: Anil Pathak <anil@heyday.com>
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-01-16 12:52:52 -05:00
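The gist of #1088: inspect the request's Accept header and switch to server-sent events when the client asks for text/event-stream. A self-contained FastAPI sketch of that dispatch; the endpoint body is illustrative, not the server's actual completion logic:

```python
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse, StreamingResponse

app = FastAPI()

@app.post("/v1/completions")
async def completions(request: Request):
    wants_sse = "text/event-stream" in request.headers.get("accept", "")
    if wants_sse:
        async def events():
            # Each SSE frame is a "data: ..." line plus a blank line.
            for chunk in ("Hel", "lo", "!"):
                yield f"data: {chunk}\n\n"
            yield "data: [DONE]\n\n"
        return StreamingResponse(events(), media_type="text/event-stream")
    return JSONResponse({"choices": [{"text": "Hello!"}]})
```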
Andrei Betlen
e39778f8eb Update llama.cpp 2024-01-16 11:56:44 -05:00
Andrei Betlen
4b11fa83c0 Bump version 2024-01-15 12:54:51 -05:00
Andrei Betlen
84615adbc6 Add split_mode option. Closes #1085 2024-01-15 12:49:20 -05:00
Phil H
76aafa6149
Implement GGUF metadata KV overrides (#1011)
* Implement GGUF metadata overrides

* whitespace fix

* Fix kv overrides.

* Fix pointer and pickle

* Match llama.cpp kv_overrides CLI argument

---------

Co-authored-by: Andrei <abetlen@gmail.com>
2024-01-15 12:29:29 -05:00
yieldthought
7eff42c239
Avoid "LookupError: unknown encoding: ascii" when open() called in a destructor (#1012)
The existing code often causes "LookupError: unknown encoding: ascii" when open() called in a destructor. Saving open in self.open is not enough to avoid this. Instead, we can avoid reopening /dev/null every time by doing it once when the module is loaded.
2024-01-15 10:52:10 -05:00
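The pattern from #1012, sketched below: open os.devnull once at import time so that no destructor has to call open() while the interpreter is tearing down the codecs machinery. Handle names follow the fix's description; the suppressor class is a simplified stand-in (the actual helper redirects at the file-descriptor level):

```python
import os
import sys

# Opened once at module load. Calling open() later, inside a __del__
# running during interpreter shutdown, can raise
# "LookupError: unknown encoding: ascii" because the codecs machinery
# may already be torn down.
outnull_file = open(os.devnull, "w")
errnull_file = open(os.devnull, "w")

class suppress_stdout_stderr:
    # Simplified suppressor reusing the module-level handles.
    def __enter__(self):
        self.old_stdout, self.old_stderr = sys.stdout, sys.stderr
        sys.stdout, sys.stderr = outnull_file, errnull_file
        return self

    def __exit__(self, *exc):
        sys.stdout, sys.stderr = self.old_stdout, self.old_stderr
```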
anil
1eaace8ea3
Fix low_level_api_chat_cpp example to match current API (#1086)
* Fix low_level_api_chat_cpp to match current API

* Fix low_level_api_chat_cpp to match current API

* Use None instead of an empty string so that the default prompt template can be used if no prompt is provided

---------

Co-authored-by: Anil Pathak <anil@heyday.com>
2024-01-15 10:46:35 -05:00
Mark Neumann
c689ccc728
Fix Pydantic model parsing (#1087) 2024-01-15 10:45:57 -05:00
Andrei Betlen
5502ac8876 Update llama.cpp 2024-01-15 10:12:10 -05:00
Andrei Betlen
359ae73643 Update llama.cpp 2024-01-14 08:17:22 -05:00
966f8cb64f
Merge https://github.com/abetlen/llama-cpp-python 2024-01-14 14:56:35 +05:30