Commit graph

770 commits

Author SHA1 Message Date
Andrei Betlen
4f01c452b6 fix: Change default verbose value of verbose in image chat format handlers to True to match Llama 2024-04-30 15:50:30 -04:00
Andrei Betlen
9286b5caac Merge branch 'main' of github.com:abetlen/llama_cpp_python into main 2024-04-30 15:45:36 -04:00
Andrei Betlen
f116175a5a fix: Suppress all logs when verbose=False, use hardcoded fileno's to work in colab notebooks. Closes #796 Closes #729 2024-04-30 15:45:34 -04:00
Jonathan Soma
3226b3c5ef
fix: UTF-8 handling with grammars (#1415)
Use Python's built-in UTF-8 handling to get code points
2024-04-30 14:33:23 -04:00
Andrei Betlen
b14dd98922 chore: Bump version 2024-04-30 09:39:56 -04:00
Andrei Betlen
29b6e9a5c8 fix: wrong parameter for flash attention in pickle __getstate__ 2024-04-30 09:32:47 -04:00
Andrei Betlen
22d77eefd2 feat: Add option to enable flash_attn to Lllama params and ModelSettings 2024-04-30 09:29:16 -04:00
Andrei Betlen
8c2b24d5aa feat: Update llama.cpp 2024-04-30 09:27:55 -04:00
Andrei Betlen
f417cce28a chore: Bump version 2024-04-30 03:11:02 -04:00
Andrei Betlen
3489ef09d3 fix: Ensure image renders before text in chat formats regardless of message content order. 2024-04-30 03:08:46 -04:00
Andrei Betlen
26c7876ba0 chore: Bump version 2024-04-30 01:48:40 -04:00
Andrei
fe2da09538
feat: Generic Chat Formats, Tool Calling, and Huggingface Pull Support for Multimodal Models (Obsidian, LLaVA1.6, Moondream) (#1147)
* Test dummy image tags in chat templates

* Format and improve  types for llava_cpp.py

* Add from_pretrained support to llava chat format.

* Refactor llava chat format to use a jinja2

* Revert chat format test

* Add moondream support (wip)

* Update moondream chat format

* Update moondream chat format

* Update moondream prompt

* Add function calling support

* Cache last image embed

* Add Llava1.6 support

* Add nanollava support

* Add obisidian support

* Remove unnecessary import

* Re-order multimodal chat formats

* Logits all no longer required for multi-modal models

* Update README.md

* Update docs

* Update README

* Fix typo

* Update README

* Fix typo
2024-04-30 01:35:38 -04:00
Andrei Betlen
97fb860eba feat: Update llama.cpp 2024-04-29 23:34:55 -04:00
Andrei Betlen
a411612b38 feat: Add support for str type kv_overrides 2024-04-27 23:42:19 -04:00
Andrei Betlen
c9b85bf098 feat: Update llama.cpp 2024-04-27 23:41:54 -04:00
Jeffrey Fong
f178636e1b
fix: Functionary bug fixes (#1385)
* fix completion tokens tracking, prompt forming

* fix 'function_call' and 'tool_calls' depending on 'functions' and 'tools', incompatibility with python 3.8

* Updated README

* fix for openai server compatibility

---------

Co-authored-by: Andrei <abetlen@gmail.com>
2024-04-27 20:49:52 -04:00
Andrei Betlen
65edc90671 chore: Bump version 2024-04-26 10:11:31 -04:00
Andrei Betlen
173ebc7878 fix: Remove duplicate pooling_type definition and add misisng n_vocab definition in bindings 2024-04-25 21:36:09 -04:00
Douglas Hanley
f6ed21f9a2
feat: Allow for possibly non-pooled embeddings (#1380)
* allow for possibly non-pooled embeddings

* add more to embeddings section in README.md

---------

Co-authored-by: Andrei <abetlen@gmail.com>
2024-04-25 21:32:44 -04:00
Andrei Betlen
fcfea66857 fix: pydantic deprecation warning 2024-04-25 21:21:48 -04:00
Andrei Betlen
7f52335c50 feat: Update llama.cpp 2024-04-25 21:21:29 -04:00
Andrei Betlen
2a9979fce1 feat: Update llama.cpp 2024-04-25 02:48:26 -04:00
Andrei Betlen
c50d3300d2 chore: Bump version 2024-04-23 02:53:20 -04:00
Sean Bailey
53ebcc8bb5
feat(server): Provide ability to dynamically allocate all threads if desired using -1 (#1364) 2024-04-23 02:35:38 -04:00
abk16
8559e8ce88
feat: Add Llama-3 chat format (#1371)
* feat: Add Llama-3 chat format

* feat: Auto-detect Llama-3 chat format from gguf template

* feat: Update llama.cpp to b2715

Includes proper Llama-3 <|eot_id|> token handling.

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-04-23 02:33:29 -04:00
Andrei Betlen
d40a250ef3 feat: Use new llama_token_is_eog in create_completions 2024-04-22 00:35:47 -04:00
Andrei Betlen
b21ba0e2ac Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main 2024-04-21 20:46:42 -04:00
Andrei Betlen
159cc4e5d9 feat: Update llama.cpp 2024-04-21 20:46:40 -04:00
Andrei Betlen
0281214863 chore: Bump version 2024-04-20 00:09:37 -04:00
Andrei Betlen
cc81afebf0 feat: Add stopping_criteria to ChatFormatter, allow stopping on arbitrary token ids, fixes llama3 instruct 2024-04-20 00:00:53 -04:00
Andrei Betlen
893a27a736 chore: Bump version 2024-04-18 01:43:39 -04:00
Lucca Zenóbio
4f42664955
feat: update grammar schema converter to match llama.cpp (#1353)
* feat: improve function calling

* feat:grammar

* fix

* fix

* fix
2024-04-18 01:36:25 -04:00
Andrei Betlen
fa4bb0cf81 Revert "feat: Update json to grammar (#1350)"
This reverts commit 610a592f70.
2024-04-17 16:18:16 -04:00
Lucca Zenóbio
610a592f70
feat: Update json to grammar (#1350)
* feat: improve function calling

* feat:grammar
2024-04-17 10:10:21 -04:00
khimaros
b73c73c0c6
feat: add disable_ping_events flag (#1257)
for backward compatibility, this is false by default

it can be set to true to disable EventSource pings
which are not supported by some OpenAI clients.

fixes https://github.com/abetlen/llama-cpp-python/issues/1256
2024-04-17 10:08:19 -04:00
tc-wolf
4924455dec
feat: Make saved state more compact on-disk (#1296)
* State load/save changes

- Only store up to `n_tokens` logits instead of full `(n_ctx, n_vocab)`
  sized array.
  - Difference between ~350MB and ~1500MB for example prompt with ~300
    tokens (makes sense lol)
- Auto-formatting changes

* Back out formatting changes
2024-04-17 10:06:50 -04:00
ddh0
c96b2daebf feat: Use all available CPUs for batch processing (#1345) 2024-04-17 10:05:54 -04:00
Andrei Betlen
ef29235d45 chore: Bump version 2024-04-10 03:44:46 -04:00
Andrei Betlen
bb65b4d764 fix: pass correct type to chat handlers for chat completion logprobs 2024-04-10 03:41:55 -04:00
Andrei Betlen
060bfa64d5 feat: Add support for yaml based configs 2024-04-10 02:47:01 -04:00
Andrei Betlen
1347e1d050 feat: Add typechecking for ctypes structure attributes 2024-04-10 02:40:41 -04:00
Andrei Betlen
889d0e8981 feat: Update llama.cpp 2024-04-10 02:25:58 -04:00
Andrei Betlen
56071c956a feat: Update llama.cpp 2024-04-09 09:53:49 -04:00
Andrei Betlen
08b16afe11 chore: Bump version 2024-04-06 01:53:38 -04:00
Andrei Betlen
1ae3abbcc3 fix: missing logprobs in response, incorrect response type for functionary, minor type issues. Closes #1328 Closes #1314 2024-04-05 10:51:44 -04:00
Andrei Betlen
34081ddc5b chore: Bump version 2024-04-03 15:38:27 -04:00
Andrei Betlen
8649d7671b fix: segfault when logits_all=False. Closes #1319 2024-04-03 15:30:31 -04:00
Yuri Mikhailov
62aad610e1
fix: last tokens passing to sample_repetition_penalties function (#1295)
Co-authored-by: ymikhaylov <ymikhaylov@x5.ru>
Co-authored-by: Andrei <abetlen@gmail.com>
2024-04-01 15:25:43 -04:00
Andrei Betlen
45bf5ae582 chore: Bump version 2024-04-01 10:28:22 -04:00
Limour
f165048a69
feat: add support for KV cache quantization options (#1307)
* add KV cache quantization options

https://github.com/abetlen/llama-cpp-python/discussions/1220
https://github.com/abetlen/llama-cpp-python/issues/1305

* Add ggml_type

* Use ggml_type instead of string for quantization

* Add server support

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-04-01 10:19:28 -04:00