Daniel Thuerck
2138561fab
fix(server): Propagate flash_attn to model load. ( #1424 )
2024-05-03 12:17:07 -04:00
Andrei Betlen
2117122396
chore: Bump version
2024-05-02 12:07:09 -04:00
Andrei Betlen
31b1d95a6c
feat: Add llama-3-vision-alpha chat format
2024-05-02 11:32:18 -04:00
Andrei Betlen
4f01c452b6
fix: Change default value of verbose in image chat format handlers to True to match Llama
2024-04-30 15:50:30 -04:00
Andrei Betlen
9286b5caac
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
2024-04-30 15:45:36 -04:00
Andrei Betlen
f116175a5a
fix: Suppress all logs when verbose=False, use hardcoded filenos to work in colab notebooks. Closes #796 Closes #729
2024-04-30 15:45:34 -04:00
Jonathan Soma
3226b3c5ef
fix: UTF-8 handling with grammars ( #1415 )
...
Use Python's built-in UTF-8 handling to get code points
2024-04-30 14:33:23 -04:00
Andrei Betlen
b14dd98922
chore: Bump version
2024-04-30 09:39:56 -04:00
Andrei Betlen
29b6e9a5c8
fix: wrong parameter for flash attention in pickle __getstate__
2024-04-30 09:32:47 -04:00
Andrei Betlen
22d77eefd2
feat: Add option to enable flash_attn to Llama params and ModelSettings
2024-04-30 09:29:16 -04:00
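The flash_attn option introduced above is passed straight to the Llama constructor. A minimal sketch of how a caller might enable it (the model path is a placeholder, not a file referenced by these commits):

```python
# Sketch: enabling flash attention via the new flash_attn parameter.
# The kwargs below are forwarded to Llama(), which propagates flash_attn
# to llama.cpp's context parameters.
def build_llama_kwargs(model_path, flash_attn=True, n_ctx=2048):
    return {"model_path": model_path, "flash_attn": flash_attn, "n_ctx": n_ctx}

# Usage (requires llama-cpp-python and a local GGUF model file):
# from llama_cpp import Llama
# llm = Llama(**build_llama_kwargs("models/model.Q4_K_M.gguf"))
```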
Andrei Betlen
8c2b24d5aa
feat: Update llama.cpp
2024-04-30 09:27:55 -04:00
Andrei Betlen
f417cce28a
chore: Bump version
2024-04-30 03:11:02 -04:00
Andrei Betlen
3489ef09d3
fix: Ensure image renders before text in chat formats regardless of message content order.
2024-04-30 03:08:46 -04:00
Andrei Betlen
26c7876ba0
chore: Bump version
2024-04-30 01:48:40 -04:00
Andrei
fe2da09538
feat: Generic Chat Formats, Tool Calling, and Huggingface Pull Support for Multimodal Models (Obsidian, LLaVA1.6, Moondream) ( #1147 )
...
* Test dummy image tags in chat templates
* Format and improve types for llava_cpp.py
* Add from_pretrained support to llava chat format.
* Refactor llava chat format to use a jinja2
* Revert chat format test
* Add moondream support (wip)
* Update moondream chat format
* Update moondream chat format
* Update moondream prompt
* Add function calling support
* Cache last image embed
* Add Llava1.6 support
* Add nanollava support
* Add obsidian support
* Remove unnecessary import
* Re-order multimodal chat formats
* Logits all no longer required for multi-modal models
* Update README.md
* Update docs
* Update README
* Fix typo
* Update README
* Fix typo
2024-04-30 01:35:38 -04:00
Andrei Betlen
97fb860eba
feat: Update llama.cpp
2024-04-29 23:34:55 -04:00
Andrei Betlen
a411612b38
feat: Add support for str type kv_overrides
2024-04-27 23:42:19 -04:00
Andrei Betlen
c9b85bf098
feat: Update llama.cpp
2024-04-27 23:41:54 -04:00
Jeffrey Fong
f178636e1b
fix: Functionary bug fixes ( #1385 )
...
* fix completion tokens tracking, prompt forming
* fix 'function_call' and 'tool_calls' depending on 'functions' and 'tools', incompatibility with python 3.8
* Updated README
* fix for openai server compatibility
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2024-04-27 20:49:52 -04:00
Andrei Betlen
65edc90671
chore: Bump version
2024-04-26 10:11:31 -04:00
Andrei Betlen
173ebc7878
fix: Remove duplicate pooling_type definition and add missing n_vocab definition in bindings
2024-04-25 21:36:09 -04:00
Douglas Hanley
f6ed21f9a2
feat: Allow for possibly non-pooled embeddings ( #1380 )
...
* allow for possibly non-pooled embeddings
* add more to embeddings section in README.md
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2024-04-25 21:32:44 -04:00
Andrei Betlen
fcfea66857
fix: pydantic deprecation warning
2024-04-25 21:21:48 -04:00
Andrei Betlen
7f52335c50
feat: Update llama.cpp
2024-04-25 21:21:29 -04:00
Andrei Betlen
2a9979fce1
feat: Update llama.cpp
2024-04-25 02:48:26 -04:00
Andrei Betlen
c50d3300d2
chore: Bump version
2024-04-23 02:53:20 -04:00
Sean Bailey
53ebcc8bb5
feat(server): Provide ability to dynamically allocate all threads if desired using -1 ( #1364 )
2024-04-23 02:35:38 -04:00
abk16
8559e8ce88
feat: Add Llama-3 chat format ( #1371 )
...
* feat: Add Llama-3 chat format
* feat: Auto-detect Llama-3 chat format from gguf template
* feat: Update llama.cpp to b2715
Includes proper Llama-3 <|eot_id|> token handling.
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-04-23 02:33:29 -04:00
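The Llama-3 chat format added above can be requested by name, or auto-detected from the gguf template when no format is given. A sketch (the "llama-3" format name and the model path are assumptions for illustration):

```python
# Sketch: selecting the Llama-3 chat format explicitly. Leaving
# chat_format as None lets it be auto-detected from the gguf template.
def chat_settings(model_path, chat_format="llama-3"):
    return {"model_path": model_path, "chat_format": chat_format}

# Usage (requires a local Llama-3 GGUF model):
# from llama_cpp import Llama
# llm = Llama(**chat_settings("models/Meta-Llama-3-8B-Instruct.Q4_K_M.gguf"))
# out = llm.create_chat_completion(messages=[{"role": "user", "content": "Hi"}])
```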
Andrei Betlen
d40a250ef3
feat: Use new llama_token_is_eog in create_completions
2024-04-22 00:35:47 -04:00
Andrei Betlen
b21ba0e2ac
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-04-21 20:46:42 -04:00
Andrei Betlen
159cc4e5d9
feat: Update llama.cpp
2024-04-21 20:46:40 -04:00
Andrei Betlen
0281214863
chore: Bump version
2024-04-20 00:09:37 -04:00
Andrei Betlen
cc81afebf0
feat: Add stopping_criteria to ChatFormatter, allow stopping on arbitrary token ids, fixes llama3 instruct
2024-04-20 00:00:53 -04:00
Andrei Betlen
893a27a736
chore: Bump version
2024-04-18 01:43:39 -04:00
Lucca Zenóbio
4f42664955
feat: update grammar schema converter to match llama.cpp ( #1353 )
...
* feat: improve function calling
* feat: grammar
* fix
* fix
* fix
2024-04-18 01:36:25 -04:00
Andrei Betlen
fa4bb0cf81
Revert "feat: Update json to grammar ( #1350 )"
...
This reverts commit 610a592f70
.
2024-04-17 16:18:16 -04:00
Lucca Zenóbio
610a592f70
feat: Update json to grammar ( #1350 )
...
* feat: improve function calling
* feat: grammar
2024-04-17 10:10:21 -04:00
khimaros
b73c73c0c6
feat: add disable_ping_events flag ( #1257 )
...
For backward compatibility, this is false by default. It can be set
to true to disable EventSource pings, which are not supported by
some OpenAI clients.
fixes https://github.com/abetlen/llama-cpp-python/issues/1256
2024-04-17 10:08:19 -04:00
tc-wolf
4924455dec
feat: Make saved state more compact on-disk ( #1296 )
...
* State load/save changes
- Only store up to `n_tokens` logits instead of full `(n_ctx, n_vocab)`
sized array.
- Difference between ~350MB and ~1500MB for example prompt with ~300
tokens (makes sense lol)
- Auto-formatting changes
* Back out formatting changes
2024-04-17 10:06:50 -04:00
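The state made more compact by this change comes from Llama's save_state/load_state pair. A sketch of a persistence round-trip (using Python's pickle for on-disk storage is an illustrative assumption, not part of the commit):

```python
import pickle

# Sketch: persisting a saved Llama state to disk and restoring it.
# The state now stores only n_tokens logits instead of the full
# (n_ctx, n_vocab) array, so the file is much smaller.
def dump_state(state, path):
    with open(path, "wb") as f:
        pickle.dump(state, f)

def restore_state(path):
    with open(path, "rb") as f:
        return pickle.load(f)

# Usage (requires a loaded model):
# state = llm.save_state()
# dump_state(state, "session.bin")
# llm.load_state(restore_state("session.bin"))
```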
ddh0
c96b2daebf
feat: Use all available CPUs for batch processing ( #1345 )
2024-04-17 10:05:54 -04:00
Andrei Betlen
ef29235d45
chore: Bump version
2024-04-10 03:44:46 -04:00
Andrei Betlen
bb65b4d764
fix: pass correct type to chat handlers for chat completion logprobs
2024-04-10 03:41:55 -04:00
Andrei Betlen
060bfa64d5
feat: Add support for yaml based configs
2024-04-10 02:47:01 -04:00
Andrei Betlen
1347e1d050
feat: Add typechecking for ctypes structure attributes
2024-04-10 02:40:41 -04:00
Andrei Betlen
889d0e8981
feat: Update llama.cpp
2024-04-10 02:25:58 -04:00
Andrei Betlen
56071c956a
feat: Update llama.cpp
2024-04-09 09:53:49 -04:00
Andrei Betlen
08b16afe11
chore: Bump version
2024-04-06 01:53:38 -04:00
Andrei Betlen
1ae3abbcc3
fix: missing logprobs in response, incorrect response type for functionary, minor type issues. Closes #1328 Closes #1314
2024-04-05 10:51:44 -04:00
Andrei Betlen
34081ddc5b
chore: Bump version
2024-04-03 15:38:27 -04:00
Andrei Betlen
8649d7671b
fix: segfault when logits_all=False. Closes #1319
2024-04-03 15:30:31 -04:00
Yuri Mikhailov
62aad610e1
fix: last tokens passing to sample_repetition_penalties function ( #1295 )
...
Co-authored-by: ymikhaylov <ymikhaylov@x5.ru>
Co-authored-by: Andrei <abetlen@gmail.com>
2024-04-01 15:25:43 -04:00
Andrei Betlen
45bf5ae582
chore: Bump version
2024-04-01 10:28:22 -04:00
Limour
f165048a69
feat: add support for KV cache quantization options ( #1307 )
...
* add KV cache quantization options
https://github.com/abetlen/llama-cpp-python/discussions/1220
https://github.com/abetlen/llama-cpp-python/issues/1305
* Add ggml_type
* Use ggml_type instead of string for quantization
* Add server support
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-04-01 10:19:28 -04:00
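Per the PR notes above, the quantization options take ggml type constants rather than strings. A sketch of passing them (the type_k/type_v parameter names and the GGML_TYPE_Q8_0 constant are assumptions drawn from the PR description):

```python
# Sketch: quantizing the KV cache. type_k/type_v take ggml type
# constants (integers), per the "Use ggml_type instead of string" change.
def kv_quant_kwargs(model_path, ggml_type):
    return {"model_path": model_path, "type_k": ggml_type, "type_v": ggml_type}

# Usage (requires a local GGUF model):
# from llama_cpp import Llama, GGML_TYPE_Q8_0
# llm = Llama(**kv_quant_kwargs("models/model.gguf", GGML_TYPE_Q8_0))
```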
windspirit95
aa9f1ae011
feat: Add logprobs support to chat completions ( #1311 )
...
* Add logprobs return in ChatCompletionResponse
* Fix duplicate field
* Set default to false
* Simplify check
* Add server example
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-03-31 13:30:13 -04:00
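A sketch of requesting logprobs in a chat completion as added above (the logprobs/top_logprobs parameter names mirror the OpenAI API shape and are assumptions here; logprobs defaults to false per the PR):

```python
# Sketch: asking for per-token log probabilities in a chat completion.
def chat_request(prompt, logprobs=True, top_logprobs=3):
    return {
        "messages": [{"role": "user", "content": prompt}],
        "logprobs": logprobs,        # default is False, per the PR
        "top_logprobs": top_logprobs,
    }

# Usage (requires a loaded model):
# out = llm.create_chat_completion(**chat_request("Hello"))
# print(out["choices"][0]["logprobs"])
```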
Andrei Betlen
125b2358c9
feat: Update llama.cpp
2024-03-28 12:06:46 -04:00
Andrei Betlen
901fe02461
feat: Update llama.cpp
2024-03-26 22:58:53 -04:00
Andrei Betlen
d11ccc3036
fix(server): minor type fixes
2024-03-23 17:14:15 -04:00
Andrei Betlen
c1325dcdfb
fix: tool_call missing first token.
2024-03-22 23:44:04 -04:00
Andrei Betlen
e325a831f0
feat: Update llama.cpp
2024-03-22 23:43:29 -04:00
Andrei Betlen
f7decc9562
docs: Add chat examples to openapi ui
2024-03-19 10:52:53 -04:00
Andrei
60d8498f21
feat: Add tools/functions variables to Jinja2ChatFormatter, add function response formatting for all simple chat formats ( #1273 )
...
* Add tools/functions variables to Jinja2ChatFormatter
Also fixed missing tools/tool_choices parameters in chat_formatter_to_chat_completion_handler().
* Set grammar when doing explicit function calling
* Add function / tool response for all chat formats
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2024-03-19 04:55:57 -04:00
Andrei Betlen
7d4a5ec59f
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-03-18 11:37:33 -04:00
Andrei Betlen
bf64752535
chore: Bump version
2024-03-18 11:37:30 -04:00
Jeffrey Fong
8a60c7bc8c
fix: Fix and optimize functionary chat handler ( #1282 )
...
* fix functionary chat logic
* further fixes
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2024-03-18 10:40:57 -04:00
Andrei Betlen
8d298b4750
feat: Update llama.cpp
2024-03-18 10:26:36 -04:00
Andrei Betlen
6eb25231e4
feat: Update llama.cpp
2024-03-15 12:58:45 -04:00
Andrei Betlen
20e6815252
fix: json mode
2024-03-15 12:58:34 -04:00
Andrei Betlen
4084aabe86
fix: set default pooling type to unspecified
2024-03-14 10:04:57 -04:00
Andrei Betlen
d318cc8b83
fix: Set default pooling_type to mean, check for null pointer.
2024-03-14 09:17:41 -04:00
Andrei Betlen
dd0ee56217
feat: Update llama.cpp
2024-03-13 15:57:35 -04:00
Andrei Betlen
08e910f7a7
feat: Update llama.cpp
2024-03-10 23:45:05 -04:00
Andrei Betlen
a7281994d8
chore: Bump version
2024-03-08 21:14:44 -05:00
Andrei Betlen
919fca9f2b
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-03-08 21:10:56 -05:00
Andrei Betlen
d02a9cf16f
Fixed json strings grammar by blacklisting the control character set. Closes #1259
2024-03-08 21:10:53 -05:00
Felipe Lorenz
c139f8b5d5
feat: Add endpoints for tokenize, detokenize and count tokens ( #1136 )
...
* Add endpoint to count tokens
* Add tokenize and detokenize endpoints
* Change response key to tokens for tokenize endpoint
* Fix dependency bug
* Cleanup
* Remove example added by mistake
* Move tokenize, detokenize, and count to Extras namespace. Tag existing endpoints
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-03-08 21:09:00 -05:00
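The commit above moves the new endpoints into an Extras namespace. A sketch of calling one from a client with the standard library (the /extras/tokenize route path is an assumption based on the commit notes, not a documented URL):

```python
import json
import urllib.request

# Sketch: building a POST request to the tokenize endpoint of a
# running llama-cpp-python server. Route path is an assumption.
def build_extras_request(base_url, route, payload):
    return urllib.request.Request(
        base_url + route,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# Usage (server must be running locally):
# req = build_extras_request("http://localhost:8000", "/extras/tokenize",
#                            {"input": "hello world"})
# with urllib.request.urlopen(req) as resp:
#     tokens = json.load(resp)["tokens"]
```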
Kevin Cao
1f3156d4f2
fix: Check for existence of clip model path ( #1264 )
2024-03-08 21:00:10 -05:00
Douglas Hanley
2811014bae
feat: Switch embed to llama_get_embeddings_seq ( #1263 )
...
* switch to llama_get_embeddings_seq
* Remove duplicate definition of llama_get_embeddings_seq
Co-authored-by: Andrei <abetlen@gmail.com>
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2024-03-08 20:59:35 -05:00
Andrei Betlen
40c6b54f68
feat: Update llama.cpp
2024-03-08 20:58:50 -05:00
Andrei Betlen
93dc56ace8
Update llama.cpp
2024-03-06 01:32:00 -05:00
Andrei Betlen
87a6e5797e
feat: Update llama.cpp
2024-03-03 11:27:04 -05:00
Andrei Betlen
13177aae0f
chore: Bump version
2024-03-02 22:46:40 -05:00
Andrei Betlen
0e70984fb6
feat: Update llama.cpp
2024-03-02 22:20:04 -05:00
Andrei Betlen
d5df431278
chore: Bump version
2024-03-01 13:15:16 -05:00
Andrei Betlen
97aa3a153d
docs: Add information re: auto chat formats. Closes #1236
2024-03-01 13:10:25 -05:00
Andrei Betlen
f062a7f51d
feat: Update llama.cpp
2024-03-01 12:57:16 -05:00
Andrei Betlen
8c71725d53
fix: Remove deprecated cfg sampling functions
2024-02-28 14:37:07 -05:00
Andrei Betlen
727d60c28a
misc: Format
2024-02-28 14:27:40 -05:00
Andrei Betlen
0d37ce52b1
feat: Update llama.cpp
2024-02-28 14:27:16 -05:00
Andrei Betlen
ffcd4b2636
chore: Bump version
2024-02-28 01:38:32 -05:00
Sigbjørn Skjæret
c36ab15e68
fix: eos/bos_token set correctly for Jinja2ChatFormatter and automatic chat formatter ( #1230 )
...
The token strings were not correctly retrieved (empty).
2024-02-28 01:30:31 -05:00
Andrei Betlen
fea33c9b94
feat: Update llama.cpp
2024-02-27 12:22:17 -05:00
Andrei
4d574bd765
feat(server): Add support for pulling models from Huggingface Hub ( #1222 )
...
* Basic support for hf pull on server
* Add hf_model_repo_id setting
* Update README
2024-02-26 14:35:08 -05:00
Andrei Betlen
afe1e445c9
chore: Bump version
2024-02-26 11:43:24 -05:00
Andrei Betlen
9558ce7878
feat: Update llama.cpp
2024-02-26 11:40:58 -05:00
Andrei Betlen
dbaba3059d
fix: positional arguments only for low-level api
2024-02-26 11:31:11 -05:00
Andrei Betlen
78e536dcfe
fix: typo
2024-02-26 11:14:26 -05:00
Andrei Betlen
44558cbd7a
misc: llava_cpp use ctypes function decorator for binding
2024-02-26 11:07:33 -05:00
Andrei Betlen
8383a9e562
fix: llava this function takes at least 4 arguments (0 given)
2024-02-26 11:03:20 -05:00
Andrei Betlen
8e03fd9957
chore: Bump version
2024-02-25 21:15:42 -05:00
Andrei Betlen
dcf38f6141
fix: remove prematurely committed change
2024-02-25 21:00:37 -05:00