Commit graph

798 commits

Author SHA1 Message Date
Andrei Betlen
c50d3300d2 chore: Bump version 2024-04-23 02:53:20 -04:00
Sean Bailey
53ebcc8bb5
feat(server): Provide ability to dynamically allocate all threads if desired using -1 (#1364) 2024-04-23 02:35:38 -04:00
abk16
8559e8ce88
feat: Add Llama-3 chat format (#1371)
* feat: Add Llama-3 chat format

* feat: Auto-detect Llama-3 chat format from gguf template

* feat: Update llama.cpp to b2715

Includes proper Llama-3 <|eot_id|> token handling.

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-04-23 02:33:29 -04:00
Andrei Betlen
d40a250ef3 feat: Use new llama_token_is_eog in create_completions 2024-04-22 00:35:47 -04:00
Andrei Betlen
b21ba0e2ac Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main 2024-04-21 20:46:42 -04:00
Andrei Betlen
159cc4e5d9 feat: Update llama.cpp 2024-04-21 20:46:40 -04:00
Andrei Betlen
0281214863 chore: Bump version 2024-04-20 00:09:37 -04:00
Andrei Betlen
cc81afebf0 feat: Add stopping_criteria to ChatFormatter, allow stopping on arbitrary token ids, fixes llama3 instruct 2024-04-20 00:00:53 -04:00
Andrei Betlen
893a27a736 chore: Bump version 2024-04-18 01:43:39 -04:00
Lucca Zenóbio
4f42664955
feat: update grammar schema converter to match llama.cpp (#1353)
* feat: improve function calling

* feat:grammar

* fix

* fix

* fix
2024-04-18 01:36:25 -04:00
Andrei Betlen
fa4bb0cf81 Revert "feat: Update json to grammar (#1350)"
This reverts commit 610a592f70.
2024-04-17 16:18:16 -04:00
Lucca Zenóbio
610a592f70
feat: Update json to grammar (#1350)
* feat: improve function calling

* feat:grammar
2024-04-17 10:10:21 -04:00
khimaros
b73c73c0c6
feat: add disable_ping_events flag (#1257)
for backward compatibility, this is false by default

it can be set to true to disable EventSource pings
which are not supported by some OpenAI clients.

fixes https://github.com/abetlen/llama-cpp-python/issues/1256
2024-04-17 10:08:19 -04:00
tc-wolf
4924455dec
feat: Make saved state more compact on-disk (#1296)
* State load/save changes

- Only store up to `n_tokens` logits instead of full `(n_ctx, n_vocab)`
  sized array.
  - Difference between ~350MB and ~1500MB for example prompt with ~300
    tokens (makes sense lol)
- Auto-formatting changes

* Back out formatting changes
2024-04-17 10:06:50 -04:00
ddh0
c96b2daebf feat: Use all available CPUs for batch processing (#1345) 2024-04-17 10:05:54 -04:00
Andrei Betlen
ef29235d45 chore: Bump version 2024-04-10 03:44:46 -04:00
Andrei Betlen
bb65b4d764 fix: pass correct type to chat handlers for chat completion logprobs 2024-04-10 03:41:55 -04:00
Andrei Betlen
060bfa64d5 feat: Add support for yaml based configs 2024-04-10 02:47:01 -04:00
Andrei Betlen
1347e1d050 feat: Add typechecking for ctypes structure attributes 2024-04-10 02:40:41 -04:00
Andrei Betlen
889d0e8981 feat: Update llama.cpp 2024-04-10 02:25:58 -04:00
Andrei Betlen
56071c956a feat: Update llama.cpp 2024-04-09 09:53:49 -04:00
Andrei Betlen
08b16afe11 chore: Bump version 2024-04-06 01:53:38 -04:00
Andrei Betlen
1ae3abbcc3 fix: missing logprobs in response, incorrect response type for functionary, minor type issues. Closes #1328 Closes #1314 2024-04-05 10:51:44 -04:00
Andrei Betlen
34081ddc5b chore: Bump version 2024-04-03 15:38:27 -04:00
Andrei Betlen
8649d7671b fix: segfault when logits_all=False. Closes #1319 2024-04-03 15:30:31 -04:00
Yuri Mikhailov
62aad610e1
fix: last tokens passing to sample_repetition_penalties function (#1295)
Co-authored-by: ymikhaylov <ymikhaylov@x5.ru>
Co-authored-by: Andrei <abetlen@gmail.com>
2024-04-01 15:25:43 -04:00
Andrei Betlen
45bf5ae582 chore: Bump version 2024-04-01 10:28:22 -04:00
Limour
f165048a69
feat: add support for KV cache quantization options (#1307)
* add KV cache quantization options

https://github.com/abetlen/llama-cpp-python/discussions/1220
https://github.com/abetlen/llama-cpp-python/issues/1305

* Add ggml_type

* Use ggml_type instead of string for quantization

* Add server support

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-04-01 10:19:28 -04:00
windspirit95
aa9f1ae011
feat: Add logprobs support to chat completions (#1311)
* Add logprobs return in ChatCompletionResponse

* Fix duplicate field

* Set default to false

* Simplify check

* Add server example

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-03-31 13:30:13 -04:00
Andrei Betlen
125b2358c9 feat: Update llama.cpp 2024-03-28 12:06:46 -04:00
Andrei Betlen
901fe02461 feat: Update llama.cpp 2024-03-26 22:58:53 -04:00
Andrei Betlen
d11ccc3036 fix(server): minor type fixes 2024-03-23 17:14:15 -04:00
Andrei Betlen
c1325dcdfb fix: tool_call missing first token. 2024-03-22 23:44:04 -04:00
Andrei Betlen
e325a831f0 feat: Update llama.cpp 2024-03-22 23:43:29 -04:00
Andrei Betlen
f7decc9562 docs: Add chat examples to openapi ui 2024-03-19 10:52:53 -04:00
Andrei
60d8498f21
feat: Add tools/functions variables to Jinja2ChatFormatter, add function response formatting for all simple chat formats (#1273)
* Add tools/functions variables to Jinja2ChatFormatter

Also fixed missing tools/tool_choices parameters in chat_formatter_to_chat_completion_handler().

* Set grammar when doing explicit function calling

* Add function / tool response for all chat formats

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2024-03-19 04:55:57 -04:00
Andrei Betlen
7d4a5ec59f Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main 2024-03-18 11:37:33 -04:00
Andrei Betlen
bf64752535 chore: Bump version 2024-03-18 11:37:30 -04:00
Jeffrey Fong
8a60c7bc8c
fix: Fix and optimize functionary chat handler (#1282)
* fix functionary chat logic

* further fixes

---------

Co-authored-by: Andrei <abetlen@gmail.com>
2024-03-18 10:40:57 -04:00
Andrei Betlen
8d298b4750 feat: Update llama.cpp 2024-03-18 10:26:36 -04:00
Andrei Betlen
6eb25231e4 feat: Update llama.cpp 2024-03-15 12:58:45 -04:00
Andrei Betlen
20e6815252 fix: json mode 2024-03-15 12:58:34 -04:00
Andrei Betlen
4084aabe86 fix: set default pooling type to unspecified 2024-03-14 10:04:57 -04:00
Andrei Betlen
d318cc8b83 fix: Set default pooling_type to mean, check for null pointer. 2024-03-14 09:17:41 -04:00
Andrei Betlen
dd0ee56217 feat: Update llama.cpp 2024-03-13 15:57:35 -04:00
Andrei Betlen
08e910f7a7 feat: Update llama.cpp 2024-03-10 23:45:05 -04:00
Andrei Betlen
a7281994d8 chore: Bump version 2024-03-08 21:14:44 -05:00
Andrei Betlen
919fca9f2b Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main 2024-03-08 21:10:56 -05:00
Andrei Betlen
d02a9cf16f Fixed json strings grammar by blacklisting character control set. Closes #1259 2024-03-08 21:10:53 -05:00
Felipe Lorenz
c139f8b5d5
feat: Add endpoints for tokenize, detokenize and count tokens (#1136)
* Add endpoint to count tokens

* Add tokenize and detokenize endpoints

* Change response key to tokens for tokenize endpoint

* Fix dependency bug

* Cleanup

* Remove example added by mistake

* Move tokenize, detokenize, and count to Extras namespace. Tag existing endpoints

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-03-08 21:09:00 -05:00
Kevin Cao
1f3156d4f2
fix: Check for existence of clip model path (#1264) 2024-03-08 21:00:10 -05:00
Douglas Hanley
2811014bae
feat: Switch embed to llama_get_embeddings_seq (#1263)
* switch to llama_get_embeddings_seq

* Remove duplicate definition of llama_get_embeddings_seq

Co-authored-by: Andrei <abetlen@gmail.com>

---------

Co-authored-by: Andrei <abetlen@gmail.com>
2024-03-08 20:59:35 -05:00
Andrei Betlen
40c6b54f68 feat: Update llama.cpp 2024-03-08 20:58:50 -05:00
Andrei Betlen
93dc56ace8 Update llama.cpp 2024-03-06 01:32:00 -05:00
Andrei Betlen
87a6e5797e feat: Update llama.cpp 2024-03-03 11:27:04 -05:00
Andrei Betlen
13177aae0f chore: Bump version 2024-03-02 22:46:40 -05:00
Andrei Betlen
0e70984fb6 feat: Update llama.cpp 2024-03-02 22:20:04 -05:00
Andrei Betlen
d5df431278 chore: Bump version 2024-03-01 13:15:16 -05:00
Andrei Betlen
97aa3a153d docs: Add information re: auto chat formats. Closes #1236 2024-03-01 13:10:25 -05:00
Andrei Betlen
f062a7f51d feat: Update llama.cpp 2024-03-01 12:57:16 -05:00
Andrei Betlen
8c71725d53 fix: Remove deprecated cfg sampling functions 2024-02-28 14:37:07 -05:00
Andrei Betlen
727d60c28a misc: Format 2024-02-28 14:27:40 -05:00
Andrei Betlen
0d37ce52b1 feat: Update llama.cpp 2024-02-28 14:27:16 -05:00
Andrei Betlen
ffcd4b2636 chore: Bump version 2024-02-28 01:38:32 -05:00
Sigbjørn Skjæret
c36ab15e68
fix: eos/bos_token set correctly for Jinja2ChatFormatter and automatic chat formatter (#1230)
The token strings were not correctly retrieved (empty).
2024-02-28 01:30:31 -05:00
Andrei Betlen
fea33c9b94 feat: Update llama.cpp 2024-02-27 12:22:17 -05:00
Andrei
4d574bd765
feat(server): Add support for pulling models from Huggingface Hub (#1222)
* Basic support for hf pull on server

* Add hf_model_repo_id setting

* Update README
2024-02-26 14:35:08 -05:00
Andrei Betlen
afe1e445c9 chore: Bump version 2024-02-26 11:43:24 -05:00
Andrei Betlen
9558ce7878 feat: Update llama.cpp 2024-02-26 11:40:58 -05:00
Andrei Betlen
dbaba3059d fix: positional arguments only for low-level api 2024-02-26 11:31:11 -05:00
Andrei Betlen
78e536dcfe fix: typo 2024-02-26 11:14:26 -05:00
Andrei Betlen
44558cbd7a misc: llava_cpp use ctypes function decorator for binding 2024-02-26 11:07:33 -05:00
Andrei Betlen
8383a9e562 fix: llava this function takes at least 4 arguments (0 given) 2024-02-26 11:03:20 -05:00
Andrei Betlen
8e03fd9957 chore: Bump version 2024-02-25 21:15:42 -05:00
Andrei Betlen
dcf38f6141 fix: remove prematurely commited change 2024-02-25 21:00:37 -05:00
Andrei Betlen
cbbcd888af feat: Update llama.cpp 2024-02-25 20:52:14 -05:00
Andrei Betlen
19234aa0db fix: Restore type hints for low-level api 2024-02-25 16:54:37 -05:00
Andrei Betlen
2292af5796 feat: Update llama.cpp 2024-02-25 16:53:58 -05:00
Andrei Betlen
221edb9ef1 feat: Update llama.cpp 2024-02-24 23:47:29 -05:00
Andrei Betlen
20ea6fd7d6 chore: Bump version 2024-02-23 12:38:36 -05:00
Andrei Betlen
47bad30dd7 fix: LlamaHFTokenizer now receives pre_tokens 2024-02-23 12:23:24 -05:00
Andrei Betlen
ded5d627a5 chore: Bump version 2024-02-23 11:32:43 -05:00
Luke Stanley
858496224e
feat: Auto detect Mixtral's slightly different format (#1214) 2024-02-23 11:27:38 -05:00
Andrei Betlen
db776a885c fix: module 'llama_cpp.llama_cpp' has no attribute 'c_uint8' 2024-02-23 11:24:53 -05:00
Andrei Betlen
427d816ebf chore: Bump version 2024-02-23 04:54:08 -05:00
Alvaro Bartolome
251a8a2cad
feat: Add Google's Gemma formatting via chat_format="gemma" (#1210)
* Add Google's Gemma formatting via `chat_format="gemma"`

* Replace `raise ValueError` with `logger.debug`

Co-authored-by: Andrei <abetlen@gmail.com>

---------

Co-authored-by: Andrei <abetlen@gmail.com>
2024-02-23 04:40:52 -05:00
Andrei Betlen
b9aca612af misc: use typesafe byref for internal classes 2024-02-23 03:40:07 -05:00
Andrei Betlen
a0ce429dc0 misc: use decorator to bind low level api functions, fixes docs 2024-02-23 03:39:38 -05:00
Andrei Betlen
e10af30cf1 fix: TypeAlias import error 2024-02-22 03:27:28 -05:00
Andrei Betlen
3561ebf536 Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main 2024-02-22 03:25:13 -05:00
Andrei Betlen
aefcb8f71a misc: additional type annotations for low level api 2024-02-22 02:00:09 -05:00
Andrei Betlen
3921e10770 feat: support minItems/maxItems in JSON grammar converter (by @nopperl) 2024-02-22 00:17:06 -05:00
Andrei Betlen
e6d6260a91 fix: Update from_pretrained defaults to match hf_hub_download 2024-02-22 00:10:23 -05:00
Andrei Betlen
dd22010e85 fix: Raise exceptions when llama model or context fails to load 2024-02-22 00:09:45 -05:00
Andrei Betlen
3632241e98 chore: Bump version 2024-02-21 23:09:13 -05:00
Andrei Betlen
0653e15c20 feat: Update llama.cpp 2024-02-21 23:04:52 -05:00
Andrei Betlen
7981e9ce1e chore: Bump version 2024-02-21 16:30:59 -05:00
Andrei
7f51b6071f
feat(low-level-api): Improve API static type-safety and performance (#1205) 2024-02-21 16:25:38 -05:00
Andrei
0f8aa4ab5c
feat: Pull models directly from huggingface (#1206)
* Add from_pretrained method to Llama class

* Update docs

* Merge filename and pattern
2024-02-21 16:25:10 -05:00
Andrei Betlen
e42f62c247 chore: Bump version 2024-02-21 11:09:40 -05:00