Andrei Betlen
1ae3abbcc3
fix: missing logprobs in response, incorrect response type for functionary, minor type issues. Closes #1328 Closes #1314
2024-04-05 10:51:44 -04:00
Andrei Betlen
34081ddc5b
chore: Bump version
2024-04-03 15:38:27 -04:00
Andrei Betlen
8649d7671b
fix: segfault when logits_all=False. Closes #1319
2024-04-03 15:30:31 -04:00
Yuri Mikhailov
62aad610e1
fix: last tokens passing to sample_repetition_penalties function ( #1295 )
...
Co-authored-by: ymikhaylov <ymikhaylov@x5.ru>
Co-authored-by: Andrei <abetlen@gmail.com>
2024-04-01 15:25:43 -04:00
Andrei Betlen
45bf5ae582
chore: Bump version
2024-04-01 10:28:22 -04:00
Limour
f165048a69
feat: add support for KV cache quantization options ( #1307 )
...
* add KV cache quantization options
https://github.com/abetlen/llama-cpp-python/discussions/1220
https://github.com/abetlen/llama-cpp-python/issues/1305
* Add ggml_type
* Use ggml_type instead of string for quantization
* Add server support
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-04-01 10:19:28 -04:00
windspirit95
aa9f1ae011
feat: Add logprobs support to chat completions ( #1311 )
...
* Add logprobs return in ChatCompletionResponse
* Fix duplicate field
* Set default to false
* Simplify check
* Add server example
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-03-31 13:30:13 -04:00
Andrei Betlen
125b2358c9
feat: Update llama.cpp
2024-03-28 12:06:46 -04:00
Andrei Betlen
901fe02461
feat: Update llama.cpp
2024-03-26 22:58:53 -04:00
Andrei Betlen
d11ccc3036
fix(server): minor type fixes
2024-03-23 17:14:15 -04:00
Andrei Betlen
c1325dcdfb
fix: tool_call missing first token.
2024-03-22 23:44:04 -04:00
Andrei Betlen
e325a831f0
feat: Update llama.cpp
2024-03-22 23:43:29 -04:00
Andrei Betlen
f7decc9562
docs: Add chat examples to openapi ui
2024-03-19 10:52:53 -04:00
Andrei
60d8498f21
feat: Add tools/functions variables to Jinja2ChatFormatter, add function response formatting for all simple chat formats ( #1273 )
...
* Add tools/functions variables to Jinja2ChatFormatter
Also fixed missing tools/tool_choices parameters in chat_formatter_to_chat_completion_handler().
* Set grammar when doing explicit function calling
* Add function / tool response for all chat formats
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2024-03-19 04:55:57 -04:00
Andrei Betlen
7d4a5ec59f
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-03-18 11:37:33 -04:00
Andrei Betlen
bf64752535
chore: Bump version
2024-03-18 11:37:30 -04:00
Jeffrey Fong
8a60c7bc8c
fix: Fix and optimize functionary chat handler ( #1282 )
...
* fix functionary chat logic
* further fixes
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2024-03-18 10:40:57 -04:00
Andrei Betlen
8d298b4750
feat: Update llama.cpp
2024-03-18 10:26:36 -04:00
Andrei Betlen
6eb25231e4
feat: Update llama.cpp
2024-03-15 12:58:45 -04:00
Andrei Betlen
20e6815252
fix: json mode
2024-03-15 12:58:34 -04:00
Andrei Betlen
4084aabe86
fix: set default pooling type to unspecified
2024-03-14 10:04:57 -04:00
Andrei Betlen
d318cc8b83
fix: Set default pooling_type to mean, check for null pointer.
2024-03-14 09:17:41 -04:00
Andrei Betlen
dd0ee56217
feat: Update llama.cpp
2024-03-13 15:57:35 -04:00
Andrei Betlen
08e910f7a7
feat: Update llama.cpp
2024-03-10 23:45:05 -04:00
Andrei Betlen
a7281994d8
chore: Bump version
2024-03-08 21:14:44 -05:00
Andrei Betlen
919fca9f2b
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-03-08 21:10:56 -05:00
Andrei Betlen
d02a9cf16f
Fixed json strings grammar by blacklisting character control set. Closes #1259
2024-03-08 21:10:53 -05:00
Felipe Lorenz
c139f8b5d5
feat: Add endpoints for tokenize, detokenize and count tokens ( #1136 )
...
* Add endpoint to count tokens
* Add tokenize and detokenize endpoints
* Change response key to tokens for tokenize endpoint
* Fix dependency bug
* Cleanup
* Remove example added by mistake
* Move tokenize, detokenize, and count to Extras namespace. Tag existing endpoints
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-03-08 21:09:00 -05:00
Kevin Cao
1f3156d4f2
fix: Check for existence of clip model path ( #1264 )
2024-03-08 21:00:10 -05:00
Douglas Hanley
2811014bae
feat: Switch embed to llama_get_embeddings_seq ( #1263 )
...
* switch to llama_get_embeddings_seq
* Remove duplicate definition of llama_get_embeddings_seq
Co-authored-by: Andrei <abetlen@gmail.com>
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2024-03-08 20:59:35 -05:00
Andrei Betlen
40c6b54f68
feat: Update llama.cpp
2024-03-08 20:58:50 -05:00
Andrei Betlen
93dc56ace8
Update llama.cpp
2024-03-06 01:32:00 -05:00
Andrei Betlen
87a6e5797e
feat: Update llama.cpp
2024-03-03 11:27:04 -05:00
Andrei Betlen
13177aae0f
chore: Bump version
2024-03-02 22:46:40 -05:00
Andrei Betlen
0e70984fb6
feat: Update llama.cpp
2024-03-02 22:20:04 -05:00
Andrei Betlen
d5df431278
chore: Bump version
2024-03-01 13:15:16 -05:00
Andrei Betlen
97aa3a153d
docs: Add information re: auto chat formats. Closes #1236
2024-03-01 13:10:25 -05:00
Andrei Betlen
f062a7f51d
feat: Update llama.cpp
2024-03-01 12:57:16 -05:00
Andrei Betlen
8c71725d53
fix: Remove deprecated cfg sampling functions
2024-02-28 14:37:07 -05:00
Andrei Betlen
727d60c28a
misc: Format
2024-02-28 14:27:40 -05:00
Andrei Betlen
0d37ce52b1
feat: Update llama.cpp
2024-02-28 14:27:16 -05:00
Andrei Betlen
ffcd4b2636
chore: Bump version
2024-02-28 01:38:32 -05:00
Sigbjørn Skjæret
c36ab15e68
fix: eos/bos_token set correctly for Jinja2ChatFormatter and automatic chat formatter ( #1230 )
...
The token strings were not correctly retrieved (empty).
2024-02-28 01:30:31 -05:00
Andrei Betlen
fea33c9b94
feat: Update llama.cpp
2024-02-27 12:22:17 -05:00
Andrei
4d574bd765
feat(server): Add support for pulling models from Huggingface Hub ( #1222 )
...
* Basic support for hf pull on server
* Add hf_model_repo_id setting
* Update README
2024-02-26 14:35:08 -05:00
Andrei Betlen
afe1e445c9
chore: Bump version
2024-02-26 11:43:24 -05:00
Andrei Betlen
9558ce7878
feat: Update llama.cpp
2024-02-26 11:40:58 -05:00
Andrei Betlen
dbaba3059d
fix: positional arguments only for low-level api
2024-02-26 11:31:11 -05:00
Andrei Betlen
78e536dcfe
fix: typo
2024-02-26 11:14:26 -05:00
Andrei Betlen
44558cbd7a
misc: llava_cpp use ctypes function decorator for binding
2024-02-26 11:07:33 -05:00
Andrei Betlen
8383a9e562
fix: llava this function takes at least 4 arguments (0 given)
2024-02-26 11:03:20 -05:00
Andrei Betlen
8e03fd9957
chore: Bump version
2024-02-25 21:15:42 -05:00
Andrei Betlen
dcf38f6141
fix: remove prematurely commited change
2024-02-25 21:00:37 -05:00
Andrei Betlen
cbbcd888af
feat: Update llama.cpp
2024-02-25 20:52:14 -05:00
Andrei Betlen
19234aa0db
fix: Restore type hints for low-level api
2024-02-25 16:54:37 -05:00
Andrei Betlen
2292af5796
feat: Update llama.cpp
2024-02-25 16:53:58 -05:00
Andrei Betlen
221edb9ef1
feat: Update llama.cpp
2024-02-24 23:47:29 -05:00
Andrei Betlen
20ea6fd7d6
chore: Bump version
2024-02-23 12:38:36 -05:00
Andrei Betlen
47bad30dd7
fix: LlamaHFTokenizer now receives pre_tokens
2024-02-23 12:23:24 -05:00
Andrei Betlen
ded5d627a5
chore: Bump version
2024-02-23 11:32:43 -05:00
Luke Stanley
858496224e
feat: Auto detect Mixtral's slightly different format ( #1214 )
2024-02-23 11:27:38 -05:00
Andrei Betlen
db776a885c
fix: module 'llama_cpp.llama_cpp' has no attribute 'c_uint8'
2024-02-23 11:24:53 -05:00
Andrei Betlen
427d816ebf
chore: Bump version
2024-02-23 04:54:08 -05:00
Alvaro Bartolome
251a8a2cad
feat: Add Google's Gemma formatting via chat_format="gemma"
( #1210 )
...
* Add Google's Gemma formatting via `chat_format="gemma"`
* Replace `raise ValueError` with `logger.debug`
Co-authored-by: Andrei <abetlen@gmail.com>
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2024-02-23 04:40:52 -05:00
Andrei Betlen
b9aca612af
misc: use typesafe byref for internal classes
2024-02-23 03:40:07 -05:00
Andrei Betlen
a0ce429dc0
misc: use decorator to bind low level api functions, fixes docs
2024-02-23 03:39:38 -05:00
Andrei Betlen
e10af30cf1
fix: TypeAlias import error
2024-02-22 03:27:28 -05:00
Andrei Betlen
3561ebf536
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-02-22 03:25:13 -05:00
Andrei Betlen
aefcb8f71a
misc: additional type annotations for low level api
2024-02-22 02:00:09 -05:00
Andrei Betlen
3921e10770
feat: support minItems/maxItems in JSON grammar converter (by @nopperl)
2024-02-22 00:17:06 -05:00
Andrei Betlen
e6d6260a91
fix: Update from_pretrained defaults to match hf_hub_download
2024-02-22 00:10:23 -05:00
Andrei Betlen
dd22010e85
fix: Raise exceptions when llama model or context fails to load
2024-02-22 00:09:45 -05:00
Andrei Betlen
3632241e98
chore: Bump version
2024-02-21 23:09:13 -05:00
Andrei Betlen
0653e15c20
feat: Update llama.cpp
2024-02-21 23:04:52 -05:00
Andrei Betlen
7981e9ce1e
chore: Bump version
2024-02-21 16:30:59 -05:00
Andrei
7f51b6071f
feat(low-level-api): Improve API static type-safety and performance ( #1205 )
2024-02-21 16:25:38 -05:00
Andrei
0f8aa4ab5c
feat: Pull models directly from huggingface ( #1206 )
...
* Add from_pretrained method to Llama class
* Update docs
* Merge filename and pattern
2024-02-21 16:25:10 -05:00
Andrei Betlen
e42f62c247
chore: Bump version
2024-02-21 11:09:40 -05:00
Andrei Betlen
4edde21b3d
feat: Update llama.cpp
2024-02-21 11:05:58 -05:00
Andrei Betlen
6225f027e5
feat: Update llama.cpp
2024-02-19 04:11:34 -05:00
Andrei Betlen
748c0ce057
feat: Update llama.cpp
2024-02-18 21:30:36 -05:00
Andrei Betlen
53f6f5f415
fix: self.numa missing
2024-02-17 01:02:33 -05:00
Andrei Betlen
fdce078cb9
feat: Update llama.cpp
2024-02-17 00:37:51 -05:00
Andrei Betlen
f736827b9b
chore: Bump version
2024-02-15 23:10:50 -05:00
Andrei Betlen
0ce66bc080
fix: create_embedding broken response for input type str
2024-02-15 16:09:48 -05:00
khimaros
ea1f88dd29
fix: Use '\n' seperator for EventSourceResponse ( #1188 )
...
this fixes compatibility with some OpenAI clients, including BetterChatGPT (https://github.com/ztjhz/BetterChatGPT/issues/537 ).
Co-authored-by: Andrei <abetlen@gmail.com>
2024-02-15 15:20:13 -05:00
Andrei Betlen
a5cfeb7763
feat: Update llama.cpp
2024-02-15 15:17:30 -05:00
Douglas Hanley
7bb91f025f
fix: Incorporate embedding pooling layer fixes ( #1194 )
...
* remove division by token count
* truncate to n_batch, not n_ctx
2024-02-15 15:16:30 -05:00
Andrei Betlen
ae71ad1a14
Bump version
2024-02-14 04:31:42 -05:00
Douglas Hanley
d7a67917ba
feat: Support batch embeddings ( #1186 )
...
* handle batched embeddings
* fix normalization issue
* fix type hints, ensure no breaking changes to embed
* Clear kv cache / reset internal state after embedding complete
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2024-02-14 04:26:09 -05:00
Andrei Betlen
7b9960d1cb
Update llama.cpp
2024-02-14 03:47:21 -05:00
Andrei Betlen
6943bab6d8
fix: destructor exception where internal classes are missing some uninitialized attributes
2024-02-14 03:38:41 -05:00
Andrei Betlen
07a783779a
fix: Update openbuddy prompt format. Closes #1155
2024-02-13 23:57:10 -05:00
Andrei Betlen
345215a76c
fix: more chatml-function-calling fixes
2024-02-13 23:02:50 -05:00
Andrei Betlen
b1637c2319
Bump version
2024-02-13 12:35:04 -05:00
Andrew Lapp
d6be5333e1
fix: sample idx off-by-one error for logit_processors ( #1179 )
...
* fix sample_idx off-by-one error
* self._scores is indexed differently, only modify the index within self._input_ids
---------
Co-authored-by: Andrew Lapp <andrew@rew.la>
Co-authored-by: Andrei <abetlen@gmail.com>
2024-02-13 12:26:07 -05:00
Andrei Betlen
f7cdf78788
Update llama.cpp
2024-02-13 12:24:00 -05:00
Andrei Betlen
68fb71b6a2
fix: missing generation_prompt in chatml-function-calling
2024-02-13 03:24:41 -05:00
Andrei Betlen
4b0e3320bd
fix: minor formatting bugs for chatml-function-calling
2024-02-13 03:11:35 -05:00
Andrei Betlen
6fe8b427e1
Bump version
2024-02-13 02:46:52 -05:00