Andrei Betlen
909ef66951
docs: Rename cuBLAS section to CUDA
2024-04-04 03:08:47 -04:00
Andrei Betlen
1db3b58fdc
docs: Add docs explaining how to install pre-built wheels.
2024-04-04 02:57:06 -04:00
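The pre-built wheel install documented by this commit can be sketched as a shell one-liner. The index URL is an assumption based on the project's docs (the CUDA and Metal wheel indexes follow the same pattern with a different suffix); check the README for the authoritative URLs:

```shell
# Install a pre-built CPU wheel instead of compiling from source.
# Index URL assumed from the project docs; cu121/cu122/cu123 and
# metal variants use the same pattern with a different path suffix.
pip install llama-cpp-python \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
```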
Andrei Betlen
c50309e52a
docs: LLAMA_CUBLAS -> LLAMA_CUDA
2024-04-04 02:49:19 -04:00
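The flag rename in this commit follows the upstream llama.cpp change from cuBLAS-specific naming to a general CUDA option. A source build that previously passed `-DLLAMA_CUBLAS=on` would now look roughly like this (a sketch of the documented CMAKE_ARGS install pattern):

```shell
# Before the rename: CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
# After the rename:
CMAKE_ARGS="-DLLAMA_CUDA=on" pip install llama-cpp-python --no-cache-dir
```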
Andrei Betlen
612e78d322
fix(ci): use correct script name
2024-04-03 16:15:29 -04:00
Andrei Betlen
34081ddc5b
chore: Bump version
2024-04-03 15:38:27 -04:00
Andrei Betlen
368061c04a
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-04-03 15:35:30 -04:00
Andrei Betlen
5a5193636b
feat: Update llama.cpp
2024-04-03 15:35:28 -04:00
Andrei
5a930ee9a1
feat: Binary wheels for CPU, CUDA (12.1 - 12.3), Metal (#1247)
* Generate binary wheel index on release
* Add total release downloads badge
* Update download label
* Use official cibuildwheel action
* Add workflows to build CUDA and Metal wheels
* Update generate index workflow
* Update workflow name
2024-04-03 15:32:13 -04:00
Andrei Betlen
8649d7671b
fix: segfault when logits_all=False. Closes #1319
2024-04-03 15:30:31 -04:00
Andrei Betlen
f96de6d920
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-04-03 00:55:21 -04:00
Andrei Betlen
e465157804
feat: Update llama.cpp
2024-04-03 00:55:19 -04:00
Yuri Mikhailov
62aad610e1
fix: last tokens passing to sample_repetition_penalties function (#1295)
Co-authored-by: ymikhaylov <ymikhaylov@x5.ru>
Co-authored-by: Andrei <abetlen@gmail.com>
2024-04-01 15:25:43 -04:00
Andrei Betlen
45bf5ae582
chore: Bump version
2024-04-01 10:28:22 -04:00
lawfordp2017
a0f373e310
fix: Changed local API doc references to hosted (#1317)
2024-04-01 10:21:00 -04:00
Limour
f165048a69
feat: add support for KV cache quantization options (#1307)
* add KV cache quantization options
https://github.com/abetlen/llama-cpp-python/discussions/1220
https://github.com/abetlen/llama-cpp-python/issues/1305
* Add ggml_type
* Use ggml_type instead of string for quantization
* Add server support
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-04-01 10:19:28 -04:00
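A minimal sketch of what the KV cache quantization options from #1307 might look like at the call site. The parameter names `type_k`/`type_v` and the numeric `GGML_TYPE_Q8_0` value are illustrative assumptions, not confirmed against the released API; the kwargs are built as a plain dict so the shape can be shown without loading a model:

```python
# Hypothetical sketch: quantize both halves of the KV cache to Q8_0.
# GGML_TYPE_Q8_0 = 8 mirrors the ggml_type enum (assumed value).
GGML_TYPE_Q8_0 = 8

llama_kwargs = {
    "model_path": "model.gguf",  # placeholder path
    "type_k": GGML_TYPE_Q8_0,    # quantization type for the K cache
    "type_v": GGML_TYPE_Q8_0,    # quantization type for the V cache
}
# In real use: llm = llama_cpp.Llama(**llama_kwargs)
```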
windspirit95
aa9f1ae011
feat: Add logprobs support to chat completions (#1311)
* Add logprobs return in ChatCompletionResponse
* Fix duplicate field
* Set default to false
* Simplify check
* Add server example
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-03-31 13:30:13 -04:00
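Per the PR notes above (logprobs default to false, with a server example), a chat-completion request enabling them might look like the following. The field names `logprobs`/`top_logprobs` are assumed to mirror the OpenAI-style API this server emulates:

```python
import json

# Hypothetical request body for an OpenAI-compatible chat completions
# endpoint with logprobs enabled (field names are assumptions).
payload = {
    "messages": [{"role": "user", "content": "Hello"}],
    "logprobs": True,   # default is false per the PR notes
    "top_logprobs": 5,  # only meaningful when logprobs is true
}
body = json.dumps(payload)
```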
Andrei Betlen
1e60dba082
feat: Update llama.cpp
2024-03-29 13:34:23 -04:00
Andrei Betlen
dcbe57fcf8
feat: Update llama.cpp
2024-03-29 12:45:27 -04:00
Andrei Betlen
125b2358c9
feat: Update llama.cpp
2024-03-28 12:06:46 -04:00
Andrei Betlen
901fe02461
feat: Update llama.cpp
2024-03-26 22:58:53 -04:00
Andrei Betlen
b64fa4e2c0
feat: Update llama.cpp
2024-03-25 23:09:07 -04:00
Andrei Betlen
a93b9149f8
feat: Update llama.cpp
2024-03-25 11:10:14 -04:00
Andrei Betlen
364678bde5
feat: Update llama.cpp
2024-03-24 12:27:49 -04:00
Andrei Betlen
d11ccc3036
fix(server): minor type fixes
2024-03-23 17:14:15 -04:00
Andrei Betlen
c1325dcdfb
fix: tool_call missing first token.
2024-03-22 23:44:04 -04:00
Andrei Betlen
e325a831f0
feat: Update llama.cpp
2024-03-22 23:43:29 -04:00
Andrei Betlen
c89be28ef9
feat: Update llama.cpp
2024-03-20 20:50:47 -04:00
Andrei Betlen
3db03b7302
feat: Update llama.cpp
2024-03-20 13:27:43 -04:00
bretello
740f3f3812
fix: set LLAMA_METAL_EMBED_LIBRARY=on on MacOS arm64 (#1289)
2024-03-20 12:46:09 -04:00
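The fix above makes the build set this flag automatically on Apple Silicon; passing it explicitly (for example on a setup the auto-detection misses) would look roughly like this sketch:

```shell
# Build with the Metal backend and the shader library embedded in the
# binary (flag names taken from the commit title; combination assumed).
CMAKE_ARGS="-DLLAMA_METAL=on -DLLAMA_METAL_EMBED_LIBRARY=on" \
  pip install llama-cpp-python --no-cache-dir
```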
Andrei Betlen
f7decc9562
docs: Add chat examples to openapi ui
2024-03-19 10:52:53 -04:00
Andrei
60d8498f21
feat: Add tools/functions variables to Jinja2ChatFormatter, add function response formatting for all simple chat formats (#1273)
* Add tools/functions variables to Jinja2ChatFormatter
Also fixed missing tools/tool_choices parameters in chat_formatter_to_chat_completion_handler().
* Set grammar when doing explicit function calling
* Add function / tool response for all chat formats
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2024-03-19 04:55:57 -04:00
Andrei Betlen
18d7ce918f
feat: Update llama.cpp
2024-03-19 04:40:24 -04:00
Andrei Betlen
7d4a5ec59f
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-03-18 11:37:33 -04:00
Andrei Betlen
bf64752535
chore: Bump version
2024-03-18 11:37:30 -04:00
Jeffrey Fong
8a60c7bc8c
fix: Fix and optimize functionary chat handler (#1282)
* fix functionary chat logic
* further fixes
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2024-03-18 10:40:57 -04:00
Andrei Betlen
8d298b4750
feat: Update llama.cpp
2024-03-18 10:26:36 -04:00
Andrei Betlen
6eb25231e4
feat: Update llama.cpp
2024-03-15 12:58:45 -04:00
Andrei Betlen
20e6815252
fix: json mode
2024-03-15 12:58:34 -04:00
Andrei Betlen
1a9b8af2dd
feat: Update llama.cpp
2024-03-14 11:46:48 -04:00
Andrei Betlen
4084aabe86
fix: set default pooling type to unspecified
2024-03-14 10:04:57 -04:00
Andrei Betlen
d318cc8b83
fix: Set default pooling_type to mean, check for null pointer.
2024-03-14 09:17:41 -04:00
Andrei Betlen
dd0ee56217
feat: Update llama.cpp
2024-03-13 15:57:35 -04:00
Andrei Betlen
08e910f7a7
feat: Update llama.cpp
2024-03-10 23:45:05 -04:00
Andrei Betlen
a7281994d8
chore: Bump version
2024-03-08 21:14:44 -05:00
Andrei Betlen
919fca9f2b
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-03-08 21:10:56 -05:00
Andrei Betlen
d02a9cf16f
Fixed json strings grammar by blacklisting character control set. Closes #1259
2024-03-08 21:10:53 -05:00
Felipe Lorenz
c139f8b5d5
feat: Add endpoints for tokenize, detokenize and count tokens (#1136)
* Add endpoint to count tokens
* Add tokenize and detokenize endpoints
* Change response key to tokens for tokenize endpoint
* Fix dependency bug
* Cleanup
* Remove example added by mistake
* Move tokenize, detokenize, and count to Extras namespace. Tag existing endpoints
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-03-08 21:09:00 -05:00
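The PR body above moves the three endpoints into an Extras namespace; a sketch of the resulting paths and request bodies follows. The exact paths and field names are assumptions inferred from the commit notes ("Change response key to tokens"), not verified against the server code:

```python
# Hypothetical Extras-namespace endpoints and request bodies.
# The comment on each entry sketches the assumed response shape.
requests_sketch = {
    "/extras/tokenize": {"input": "Hello, world!"},        # -> {"tokens": [...]}
    "/extras/tokenize/count": {"input": "Hello, world!"},  # -> {"count": N}
    "/extras/detokenize": {"tokens": [1, 2, 3]},           # -> {"text": "..."}
}
```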
Kevin Cao
1f3156d4f2
fix: Check for existence of clip model path (#1264)
2024-03-08 21:00:10 -05:00
Douglas Hanley
2811014bae
feat: Switch embed to llama_get_embeddings_seq (#1263)
* switch to llama_get_embeddings_seq
* Remove duplicate definition of llama_get_embeddings_seq
Co-authored-by: Andrei <abetlen@gmail.com>
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2024-03-08 20:59:35 -05:00
Andrei Betlen
40c6b54f68
feat: Update llama.cpp
2024-03-08 20:58:50 -05:00