946156fb6c
feat: Update llama.cpp
Andrei Betlen
2024-04-30 15:46:45 -0400
9286b5caac
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
Andrei Betlen
2024-04-30 15:45:36 -0400
f116175a5a
fix: Suppress all logs when verbose=False, use hardcoded fileno's to work in colab notebooks. Closes #796 Closes #729
Andrei Betlen
2024-04-30 15:45:34 -0400
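A minimal sketch of the behavior targeted by the log-suppression fix above, assuming a local GGUF model at a hypothetical path: with verbose=False, llama.cpp's native output is silenced, including in notebook environments such as Colab.

```python
from llama_cpp import Llama

# verbose=False suppresses llama.cpp's native logging; the fix above redirects
# the underlying file descriptors so this also works inside Colab notebooks.
llm = Llama(model_path="./models/example.Q4_K_M.gguf", verbose=False)  # hypothetical path
out = llm("Q: What is the capital of France? A:", max_tokens=16)
print(out["choices"][0]["text"])
```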
3226b3c5ef
fix: UTF-8 handling with grammars (#1415)
Jonathan Soma
2024-04-30 14:33:23 -0400
945c62c567
docs: Change all examples from interpreter style to script style.
Andrei Betlen
2024-04-30 10:15:04 -0400
26478ab293
docs: Update README.md
Andrei Betlen
2024-04-30 10:11:38 -0400
b14dd98922
chore: Bump version
Andrei Betlen
2024-04-30 09:39:56 -0400
29b6e9a5c8
fix: wrong parameter for flash attention in pickle __getstate__
Andrei Betlen
2024-04-30 09:32:47 -0400
22d77eefd2
feat: Add option to enable flash_attn to Llama params and ModelSettings
Andrei Betlen
2024-04-30 09:29:16 -0400
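A sketch of how the new option might be used from the Python API (model path hypothetical); flash_attn only takes effect on builds and hardware where llama.cpp supports flash attention.

```python
from llama_cpp import Llama

# Opt into llama.cpp's flash attention kernels via the new flash_attn flag.
llm = Llama(
    model_path="./models/example.Q4_K_M.gguf",  # hypothetical path
    n_gpu_layers=-1,
    flash_attn=True,
)
```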
8c2b24d5aa
feat: Update llama.cpp
Andrei Betlen
2024-04-30 09:27:55 -0400
6332527a69
fix(ci): Fix build-and-release.yaml (#1413)
Olivier DEBAUCHE
2024-04-30 15:16:14 +0200
c8cd8c17c6
docs: Update README to include CUDA 12.4 wheels
Andrei Betlen
2024-04-30 03:12:46 -0400
f417cce28a
chore: Bump version
Andrei Betlen
2024-04-30 03:11:02 -0400
3489ef09d3
fix: Ensure image renders before text in chat formats regardless of message content order.
Andrei Betlen
2024-04-30 03:08:46 -0400
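The fix above concerns multimodal chat handlers such as LLaVA: the image part of a user message is now rendered before the text part even when the caller lists the text first. A hedged sketch, assuming the LLaVA 1.5 chat handler and hypothetical model/projector paths:

```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

chat_handler = Llava15ChatHandler(clip_model_path="./models/mmproj.gguf")  # hypothetical path
llm = Llama(
    model_path="./models/llava-1.5.Q4_K_M.gguf",  # hypothetical path
    chat_handler=chat_handler,
    n_ctx=2048,
)
# Text is listed before the image here; the chat format still places the image first.
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
            ],
        }
    ]
)
```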
d03f15bb73
fix(ci): Fix bug in use of upload-artifact failing to merge multiple artifacts into a single release.
Andrei Betlen
2024-04-30 02:58:55 -0400
26c7876ba0
chore: Bump version
Andrei Betlen
2024-04-30 01:48:40 -0400
fe2da09538
feat: Generic Chat Formats, Tool Calling, and Huggingface Pull Support for Multimodal Models (Obsidian, LLaVA1.6, Moondream) (#1147)
Andrei
2024-04-30 01:35:38 -0400
97fb860eba
feat: Update llama.cpp
Andrei Betlen
2024-04-29 23:34:55 -0400
df2b5b5d44
chore(deps): bump actions/upload-artifact from 3 to 4 (#1412)
dependabot[bot]
2024-04-29 22:53:42 -0400
be43018e09
chore(deps): bump actions/configure-pages from 4 to 5 (#1411)
dependabot[bot]
2024-04-29 22:53:21 -0400
32c000f3ec
chore(deps): bump softprops/action-gh-release from 1 to 2 (#1408)
dependabot[bot]
2024-04-29 22:52:58 -0400
03c654a3d9
ci(fix): Workflow actions updates and fix arm64 wheels not included in release (#1392)
Olivier DEBAUCHE
2024-04-30 04:52:23 +0200
0c3bc4b928
fix(ci): Update generate wheel index script to include cu12.3 and cu12.4 Closes #1406
Andrei Betlen
2024-04-29 12:37:22 -0400
2355ce2227
ci: Add support for pre-built cuda 12.4.1 wheels (#1388)
Olivier DEBAUCHE
2024-04-28 05:44:47 +0200
a411612b38
feat: Add support for str type kv_overrides
Andrei Betlen
2024-04-27 23:42:19 -0400
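A sketch of the feature above, assuming a hypothetical model path and illustrative metadata keys: kv_overrides overrides GGUF metadata at load time, and string values are now accepted alongside ints, floats, and bools.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./models/example.gguf",  # hypothetical path
    kv_overrides={
        "tokenizer.ggml.add_bos_token": True,         # bool override
        "tokenizer.chat_template": "{{ messages }}",  # str override, newly supported
    },
)
```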
c9b85bf098
feat: Update llama.cpp
Andrei Betlen
2024-04-27 23:41:54 -0400
c07db99e5b
chore(deps): bump pypa/cibuildwheel from 2.16.5 to 2.17.0 (#1401)
dependabot[bot]
2024-04-27 20:51:13 -0400
7074c4d256
chore(deps): bump docker/build-push-action from 4 to 5 (#1400)
dependabot[bot]
2024-04-27 20:51:02 -0400
79318ba1d1
chore(deps): bump docker/login-action from 2 to 3 (#1399)
dependabot[bot]
2024-04-27 20:50:50 -0400
27038db3d6
chore(deps): bump actions/cache from 3.3.2 to 4.0.2 (#1398)
dependabot[bot]
2024-04-27 20:50:39 -0400
17bdfc818f
chore(deps): bump conda-incubator/setup-miniconda from 2.2.0 to 3.0.4 (#1397)
dependabot[bot]
2024-04-27 20:50:28 -0400
f178636e1b
fix: Functionary bug fixes (#1385)
Jeffrey Fong
2024-04-28 08:49:52 +0800
e6bbfb863c
examples: fix quantize example (#1387)
iyubondyrev
2024-04-28 02:48:47 +0200
c58b56123d
ci: Update action versions in build-wheels-metal.yaml (#1390)
Olivier DEBAUCHE
2024-04-28 02:47:49 +0200
9e7f738220
ci: Update dependabot.yml (#1391)
Olivier DEBAUCHE
2024-04-28 02:47:07 +0200
65edc90671
chore: Bump version
Andrei Betlen
2024-04-26 10:11:31 -0400
173ebc7878
fix: Remove duplicate pooling_type definition and add missing n_vocab definition in bindings
Andrei Betlen
2024-04-25 21:36:09 -0400
f6ed21f9a2
feat: Allow for possibly non-pooled embeddings (#1380)
Douglas Hanley
2024-04-25 20:32:44 -0500
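A hedged sketch of what the non-pooled embedding path above looks like from the Python API, assuming a pooling_type constructor parameter and the LLAMA_POOLING_TYPE_NONE constant (model path hypothetical): with pooling disabled, embed() can return one vector per token rather than a single pooled vector per input.

```python
import llama_cpp
from llama_cpp import Llama

llm = Llama(
    model_path="./models/embedding-model.gguf",      # hypothetical path
    embedding=True,
    pooling_type=llama_cpp.LLAMA_POOLING_TYPE_NONE,  # assumed parameter/constant names
)
token_vectors = llm.embed("hello world")  # per-token vectors when pooling is disabled
```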
fcfea66857
fix: pydantic deprecation warning
Andrei Betlen
2024-04-25 21:21:48 -0400
7f52335c50
feat: Update llama.cpp
Andrei Betlen
2024-04-25 21:21:29 -0400
266abfc1a3
fix(ci): Fix metal tests as well
Andrei Betlen
2024-04-25 03:09:46 -0400
de37420fcf
fix(ci): Fix python macos test runners issue
Andrei Betlen
2024-04-25 03:08:32 -0400
2a9979fce1
feat: Update llama.cpp
Andrei Betlen
2024-04-25 02:48:26 -0400
ce85be97e2
Merge https://github.com/abetlen/llama-cpp-python
baalajimaestro
2024-04-25 10:48:33 +0530
c50d3300d2
chore: Bump version
Andrei Betlen
2024-04-23 02:53:20 -0400
611781f531
ci: Build arm64 wheels. Closes #1342
Andrei Betlen
2024-04-23 02:48:09 -0400
53ebcc8bb5
feat(server): Provide ability to dynamically allocate all threads if desired using -1 (#1364)
Sean Bailey
2024-04-23 02:35:38 -0400
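A sketch of the server setting above, assuming the OpenAI-compatible server's ModelSettings class and field names: -1 asks the server to allocate all available threads.

```python
from llama_cpp.server.settings import ModelSettings  # assumed module/class

settings = ModelSettings(
    model="./models/example.Q4_K_M.gguf",  # hypothetical path
    n_threads=-1,        # -1 = dynamically use all available threads
    n_threads_batch=-1,
)
```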
507c1da066
fix: Update scikit-build-core build dependency to avoid bug in 0.9.1 (#1370)
Geza Velkey
2024-04-23 08:34:15 +0200
8559e8ce88
feat: Add Llama-3 chat format (#1371)
abk16
2024-04-23 06:33:29 +0000
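A sketch of selecting the new chat format by name (model path hypothetical; the registered name is assumed to be "llama-3"):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./models/Meta-Llama-3-8B-Instruct.Q4_K_M.gguf",  # hypothetical path
    chat_format="llama-3",  # assumed registered name for the new format
)
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a haiku about llamas."}]
)
```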
617d536e1c
feat: Update llama.cpp
Andrei Betlen
2024-04-23 02:31:40 -0400
d40a250ef3
feat: Use new llama_token_is_eog in create_completions
Andrei Betlen
2024-04-22 00:35:47 -0400
b21ba0e2ac
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
Andrei Betlen
2024-04-21 20:46:42 -0400
159cc4e5d9
feat: Update llama.cpp
Andrei Betlen
2024-04-21 20:46:40 -0400
0281214863
chore: Bump version
Andrei Betlen
2024-04-20 00:09:37 -0400
cc81afebf0
feat: Add stopping_criteria to ChatFormatter, allow stopping on arbitrary token ids, fixes llama3 instruct
Andrei Betlen
2024-04-20 00:00:53 -0400
d17c1887a3
feat: Update llama.cpp
Andrei Betlen
2024-04-19 23:58:16 -0400
893a27a736
chore: Bump version
Andrei Betlen
2024-04-18 01:43:39 -0400
a128c80500
feat: Update llama.cpp
Andrei Betlen
2024-04-18 01:39:45 -0400
4f42664955
feat: update grammar schema converter to match llama.cpp (#1353)
Lucca Zenóbio
2024-04-18 02:36:25 -0300
fa4bb0cf81
Revert "feat: Update json to grammar (#1350)"
Andrei Betlen
2024-04-17 16:18:16 -0400
610a592f70
feat: Update json to grammar (#1350)
Lucca Zenóbio
2024-04-17 11:10:21 -0300
b73c73c0c6
feat: add disable_ping_events flag (#1257)
khimaros
2024-04-17 14:08:19 +0000
4924455dec
feat: Make saved state more compact on-disk (#1296)
tc-wolf
2024-04-17 09:06:50 -0500
9842cbf99d
feat: Update llama.cpp
Andrei Betlen
2024-04-17 10:06:15 -0400
c96b2daebf
feat: Use all available CPUs for batch processing (#1345)
ddh0
2024-04-17 09:04:33 -0500
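A sketch of the related constructor knobs (model path hypothetical): after the change above, leaving n_threads_batch unset should default to all available CPUs; the explicit equivalent is shown below.

```python
import multiprocessing
from llama_cpp import Llama

llm = Llama(
    model_path="./models/example.gguf",           # hypothetical path
    n_threads=8,                                  # generation threads
    n_threads_batch=multiprocessing.cpu_count(),  # explicit equivalent of the new batch default
)
```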
a420f9608b
feat: Update llama.cpp
Andrei Betlen
2024-04-14 19:14:09 -0400
90dceaba8a
feat: Update llama.cpp
Andrei Betlen
2024-04-14 11:35:57 -0400
2e9ffd28fd
feat: Update llama.cpp
Andrei Betlen
2024-04-12 21:09:12 -0400
ef29235d45
chore: Bump version
Andrei Betlen
2024-04-10 03:44:46 -0400
bb65b4d764
fix: pass correct type to chat handlers for chat completion logprobs
Andrei Betlen
2024-04-10 03:41:55 -0400
060bfa64d5
feat: Add support for yaml based configs
Andrei Betlen
2024-04-10 02:47:01 -0400
1347e1d050
feat: Add typechecking for ctypes structure attributes
Andrei Betlen
2024-04-10 02:40:41 -0400
889d0e8981
feat: Update llama.cpp
Andrei Betlen
2024-04-10 02:25:58 -0400
56071c956a
feat: Update llama.cpp
Andrei Betlen
2024-04-09 09:53:49 -0400
0078e0f1cf
Merge https://github.com/abetlen/llama-cpp-python
baalajimaestro
2024-04-06 16:34:43 +0530
08b16afe11
chore: Bump version
Andrei Betlen
2024-04-06 01:53:38 -0400
7ca364c8bd
feat: Update llama.cpp
Andrei Betlen
2024-04-06 01:37:43 -0400
b3bfea6dbf
fix: Always embed metal library. Closes #1332
Andrei Betlen
2024-04-06 01:36:53 -0400
f4092e6b46
feat: Update llama.cpp
Andrei Betlen
2024-04-05 10:59:31 -0400
2760ef6156
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
Andrei Betlen
2024-04-05 10:51:54 -0400
1ae3abbcc3
fix: missing logprobs in response, incorrect response type for functionary, minor type issues. Closes #1328 Closes #1314
Andrei Betlen
2024-04-05 10:50:49 -0400
49bc66bfa2
fix: missing logprobs in response, incorrect response type for functionary, minor type issues. Closes #1328 #1314
Andrei Betlen
2024-04-05 10:50:49 -0400
9111b6e03a
feat: Update llama.cpp
Andrei Betlen
2024-04-05 09:21:02 -0400
7265a5dc0e
fix(docs): incorrect tool_choice example (#1330)
Sigbjørn Skjæret
2024-04-05 15:14:03 +0200
8b9cd38c0d
Merge https://github.com/abetlen/llama-cpp-python
baalajimaestro
2024-04-05 10:38:53 +0530
909ef66951
docs: Rename cuBLAS section to CUDA
Andrei Betlen
2024-04-04 03:08:47 -0400
1db3b58fdc
docs: Add docs explaining how to install pre-built wheels.
Andrei Betlen
2024-04-04 02:57:06 -0400
c50309e52a
docs: LLAMA_CUBLAS -> LLAMA_CUDA
Andrei Betlen
2024-04-04 02:49:19 -0400
612e78d322
fix(ci): use correct script name
Andrei Betlen
2024-04-03 16:15:29 -0400
34081ddc5b
chore: Bump version
Andrei Betlen
2024-04-03 15:38:27 -0400
368061c04a
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
Andrei Betlen
2024-04-03 15:35:30 -0400
5a5193636b
feat: Update llama.cpp
Andrei Betlen
2024-04-03 15:35:28 -0400
5a930ee9a1
feat: Binary wheels for CPU, CUDA (12.1 - 12.3), Metal (#1247)
Andrei
2024-04-03 15:32:13 -0400
8649d7671b
fix: segfault when logits_all=False. Closes #1319
Andrei Betlen
2024-04-03 15:30:31 -0400
f96de6d920
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
Andrei Betlen
2024-04-03 00:55:21 -0400
e465157804
feat: Update llama.cpp
Andrei Betlen
2024-04-03 00:55:19 -0400
62aad610e1
fix: last tokens passing to sample_repetition_penalties function (#1295)
Yuri Mikhailov
2024-04-02 04:25:43 +0900
45bf5ae582
chore: Bump version
Andrei Betlen
2024-04-01 10:28:22 -0400
a0f373e310
fix: Changed local API doc references to hosted (#1317)
lawfordp2017
2024-04-01 08:21:00 -0600
f165048a69
feat: add support for KV cache quantization options (#1307)
Limour
2024-04-01 22:19:28 +0800
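A hedged sketch of the KV-cache quantization options above, assuming type_k/type_v parameters that take ggml type constants (model path hypothetical); a quantized V cache typically also requires flash attention in llama.cpp.

```python
import llama_cpp
from llama_cpp import Llama

llm = Llama(
    model_path="./models/example.gguf",   # hypothetical path
    type_k=llama_cpp.GGML_TYPE_Q8_0,      # assumed parameter/constant names: quantize K cache
    type_v=llama_cpp.GGML_TYPE_Q8_0,      # quantize V cache
    flash_attn=True,
)
```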