946156fb6c
feat: Update llama.cpp
Andrei Betlen
2024-04-30 15:46:45 -0400
9286b5caac
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
Andrei Betlen
2024-04-30 15:45:36 -0400
f116175a5a
fix: Suppress all logs when verbose=False, use hardcoded fileno's to work in colab notebooks. Closes #796 Closes #729
Andrei Betlen
2024-04-30 15:45:34 -0400
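A minimal sketch of the behavior targeted by the log-suppression fix above, assuming a local GGUF model at a hypothetical path: with verbose=False, llama.cpp's native output is silenced, including in notebook environments such as Colab.

```python
from llama_cpp import Llama

# verbose=False suppresses llama.cpp's native logging; the fix above redirects
# the underlying file descriptors so this also works inside Colab notebooks.
llm = Llama(model_path="./models/example.Q4_K_M.gguf", verbose=False)  # hypothetical path
out = llm("Q: What is the capital of France? A:", max_tokens=16)
print(out["choices"][0]["text"])
```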
3226b3c5ef
fix: UTF-8 handling with grammars (#1415)
Jonathan Soma
2024-04-30 14:33:23 -0400
945c62c567
docs: Change all examples from interpreter style to script style.
Andrei Betlen
2024-04-30 10:15:04 -0400
26478ab293
docs: Update README.md
Andrei Betlen
2024-04-30 10:11:38 -0400
b14dd98922
chore: Bump version
Andrei Betlen
2024-04-30 09:39:56 -0400
29b6e9a5c8
fix: wrong parameter for flash attention in pickle __getstate__
Andrei Betlen
2024-04-30 09:32:47 -0400
22d77eefd2
feat: Add option to enable flash_attn to Llama params and ModelSettings
Andrei Betlen
2024-04-30 09:29:16 -0400
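A sketch of how the new option might be used from the Python API (model path hypothetical); flash_attn only takes effect on builds and hardware where llama.cpp supports flash attention.

```python
from llama_cpp import Llama

# Opt into llama.cpp's flash attention kernels via the new flash_attn flag.
llm = Llama(
    model_path="./models/example.Q4_K_M.gguf",  # hypothetical path
    n_gpu_layers=-1,
    flash_attn=True,
)
```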
8c2b24d5aa
feat: Update llama.cpp
Andrei Betlen
2024-04-30 09:27:55 -0400
6332527a69
fix(ci): Fix build-and-release.yaml (#1413)
Olivier DEBAUCHE
2024-04-30 15:16:14 +0200
c8cd8c17c6
docs: Update README to include CUDA 12.4 wheels
Andrei Betlen
2024-04-30 03:12:46 -0400
f417cce28a
chore: Bump version
Andrei Betlen
2024-04-30 03:11:02 -0400
3489ef09d3
fix: Ensure image renders before text in chat formats regardless of message content order.
Andrei Betlen
2024-04-30 03:08:46 -0400
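The fix above concerns multimodal chat handlers such as LLaVA: the image part of a user message is now rendered before the text part even when the caller lists the text first. A hedged sketch, assuming the LLaVA 1.5 chat handler and hypothetical model/projector paths:

```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

chat_handler = Llava15ChatHandler(clip_model_path="./models/mmproj.gguf")  # hypothetical path
llm = Llama(
    model_path="./models/llava-1.5.Q4_K_M.gguf",  # hypothetical path
    chat_handler=chat_handler,
    n_ctx=2048,
)
# Text is listed before the image here; the chat format still places the image first.
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
            ],
        }
    ]
)
```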
d03f15bb73
fix(ci): Fix bug in use of upload-artifact failing to merge multiple artifacts into a single release.
Andrei Betlen
2024-04-30 02:58:55 -0400
26c7876ba0
chore: Bump version
Andrei Betlen
2024-04-30 01:48:40 -0400
fe2da09538
feat: Generic Chat Formats, Tool Calling, and Huggingface Pull Support for Multimodal Models (Obsidian, LLaVA1.6, Moondream) (#1147)
Andrei
2024-04-30 01:35:38 -0400
97fb860eba
feat: Update llama.cpp
Andrei Betlen
2024-04-29 23:34:55 -0400
df2b5b5d44
chore(deps): bump actions/upload-artifact from 3 to 4 (#1412)
dependabot[bot]
2024-04-29 22:53:42 -0400
be43018e09
chore(deps): bump actions/configure-pages from 4 to 5 (#1411)
dependabot[bot]
2024-04-29 22:53:21 -0400
32c000f3ec
chore(deps): bump softprops/action-gh-release from 1 to 2 (#1408)
dependabot[bot]
2024-04-29 22:52:58 -0400
03c654a3d9
ci(fix): Workflow actions updates and fix arm64 wheels not included in release (#1392)
Olivier DEBAUCHE
2024-04-30 04:52:23 +0200
0c3bc4b928
fix(ci): Update generate wheel index script to include cu12.3 and cu12.4 Closes #1406
Andrei Betlen
2024-04-29 12:37:22 -0400
2355ce2227
ci: Add support for pre-built cuda 12.4.1 wheels (#1388)
Olivier DEBAUCHE
2024-04-28 05:44:47 +0200
a411612b38
feat: Add support for str type kv_overrides
Andrei Betlen
2024-04-27 23:42:19 -0400
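A sketch of the feature above, assuming a hypothetical model path and illustrative metadata keys: kv_overrides overrides GGUF metadata at load time, and string values are now accepted alongside ints, floats, and bools.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./models/example.gguf",  # hypothetical path
    kv_overrides={
        "tokenizer.ggml.add_bos_token": True,         # bool override
        "tokenizer.chat_template": "{{ messages }}",  # str override, newly supported
    },
)
```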
c9b85bf098
feat: Update llama.cpp
Andrei Betlen
2024-04-27 23:41:54 -0400
c07db99e5b
chore(deps): bump pypa/cibuildwheel from 2.16.5 to 2.17.0 (#1401)
dependabot[bot]
2024-04-27 20:51:13 -0400
7074c4d256
chore(deps): bump docker/build-push-action from 4 to 5 (#1400)
dependabot[bot]
2024-04-27 20:51:02 -0400
79318ba1d1
chore(deps): bump docker/login-action from 2 to 3 (#1399)
dependabot[bot]
2024-04-27 20:50:50 -0400
27038db3d6
chore(deps): bump actions/cache from 3.3.2 to 4.0.2 (#1398)
dependabot[bot]
2024-04-27 20:50:39 -0400
17bdfc818f
chore(deps): bump conda-incubator/setup-miniconda from 2.2.0 to 3.0.4 (#1397)
dependabot[bot]
2024-04-27 20:50:28 -0400
f178636e1b
fix: Functionary bug fixes (#1385)
Jeffrey Fong
2024-04-28 08:49:52 +0800
e6bbfb863c
examples: fix quantize example (#1387)
iyubondyrev
2024-04-28 02:48:47 +0200
c58b56123d
ci: Update action versions in build-wheels-metal.yaml (#1390)
Olivier DEBAUCHE
2024-04-28 02:47:49 +0200
9e7f738220
ci: Update dependabot.yml (#1391)
Olivier DEBAUCHE
2024-04-28 02:47:07 +0200
65edc90671
chore: Bump version
Andrei Betlen
2024-04-26 10:11:31 -0400
173ebc7878
fix: Remove duplicate pooling_type definition and add missing n_vocab definition in bindings
Andrei Betlen
2024-04-25 21:36:09 -0400
f6ed21f9a2
feat: Allow for possibly non-pooled embeddings (#1380)
Douglas Hanley
2024-04-25 20:32:44 -0500
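A hedged sketch of what the non-pooled embedding path above looks like from the Python API, assuming a pooling_type constructor parameter and the LLAMA_POOLING_TYPE_NONE constant (model path hypothetical): with pooling disabled, embed() can return one vector per token rather than a single pooled vector per input.

```python
import llama_cpp
from llama_cpp import Llama

llm = Llama(
    model_path="./models/embedding-model.gguf",      # hypothetical path
    embedding=True,
    pooling_type=llama_cpp.LLAMA_POOLING_TYPE_NONE,  # assumed parameter/constant names
)
token_vectors = llm.embed("hello world")  # per-token vectors when pooling is disabled
```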
fcfea66857
fix: pydantic deprecation warning
Andrei Betlen
2024-04-25 21:21:48 -0400
7f52335c50
feat: Update llama.cpp
Andrei Betlen
2024-04-25 21:21:29 -0400
266abfc1a3
fix(ci): Fix metal tests as well
Andrei Betlen
2024-04-25 03:09:46 -0400
de37420fcf
fix(ci): Fix python macos test runners issue
Andrei Betlen
2024-04-25 03:08:32 -0400
2a9979fce1
feat: Update llama.cpp
Andrei Betlen
2024-04-25 02:48:26 -0400
ce85be97e2
Merge https://github.com/abetlen/llama-cpp-python
baalajimaestro
2024-04-25 10:48:33 +0530
c50d3300d2
chore: Bump version
Andrei Betlen
2024-04-23 02:53:20 -0400
611781f531
ci: Build arm64 wheels. Closes #1342
Andrei Betlen
2024-04-23 02:48:09 -0400
53ebcc8bb5
feat(server): Provide ability to dynamically allocate all threads if desired using -1 (#1364)
Sean Bailey
2024-04-23 02:35:38 -0400
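A sketch of the server setting above, assuming the OpenAI-compatible server's ModelSettings class and field names: -1 asks the server to allocate all available threads.

```python
from llama_cpp.server.settings import ModelSettings  # assumed module/class

settings = ModelSettings(
    model="./models/example.Q4_K_M.gguf",  # hypothetical path
    n_threads=-1,        # -1 = dynamically use all available threads
    n_threads_batch=-1,
)
```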
507c1da066
fix: Update scikit-build-core build dependency to avoid bug in 0.9.1 (#1370)
Geza Velkey
2024-04-23 08:34:15 +0200
8559e8ce88
feat: Add Llama-3 chat format (#1371)
abk16
2024-04-23 06:33:29 +0000
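A sketch of selecting the new chat format by name (model path hypothetical; the registered name is assumed to be "llama-3"):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./models/Meta-Llama-3-8B-Instruct.Q4_K_M.gguf",  # hypothetical path
    chat_format="llama-3",  # assumed registered name for the new format
)
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a haiku about llamas."}]
)
```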
617d536e1c
feat: Update llama.cpp
Andrei Betlen
2024-04-23 02:31:40 -0400
d40a250ef3
feat: Use new llama_token_is_eog in create_completions
Andrei Betlen
2024-04-22 00:35:47 -0400
b21ba0e2ac
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
Andrei Betlen
2024-04-21 20:46:42 -0400
159cc4e5d9
feat: Update llama.cpp
Andrei Betlen
2024-04-21 20:46:40 -0400
0281214863
chore: Bump version
Andrei Betlen
2024-04-20 00:09:37 -0400
cc81afebf0
feat: Add stopping_criteria to ChatFormatter, allow stopping on arbitrary token ids, fixes llama3 instruct
Andrei Betlen
2024-04-20 00:00:53 -0400
d17c1887a3
feat: Update llama.cpp
Andrei Betlen
2024-04-19 23:58:16 -0400
893a27a736
chore: Bump version
Andrei Betlen
2024-04-18 01:43:39 -0400
a128c80500
feat: Update llama.cpp
Andrei Betlen
2024-04-18 01:39:45 -0400
4f42664955
feat: update grammar schema converter to match llama.cpp (#1353)
Lucca Zenóbio
2024-04-18 02:36:25 -0300
fa4bb0cf81
Revert "feat: Update json to grammar (#1350)"
Andrei Betlen
2024-04-17 16:18:16 -0400
610a592f70
feat: Update json to grammar (#1350)
Lucca Zenóbio
2024-04-17 11:10:21 -0300
b73c73c0c6
feat: add disable_ping_events flag (#1257)
khimaros
2024-04-17 14:08:19 +0000
4924455dec
feat: Make saved state more compact on-disk (#1296)
tc-wolf
2024-04-17 09:06:50 -0500
9842cbf99d
feat: Update llama.cpp
Andrei Betlen
2024-04-17 10:06:15 -0400
c96b2daebf
feat: Use all available CPUs for batch processing (#1345)
ddh0
2024-04-17 09:04:33 -0500
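A sketch of the related constructor knobs (model path hypothetical): after the change above, leaving n_threads_batch unset should default to all available CPUs; the explicit equivalent is shown below.

```python
import multiprocessing
from llama_cpp import Llama

llm = Llama(
    model_path="./models/example.gguf",           # hypothetical path
    n_threads=8,                                  # generation threads
    n_threads_batch=multiprocessing.cpu_count(),  # explicit equivalent of the new batch default
)
```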
a420f9608b
feat: Update llama.cpp
Andrei Betlen
2024-04-14 19:14:09 -0400
90dceaba8a
feat: Update llama.cpp
Andrei Betlen
2024-04-14 11:35:57 -0400
2e9ffd28fd
feat: Update llama.cpp
Andrei Betlen
2024-04-12 21:09:12 -0400
ef29235d45
chore: Bump version
Andrei Betlen
2024-04-10 03:44:46 -0400
bb65b4d764
fix: pass correct type to chat handlers for chat completion logprobs
Andrei Betlen
2024-04-10 03:41:55 -0400
060bfa64d5
feat: Add support for yaml based configs
Andrei Betlen
2024-04-10 02:47:01 -0400
1347e1d050
feat: Add typechecking for ctypes structure attributes
Andrei Betlen
2024-04-10 02:40:41 -0400
889d0e8981
feat: Update llama.cpp
Andrei Betlen
2024-04-10 02:25:58 -0400
56071c956a
feat: Update llama.cpp
Andrei Betlen
2024-04-09 09:53:49 -0400
0078e0f1cf
Merge https://github.com/abetlen/llama-cpp-python
baalajimaestro
2024-04-06 16:34:43 +0530
08b16afe11
chore: Bump version
Andrei Betlen
2024-04-06 01:53:38 -0400
7ca364c8bd
feat: Update llama.cpp
Andrei Betlen
2024-04-06 01:37:43 -0400
b3bfea6dbf
fix: Always embed metal library. Closes #1332
Andrei Betlen
2024-04-06 01:36:53 -0400
f4092e6b46
feat: Update llama.cpp
Andrei Betlen
2024-04-05 10:59:31 -0400
2760ef6156
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
Andrei Betlen
2024-04-05 10:51:54 -0400
1ae3abbcc3
fix: missing logprobs in response, incorrect response type for functionary, minor type issues. Closes #1328 Closes #1314
Andrei Betlen
2024-04-05 10:50:49 -0400
49bc66bfa2
fix: missing logprobs in response, incorrect response type for functionary, minor type issues. Closes #1328 #1314
Andrei Betlen
2024-04-05 10:50:49 -0400
9111b6e03a
feat: Update llama.cpp
Andrei Betlen
2024-04-05 09:21:02 -0400
7265a5dc0e
fix(docs): incorrect tool_choice example (#1330)
Sigbjørn Skjæret
2024-04-05 15:14:03 +0200
8b9cd38c0d
Merge https://github.com/abetlen/llama-cpp-python
baalajimaestro
2024-04-05 10:38:53 +0530
909ef66951
docs: Rename cuBLAS section to CUDA
Andrei Betlen
2024-04-04 03:08:47 -0400
1db3b58fdc
docs: Add docs explaining how to install pre-built wheels.
Andrei Betlen
2024-04-04 02:57:06 -0400
c50309e52a
docs: LLAMA_CUBLAS -> LLAMA_CUDA
Andrei Betlen
2024-04-04 02:49:19 -0400
612e78d322
fix(ci): use correct script name
Andrei Betlen
2024-04-03 16:15:29 -0400
34081ddc5b
chore: Bump version
Andrei Betlen
2024-04-03 15:38:27 -0400
368061c04a
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
Andrei Betlen
2024-04-03 15:35:30 -0400
5a5193636b
feat: Update llama.cpp
Andrei Betlen
2024-04-03 15:35:28 -0400
5a930ee9a1
feat: Binary wheels for CPU, CUDA (12.1 - 12.3), Metal (#1247)
Andrei
2024-04-03 15:32:13 -0400
8649d7671b
fix: segfault when logits_all=False. Closes #1319
Andrei Betlen
2024-04-03 15:30:31 -0400
f96de6d920
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
Andrei Betlen
2024-04-03 00:55:21 -0400
e465157804
feat: Update llama.cpp
Andrei Betlen
2024-04-03 00:55:19 -0400
62aad610e1
fix: last tokens passing to sample_repetition_penalties function (#1295)
Yuri Mikhailov
2024-04-02 04:25:43 +0900
45bf5ae582
chore: Bump version
Andrei Betlen
2024-04-01 10:28:22 -0400
a0f373e310
fix: Changed local API doc references to hosted (#1317)
lawfordp2017
2024-04-01 08:21:00 -0600
f165048a69
feat: add support for KV cache quantization options (#1307)
Limour
2024-04-01 22:19:28 +0800
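A hedged sketch of the KV-cache quantization options above, assuming type_k/type_v parameters that take ggml type constants (model path hypothetical); a quantized V cache typically also requires flash attention in llama.cpp.

```python
import llama_cpp
from llama_cpp import Llama

llm = Llama(
    model_path="./models/example.gguf",   # hypothetical path
    type_k=llama_cpp.GGML_TYPE_Q8_0,      # assumed parameter/constant names: quantize K cache
    type_v=llama_cpp.GGML_TYPE_Q8_0,      # quantize V cache
    flash_attn=True,
)
```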