Andrei Betlen
7f52335c50
feat: Update llama.cpp
2024-04-25 21:21:29 -04:00
Andrei Betlen
266abfc1a3
fix(ci): Fix metal tests as well
2024-04-25 03:09:46 -04:00
Andrei Betlen
de37420fcf
fix(ci): Fix python macos test runners issue
2024-04-25 03:08:32 -04:00
Andrei Betlen
2a9979fce1
feat: Update llama.cpp
2024-04-25 02:48:26 -04:00
Andrei Betlen
c50d3300d2
chore: Bump version
2024-04-23 02:53:20 -04:00
Andrei Betlen
611781f531
ci: Build arm64 wheels. Closes #1342
2024-04-23 02:48:09 -04:00
Sean Bailey
53ebcc8bb5
feat(server): Provide ability to dynamically allocate all threads if desired using -1 ( #1364 )
2024-04-23 02:35:38 -04:00
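A minimal sketch of what the `-1` convention in the commit above could look like. The helper name `resolve_thread_count` is hypothetical and is not taken from the llama-cpp-python source; it only illustrates mapping a sentinel value to the number of logical CPUs.

```python
import os


def resolve_thread_count(requested: int) -> int:
    """Hypothetical helper: treat -1 as "use every logical CPU"."""
    if requested == -1:
        # os.cpu_count() may return None on unusual platforms; fall back to 1.
        return os.cpu_count() or 1
    if requested < 1:
        raise ValueError(f"thread count must be -1 or >= 1, got {requested}")
    return requested
```

For example, `resolve_thread_count(4)` returns 4 unchanged, while `resolve_thread_count(-1)` returns whatever the host reports.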
Geza Velkey
507c1da066
fix: Update scikit-build-core build dependency avoid bug in 0.9.1 ( #1370 )
2024-04-23 02:34:15 -04:00
abk16
8559e8ce88
feat: Add Llama-3 chat format ( #1371 )
...
* feat: Add Llama-3 chat format
* feat: Auto-detect Llama-3 chat format from gguf template
* feat: Update llama.cpp to b2715
Includes proper Llama-3 <|eot_id|> token handling.
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-04-23 02:33:29 -04:00
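For orientation, the Llama-3 instruct prompt layout that the commit above targets can be sketched as follows. This is an illustrative formatter, not the code added in #1371; the function name `format_llama3` is an assumption, and the special tokens follow the published Llama-3 instruct template.

```python
from typing import Dict, List


def format_llama3(messages: List[Dict[str, str]]) -> str:
    """Illustrative only: render chat messages in the Llama-3 instruct layout."""
    prompt = "<|begin_of_text|>"
    for message in messages:
        # Each turn is a role header, a blank line, the content, then <|eot_id|>.
        prompt += (
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    # Leave an open assistant header so the model generates the next turn.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt
```

Generation is expected to end when the model emits `<|eot_id|>`, which is why the same commit mentions proper handling of that token.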
Andrei Betlen
617d536e1c
feat: Update llama.cpp
2024-04-23 02:31:40 -04:00
Andrei Betlen
d40a250ef3
feat: Use new llama_token_is_eog in create_completions
2024-04-22 00:35:47 -04:00
Andrei Betlen
b21ba0e2ac
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-04-21 20:46:42 -04:00
Andrei Betlen
159cc4e5d9
feat: Update llama.cpp
2024-04-21 20:46:40 -04:00
Andrei Betlen
0281214863
chore: Bump version
2024-04-20 00:09:37 -04:00
Andrei Betlen
cc81afebf0
feat: Add stopping_criteria to ChatFormatter, allow stopping on arbitrary token ids, fixes llama3 instruct
2024-04-20 00:00:53 -04:00
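Stopping on arbitrary token ids, as described in the commit above, can be sketched as a callable that inspects the most recently generated token. The factory name `make_token_id_stopper` is hypothetical, and the `(input_ids, logits) -> bool` signature is an assumption modeled on a typical stopping-criteria callback rather than copied from the library.

```python
from typing import Callable, Iterable, Sequence


def make_token_id_stopper(
    stop_ids: Iterable[int],
) -> Callable[[Sequence[int], Sequence[float]], bool]:
    """Hypothetical factory: stop once the last generated token is a stop id."""
    stop_set = frozenset(stop_ids)

    def should_stop(input_ids: Sequence[int], logits: Sequence[float]) -> bool:
        # logits are unused here; the decision depends only on the last token.
        return len(input_ids) > 0 and input_ids[-1] in stop_set

    return should_stop
```

With Llama-3 instruct models, the stop set would typically contain the id of `<|eot_id|>` (128009 in the published tokenizer), so generation halts at the end of the assistant turn.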
Andrei Betlen
d17c1887a3
feat: Update llama.cpp
2024-04-19 23:58:16 -04:00
Andrei Betlen
893a27a736
chore: Bump version
2024-04-18 01:43:39 -04:00
Andrei Betlen
a128c80500
feat: Update llama.cpp
2024-04-18 01:39:45 -04:00
Lucca Zenóbio
4f42664955
feat: update grammar schema converter to match llama.cpp ( #1353 )
...
* feat: improve function calling
* feat:grammar
* fix
* fix
* fix
2024-04-18 01:36:25 -04:00
Andrei Betlen
fa4bb0cf81
Revert "feat: Update json to grammar ( #1350 )"
...
This reverts commit 610a592f70.
2024-04-17 16:18:16 -04:00
Lucca Zenóbio
610a592f70
feat: Update json to grammar ( #1350 )
...
* feat: improve function calling
* feat:grammar
2024-04-17 10:10:21 -04:00
khimaros
b73c73c0c6
feat: add disable_ping_events flag ( #1257 )
...
For backward compatibility, this is false by default.
It can be set to true to disable EventSource pings,
which are not supported by some OpenAI clients.
fixes https://github.com/abetlen/llama-cpp-python/issues/1256
2024-04-17 10:08:19 -04:00
tc-wolf
4924455dec
feat: Make saved state more compact on-disk ( #1296 )
...
* State load/save changes
- Only store up to `n_tokens` logits instead of the full `(n_ctx, n_vocab)` sized array.
- Difference between ~350MB and ~1500MB for an example prompt with ~300 tokens (makes sense lol)
- Auto-formatting changes
* Back out formatting changes
2024-04-17 10:06:50 -04:00
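The saving described in the commit above comes from the shape of the stored logits array. A rough sketch of the arithmetic, assuming 4-byte floats and the illustrative values `n_ctx = 8192` and `n_vocab = 32000` (neither is stated in the commit; only the ~300-token prompt is):

```python
def logits_bytes(rows: int, n_vocab: int, itemsize: int = 4) -> int:
    """Bytes needed for a (rows, n_vocab) array of itemsize-byte floats."""
    return rows * n_vocab * itemsize


# Assumed values for illustration; only n_tokens (~300) comes from the commit.
N_CTX, N_VOCAB, N_TOKENS = 8192, 32000, 300

full = logits_bytes(N_CTX, N_VOCAB)        # whole (n_ctx, n_vocab) array
compact = logits_bytes(N_TOKENS, N_VOCAB)  # only the rows actually evaluated

print(f"full: {full / 1e6:.1f} MB, compact: {compact / 1e6:.1f} MB")
```

Under these assumptions the logits alone shrink from about 1 GB to under 40 MB. The saved state also contains other data, so these numbers are not expected to match the ~1500MB and ~350MB totals quoted in the commit.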
Andrei Betlen
9842cbf99d
feat: Update llama.cpp
2024-04-17 10:06:15 -04:00
ddh0
c96b2daebf
feat: Use all available CPUs for batch processing ( #1345 )
2024-04-17 10:05:54 -04:00
Andrei Betlen
a420f9608b
feat: Update llama.cpp
2024-04-14 19:14:09 -04:00
Andrei Betlen
90dceaba8a
feat: Update llama.cpp
2024-04-14 11:35:57 -04:00
Andrei Betlen
2e9ffd28fd
feat: Update llama.cpp
2024-04-12 21:09:12 -04:00
Andrei Betlen
ef29235d45
chore: Bump version
2024-04-10 03:44:46 -04:00
Andrei Betlen
bb65b4d764
fix: pass correct type to chat handlers for chat completion logprobs
2024-04-10 03:41:55 -04:00
Andrei Betlen
060bfa64d5
feat: Add support for yaml based configs
2024-04-10 02:47:01 -04:00
Andrei Betlen
1347e1d050
feat: Add typechecking for ctypes structure attributes
2024-04-10 02:40:41 -04:00
Andrei Betlen
889d0e8981
feat: Update llama.cpp
2024-04-10 02:25:58 -04:00
Andrei Betlen
56071c956a
feat: Update llama.cpp
2024-04-09 09:53:49 -04:00
Andrei Betlen
08b16afe11
chore: Bump version
2024-04-06 01:53:38 -04:00
Andrei Betlen
7ca364c8bd
feat: Update llama.cpp
2024-04-06 01:37:43 -04:00
Andrei Betlen
b3bfea6dbf
fix: Always embed metal library. Closes #1332
2024-04-06 01:36:53 -04:00
Andrei Betlen
f4092e6b46
feat: Update llama.cpp
2024-04-05 10:59:31 -04:00
Andrei Betlen
2760ef6156
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-04-05 10:51:54 -04:00
Andrei Betlen
1ae3abbcc3
fix: missing logprobs in response, incorrect response type for functionary, minor type issues. Closes #1328 Closes #1314
2024-04-05 10:51:44 -04:00
Andrei Betlen
49bc66bfa2
fix: missing logprobs in response, incorrect response type for functionary, minor type issues. Closes #1328 #1314
2024-04-05 10:50:49 -04:00
Andrei Betlen
9111b6e03a
feat: Update llama.cpp
2024-04-05 09:21:02 -04:00
Sigbjørn Skjæret
7265a5dc0e
fix(docs): incorrect tool_choice example ( #1330 )
2024-04-05 09:14:03 -04:00
Andrei Betlen
909ef66951
docs: Rename cuBLAS section to CUDA
2024-04-04 03:08:47 -04:00
Andrei Betlen
1db3b58fdc
docs: Add docs explaining how to install pre-built wheels.
2024-04-04 02:57:06 -04:00
Andrei Betlen
c50309e52a
docs: LLAMA_CUBLAS -> LLAMA_CUDA
2024-04-04 02:49:19 -04:00
Andrei Betlen
612e78d322
fix(ci): use correct script name
2024-04-03 16:15:29 -04:00
Andrei Betlen
34081ddc5b
chore: Bump version
2024-04-03 15:38:27 -04:00
Andrei Betlen
368061c04a
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-04-03 15:35:30 -04:00
Andrei Betlen
5a5193636b
feat: Update llama.cpp
2024-04-03 15:35:28 -04:00