d296cf3e06
Merge https://github.com/abetlen/llama-cpp-python
2024-01-05 09:39:18 +05:30
Andrei Betlen
75d0527fd7
Bump version
2024-01-04 18:30:12 -05:00
Andrei Betlen
fffcd0181c
Update llama.cpp
2024-01-04 18:26:00 -05:00
Fedor Moiseev
907b9e9d42
Add Saiga chat format. ( #1050 )
2024-01-04 18:12:58 -05:00
Caleb Hoff
f766b70c9a
Fix: Correct typo in README.md ( #1058 )
...
In Llama.create_chat_completion, the `tool_choice` property does not have an s on the end.
2024-01-04 18:12:32 -05:00
xaviviro
cf743ec5d3
Added ChatGLM chat format ( #1059 )
...
Co-authored-by: Xavier Vinaixa Rosello <xaviviro@MacBook-Pro-de-Xavier.local>
2024-01-04 18:12:02 -05:00
Andrei Betlen
eb9c7d4ed8
Update llama.cpp
2024-01-03 22:04:04 -05:00
46987d2d54
Merge https://github.com/abetlen/llama-cpp-python
2023-12-29 17:45:38 +05:30
Andrei Betlen
011c3630f5
Bump version
2023-12-27 17:35:02 -05:00
Andrei Betlen
969ea6a2c0
Update llama.cpp
2023-12-27 17:33:26 -05:00
dca1ed6fb6
Use proper image for intel python
...
Signed-off-by: baalajimaestro <me@baalajimaestro.me>
2023-12-25 11:41:29 +05:30
Andrei Betlen
f952d45c2c
Update llama.cpp
2023-12-24 01:34:36 -05:00
Andrei Betlen
f6f157c06d
Update bug report instructions for new build process.
2023-12-22 15:35:51 -05:00
Andrei Betlen
92284f32cb
Add HIP_PATH to dll search directories for windows users.
2023-12-22 15:29:56 -05:00
Andrei Betlen
2b0d3f36fa
set llama_max_devices using library function
2023-12-22 15:19:28 -05:00
Andrei Betlen
d9a1d90fd7
Fix typo
2023-12-22 15:12:27 -05:00
Andrei Betlen
37556bf9c4
Bump version
2023-12-22 14:55:58 -05:00
Andrei Betlen
6d8bc090f9
fix: inccorect bindings for kv override. Based on #1011
2023-12-22 14:52:20 -05:00
Andrei Betlen
f4be84c122
Fix typo
2023-12-22 14:40:44 -05:00
Andrei Betlen
9b3a5939f3
docs: Add multi-model link to readme
2023-12-22 14:40:13 -05:00
Andrei Betlen
522aecb868
docs: add server config docs
2023-12-22 14:37:24 -05:00
Andrei Betlen
6473796343
Update llama.cpp
2023-12-22 14:10:34 -05:00
Andrei Betlen
15ee2106f6
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
2023-12-22 14:05:26 -05:00
swg
4b01a873ef
server: Support none defaulting to infinity for completions ( #111 )
...
* Support defaulting to infinity or -1 for chat completions
* Check if completion_tokens is none in error handler.
* fix: max_tokens in create completion should match openai spec
* Fix __call__
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2023-12-22 14:05:13 -05:00
Andrei Betlen
99ff175562
Check if completion_tokens is none in error handler.
2023-12-22 13:41:06 -05:00
Dave
12b7f2f4e9
[Feat] Multi model support ( #931 )
...
* Update Llama class to handle chat_format & caching
* Add settings.py
* Add util.py & update __main__.py
* multimodel
* update settings.py
* cleanup
* delete util.py
* Fix /v1/models endpoint
* MultiLlama now iterable, app check-alive on "/"
* instant model init if file is given
* backward compability
* revert model param mandatory
* fix error
* handle individual model config json
* refactor
* revert chathandler/clip_model changes
* handle chat_handler in MulitLlama()
* split settings into server/llama
* reduce global vars
* Update LlamaProxy to handle config files
* Add free method to LlamaProxy
* update arg parsers & install server alias
* refactor cache settings
* change server executable name
* better var name
* whitespace
* Revert "whitespace"
This reverts commit bc5cf51c64a95bfc9926e1bc58166059711a1cd8.
* remove exe_name
* Fix merge bugs
* Fix type annotations
* Fix type annotations
* Fix uvicorn app factory
* Fix settings
* Refactor server
* Remove formatting fix
* Format
* Use default model if not found in model settings
* Fix
* Cleanup
* Fix
* Fix
* Remove unnused CommandLineSettings
* Cleanup
* Support default name for copilot-codex models
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2023-12-22 05:51:25 -05:00
Andrei Betlen
4a85442c35
Update llama.cpp
2023-12-22 00:12:37 -05:00
twaka
2f03fb0231
fix text_offset of multi-token characters ( #1037 )
...
* fix text_offsets for bytes tokens
* fix
2023-12-22 00:03:29 -05:00
docmeth02
33cc623346
Implement openai api compatible authentication ( #1010 )
2023-12-21 13:44:49 -05:00
Andrei Betlen
788394c096
Update llama.cpp
2023-12-21 13:16:46 -05:00
Andrei Betlen
ffceb772d1
Update llama.cpp
2023-12-19 17:05:40 -05:00
Andrei Betlen
a05b4da80a
fix: float32 is not JSON serializable when streaming logits.
2023-12-18 18:40:36 -05:00
Andrei Betlen
abda047284
Update changelog
2023-12-18 18:16:17 -05:00
Andrei Betlen
7df6c32544
Fix type annotations
2023-12-18 18:14:53 -05:00
Andrei Betlen
b703aad79e
Fix type annotation
2023-12-18 18:13:37 -05:00
Andrei Betlen
d0aedfcff6
Fix type annotation
2023-12-18 18:12:49 -05:00
Eduard Christian Dumitrescu
2993936b10
Fix ctypes definitions of llama_kv_cache_view_update
and llama_kv_cache_view_free
. ( #1028 )
2023-12-18 18:11:26 -05:00
Andrei Betlen
5e863d8a3b
Bump version
2023-12-18 16:09:18 -05:00
Jonathan Soma
cfd698c75c
Update low_level_api_llama_cpp.py to match current API ( #1023 )
2023-12-18 15:59:11 -05:00
Andrei Betlen
095c650006
Add offload_kqv option to llama and server
2023-12-18 15:36:09 -05:00
Andrei Betlen
472b344ae3
Remove unnused import
2023-12-18 15:32:40 -05:00
Andrei Betlen
2fc48c54be
Update llama.cpp
2023-12-18 15:32:15 -05:00
kddubey
6b2e0e05b4
perf: Don't convert logprobs arrays to lists ( #1021 )
2023-12-18 14:28:12 -05:00
Brandon Roberts
62944df142
Bugfix: Remove f16_kv, add offload_kqv field ( #1019 )
...
F16_KV appears to have been removed here: af99c6fbfc
This addresses two issues:
- #995 which just requests to add the KV cache offloading param
- #1006 a NULL ptr exception when using the embeddings (introduced by
leaving f16_kv in the fields struct)
2023-12-18 14:27:11 -05:00
evelynmitchell
37da8e863a
Update README.md functionary demo typo ( #996 )
...
missing comma
2023-12-16 19:00:30 -05:00
Daniele Morotti
f1c631dc53
Bug fixed with n_ctx=0 ( #1015 )
...
If the n_ctx is set to 0 the code should use the maximum context length of the selected model, but it didn't work. There was a problem with the initialization of this parameter and a related problem with 'n_batch'.
2023-12-16 18:59:50 -05:00
kddubey
5a8944672f
Fix logits_to_logprobs for 2-D and 3-D logits ( #1002 )
...
* Fix logits_to_logprobs for 2-D and 3-D logits
* Set dtype to single
* Test size
2023-12-16 18:59:26 -05:00
Andrei Betlen
534b1ea9b5
Update llama.cpp
2023-12-16 18:57:43 -05:00
Andrei Betlen
cbce061ffd
Bump version
2023-12-13 21:52:29 -05:00
yhfgyyf
8b4db732bd
Add qwen chat format ( #1005 )
2023-12-13 21:43:43 -05:00