Commit graph

1333 commits

Author SHA1 Message Date
Andrei Betlen
f952d45c2c Update llama.cpp 2023-12-24 01:34:36 -05:00
Andrei Betlen
f6f157c06d Update bug report instructions for new build process. 2023-12-22 15:35:51 -05:00
Andrei Betlen
92284f32cb Add HIP_PATH to dll search directories for windows users. 2023-12-22 15:29:56 -05:00
Andrei Betlen
2b0d3f36fa set llama_max_devices using library function 2023-12-22 15:19:28 -05:00
Andrei Betlen
d9a1d90fd7 Fix typo 2023-12-22 15:12:27 -05:00
Andrei Betlen
37556bf9c4 Bump version 2023-12-22 14:55:58 -05:00
Andrei Betlen
6d8bc090f9 fix: inccorect bindings for kv override. Based on #1011 2023-12-22 14:52:20 -05:00
Andrei Betlen
f4be84c122 Fix typo 2023-12-22 14:40:44 -05:00
Andrei Betlen
9b3a5939f3 docs: Add multi-model link to readme 2023-12-22 14:40:13 -05:00
Andrei Betlen
522aecb868 docs: add server config docs 2023-12-22 14:37:24 -05:00
Andrei Betlen
6473796343 Update llama.cpp 2023-12-22 14:10:34 -05:00
Andrei Betlen
15ee2106f6 Merge branch 'main' of github.com:abetlen/llama_cpp_python into main 2023-12-22 14:05:26 -05:00
swg
4b01a873ef
server: Support none defaulting to infinity for completions (#111)
* Support defaulting to infinity or -1 for chat completions

* Check if completion_tokens is none in error handler.

* fix: max_tokens in create completion should match openai spec

* Fix __call__

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2023-12-22 14:05:13 -05:00
Andrei Betlen
99ff175562 Check if completion_tokens is none in error handler. 2023-12-22 13:41:06 -05:00
Dave
12b7f2f4e9
[Feat] Multi model support (#931)
* Update Llama class to handle chat_format & caching

* Add settings.py

* Add util.py & update __main__.py

* multimodel

* update settings.py

* cleanup

* delete util.py

* Fix /v1/models endpoint

* MultiLlama now iterable, app check-alive on "/"

* instant model init if file is given

* backward compability

* revert model param mandatory

* fix error

* handle individual model config json

* refactor

* revert chathandler/clip_model changes

* handle chat_handler in MulitLlama()

* split settings into server/llama

* reduce global vars

* Update LlamaProxy to handle config files

* Add free method to LlamaProxy

* update arg parsers & install server alias

* refactor cache settings

* change server executable name

* better var name

* whitespace

* Revert "whitespace"

This reverts commit bc5cf51c64a95bfc9926e1bc58166059711a1cd8.

* remove exe_name

* Fix merge bugs

* Fix type annotations

* Fix type annotations

* Fix uvicorn app factory

* Fix settings

* Refactor server

* Remove formatting fix

* Format

* Use default model if not found in model settings

* Fix

* Cleanup

* Fix

* Fix

* Remove unnused CommandLineSettings

* Cleanup

* Support default name for copilot-codex models

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2023-12-22 05:51:25 -05:00
Andrei Betlen
4a85442c35 Update llama.cpp 2023-12-22 00:12:37 -05:00
twaka
2f03fb0231
fix text_offset of multi-token characters (#1037)
* fix text_offsets for bytes tokens

* fix
2023-12-22 00:03:29 -05:00
docmeth02
33cc623346
Implement openai api compatible authentication (#1010) 2023-12-21 13:44:49 -05:00
Andrei Betlen
788394c096 Update llama.cpp 2023-12-21 13:16:46 -05:00
Andrei Betlen
ffceb772d1 Update llama.cpp 2023-12-19 17:05:40 -05:00
Andrei Betlen
a05b4da80a fix: float32 is not JSON serializable when streaming logits. 2023-12-18 18:40:36 -05:00
Andrei Betlen
abda047284 Update changelog 2023-12-18 18:16:17 -05:00
Andrei Betlen
7df6c32544 Fix type annotations 2023-12-18 18:14:53 -05:00
Andrei Betlen
b703aad79e Fix type annotation 2023-12-18 18:13:37 -05:00
Andrei Betlen
d0aedfcff6 Fix type annotation 2023-12-18 18:12:49 -05:00
Eduard Christian Dumitrescu
2993936b10
Fix ctypes definitions of llama_kv_cache_view_update and llama_kv_cache_view_free. (#1028) 2023-12-18 18:11:26 -05:00
Andrei Betlen
5e863d8a3b Bump version 2023-12-18 16:09:18 -05:00
Jonathan Soma
cfd698c75c
Update low_level_api_llama_cpp.py to match current API (#1023) 2023-12-18 15:59:11 -05:00
Andrei Betlen
095c650006 Add offload_kqv option to llama and server 2023-12-18 15:36:09 -05:00
Andrei Betlen
472b344ae3 Remove unnused import 2023-12-18 15:32:40 -05:00
Andrei Betlen
2fc48c54be Update llama.cpp 2023-12-18 15:32:15 -05:00
kddubey
6b2e0e05b4
perf: Don't convert logprobs arrays to lists (#1021) 2023-12-18 14:28:12 -05:00
Brandon Roberts
62944df142
Bugfix: Remove f16_kv, add offload_kqv field (#1019)
F16_KV appears to have been removed here: af99c6fbfc

This addresses two issues:

 - #995 which just requests to add the KV cache offloading param
 - #1006 a NULL ptr exception when using the embeddings (introduced by
   leaving f16_kv in the fields struct)
2023-12-18 14:27:11 -05:00
evelynmitchell
37da8e863a
Update README.md functionary demo typo (#996)
missing comma
2023-12-16 19:00:30 -05:00
Daniele Morotti
f1c631dc53
Bug fixed with n_ctx=0 (#1015)
If the n_ctx is set to 0 the code should use the maximum context length of the selected model, but it didn't work. There was a problem with the initialization of this parameter and a related problem with 'n_batch'.
2023-12-16 18:59:50 -05:00
kddubey
5a8944672f
Fix logits_to_logprobs for 2-D and 3-D logits (#1002)
* Fix logits_to_logprobs for 2-D and 3-D logits

* Set dtype to single

* Test size
2023-12-16 18:59:26 -05:00
Andrei Betlen
534b1ea9b5 Update llama.cpp 2023-12-16 18:57:43 -05:00
Andrei Betlen
cbce061ffd Bump version 2023-12-13 21:52:29 -05:00
yhfgyyf
8b4db732bd
Add qwen chat format (#1005) 2023-12-13 21:43:43 -05:00
Andrei Betlen
690c563b60 Merge branch 'main' of github.com:abetlen/llama_cpp_python into main 2023-12-13 21:43:19 -05:00
Andrei Betlen
c0fc0a1e82 Update llama.cpp 2023-12-13 21:43:16 -05:00
Radoslav Gerganov
8e44a32075
Add support for running the server with SSL (#994) 2023-12-11 20:47:11 -05:00
Tanner Hobson
ef22e478db
Replace logits_to_logprobs implementation with numpy equivalent to llama.cpp (#991)
See #990. This change makes the logits_to_logprobs function equivalent to the version in the llama.cpp repository. It uses numpy so it's much faster than the previous version.
2023-12-11 20:46:27 -05:00
zocainViken
ac35f68e4d
Fix UnsupportedOperation: fileno in suppress_stdout_stderr (#961)
* bug fixing

* llava from readme got this error: UnsupportedOperation: fileno   quick fix by checking hasattr

* multi modal params fix: add logits = True -> to make llava work

* multi modal params fix: add logits = True -> to make llava work

---------

Co-authored-by: Andrei <abetlen@gmail.com>
2023-12-11 20:44:51 -05:00
chiensen
b938cccf05
Add Pygmalion chat format (#986) 2023-12-11 20:44:04 -05:00
zocainViken
6bbeea07ae
README.md multimodal params fix (#967)
multi modal params fix: add logits = True -> to make llava work
2023-12-11 20:41:38 -05:00
Aniket Maurya
c1d92ce680
fix minor typo (#958)
* fix minor typo

* Fix typo

---------

Co-authored-by: Andrei <abetlen@gmail.com>
2023-12-11 20:40:38 -05:00
Andrei Betlen
e9bc4c4baf Fix docker build 2023-12-11 10:39:51 -05:00
Andrei Betlen
c1e73e73a3 Bump version 2023-12-11 10:26:42 -05:00
Andrei Betlen
ec26f364cc Remove f16_kv 2023-12-11 10:25:37 -05:00