Jerry Liu
84380fe9a6
Add LlamaIndex integration to README ( #1092 )
2024-01-16 19:10:50 -05:00
Kyle Mistele
9c36688b33
fix(cli): allow passing n_ctx=0 to OpenAI API server args to use the model's n_ctx_train field per #1015 ( #1093 )
2024-01-16 18:54:06 -05:00
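A minimal sketch of the new behavior, assuming the server's `--n_ctx` flag and that the Python constructor follows the same zero-means-from-model convention (the model path is a placeholder):

```python
# Server (illustrative): python -m llama_cpp.server --model ./models/model.gguf --n_ctx 0
from llama_cpp import Llama

llm = Llama(model_path="./models/model.gguf", n_ctx=0)
print(llm.n_ctx())  # context size taken from the model's n_ctx_train metadata
```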
anil
cfb7da98ed
Support Accept text/event-stream in chat and completion endpoints, resolves #1083 ( #1088 )
Co-authored-by: Anil Pathak <anil@heyday.com>
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-01-16 12:52:52 -05:00
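A client-side sketch of the streaming behavior this adds; host, port, and payload are assumptions:

```python
import requests

# Request a streamed chat completion as server-sent events.
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    headers={"Accept": "text/event-stream"},
    json={"messages": [{"role": "user", "content": "Hello!"}], "stream": True},
    stream=True,
)
for line in resp.iter_lines():
    if line:
        print(line.decode("utf-8"))  # each event arrives as a "data: {...}" line
```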
Andrei Betlen
e39778f8eb
Update llama.cpp
2024-01-16 11:56:44 -05:00
Andrei Betlen
4b11fa83c0
Bump version
2024-01-15 12:54:51 -05:00
Andrei Betlen
84615adbc6
Add split_mode option. Closes #1085
2024-01-15 12:49:20 -05:00
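A sketch of the new option, assuming the constant names mirror llama.cpp's `LLAMA_SPLIT_*` enum of the time:

```python
import llama_cpp

# Split model layers across available GPUs; LLAMA_SPLIT_ROW splits by rows,
# LLAMA_SPLIT_NONE keeps everything on a single GPU.
llm = llama_cpp.Llama(
    model_path="./models/model.gguf",  # placeholder
    n_gpu_layers=-1,
    split_mode=llama_cpp.LLAMA_SPLIT_LAYER,
)
```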
Phil H
76aafa6149
Implement GGUF metadata KV overrides ( #1011 )
* Implement GGUF metadata overrides
* whitespace fix
* Fix kv overrides.
* Fix pointer and pickle
* Match llama.cpp kv_overrides cli argument
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2024-01-15 12:29:29 -05:00
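A sketch of the override mechanism; the key names below are illustrative, not taken from this PR:

```python
from llama_cpp import Llama

# kv_overrides replaces GGUF metadata values at load time
# (str keys mapping to int / float / bool values).
llm = Llama(
    model_path="./models/model.gguf",  # placeholder
    kv_overrides={
        "llama.context_length": 4096,          # int override (illustrative key)
        "tokenizer.ggml.add_bos_token": True,  # bool override (illustrative key)
    },
)
```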
yieldthought
7eff42c239
Avoid "LookupError: unknown encoding: ascii" when open() called in a destructor ( #1012 )
The existing code often causes "LookupError: unknown encoding: ascii" when open() is called in a destructor. Saving open in self.open is not enough to avoid this. Instead, we can avoid reopening /dev/null every time by opening it once when the module is loaded.
2024-01-15 10:52:10 -05:00
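The pattern the fix describes, as a minimal sketch:

```python
import os

# Open /dev/null once at module load. Calling open() later from a __del__
# running during interpreter shutdown can raise
# "LookupError: unknown encoding: ascii" because the codecs machinery may
# already be torn down; these handles stay valid instead.
outnull_file = open(os.devnull, "w")
errnull_file = open(os.devnull, "w")
```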
anil
1eaace8ea3
Fix low_level_api_chat_cpp example to match current API ( #1086 )
* Fix low_level_api_chat_cpp to match current API
* Fix low_level_api_chat_cpp to match current API
* Use None instead of an empty string so that the default prompt template can be used if no prompt is provided
---------
Co-authored-by: Anil Pathak <anil@heyday.com>
2024-01-15 10:46:35 -05:00
Mark Neumann
c689ccc728
Fix Pydantic model parsing ( #1087 )
2024-01-15 10:45:57 -05:00
Andrei Betlen
5502ac8876
Update llama.cpp
2024-01-15 10:12:10 -05:00
Andrei Betlen
359ae73643
Update llama.cpp
2024-01-14 08:17:22 -05:00
Andrei Betlen
7c898d5684
Update llama.cpp
2024-01-13 22:37:49 -05:00
Andrei Betlen
bb610b9428
Update llama.cpp
2024-01-11 22:51:12 -05:00
Andrei Betlen
f0159663d9
Bump version
2024-01-10 02:51:17 -05:00
Stephen Hankinson
df3be58d6c
Add ability to pass in penalize_nl param ( #1068 )
2024-01-10 02:46:27 -05:00
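A sketch of passing the new parameter through, assuming it is exposed on create_completion (model path is a placeholder):

```python
from llama_cpp import Llama

llm = Llama(model_path="./models/model.gguf")
out = llm.create_completion(
    "List three fruits:",
    max_tokens=32,
    penalize_nl=False,  # don't apply the repeat penalty to newline tokens
)
print(out["choices"][0]["text"])
```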
Joseph Turian
2ddce7294e
print_grammar to stderr ( #1052 )
2024-01-10 02:46:03 -05:00
Andrei Betlen
431cb3ec81
Update llama.cpp
2024-01-09 15:32:39 -05:00
Andrei Betlen
1ae05c102b
Update llama.cpp
2024-01-08 14:51:29 -05:00
Andrei Betlen
142a9e1bc3
Update llama.cpp
2024-01-05 16:20:50 -05:00
Andrei Betlen
75d0527fd7
Bump version
2024-01-04 18:30:12 -05:00
Andrei Betlen
fffcd0181c
Update llama.cpp
2024-01-04 18:26:00 -05:00
Fedor Moiseev
907b9e9d42
Add Saiga chat format. ( #1050 )
2024-01-04 18:12:58 -05:00
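Selecting the new format by name, assuming it registers as "saiga" (model path is a placeholder):

```python
from llama_cpp import Llama

llm = Llama(model_path="./models/saiga.gguf", chat_format="saiga")
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Привет! Как дела?"}]
)
```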
Caleb Hoff
f766b70c9a
Fix: Correct typo in README.md ( #1058 )
In Llama.create_chat_completion, the `tool_choice` property does not have an s on the end.
2024-01-04 18:12:32 -05:00
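The corrected call, for reference; the model path and tool definition are illustrative:

```python
from llama_cpp import Llama

llm = Llama(model_path="./models/model.gguf")
llm.create_chat_completion(
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {"type": "object", "properties": {}},
        },
    }],
    tool_choice={"type": "function", "function": {"name": "get_weather"}},  # singular
)
```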
xaviviro
cf743ec5d3
Added ChatGLM chat format ( #1059 )
Co-authored-by: Xavier Vinaixa Rosello <xaviviro@MacBook-Pro-de-Xavier.local>
2024-01-04 18:12:02 -05:00
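Selecting the format, assuming it registers under a "chatglm3"-style name (model path is a placeholder):

```python
from llama_cpp import Llama

llm = Llama(model_path="./models/chatglm3.gguf", chat_format="chatglm3")
```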
Andrei Betlen
eb9c7d4ed8
Update llama.cpp
2024-01-03 22:04:04 -05:00
Andrei Betlen
011c3630f5
Bump version
2023-12-27 17:35:02 -05:00
Andrei Betlen
969ea6a2c0
Update llama.cpp
2023-12-27 17:33:26 -05:00
Andrei Betlen
f952d45c2c
Update llama.cpp
2023-12-24 01:34:36 -05:00
Andrei Betlen
f6f157c06d
Update bug report instructions for new build process.
2023-12-22 15:35:51 -05:00
Andrei Betlen
92284f32cb
Add HIP_PATH to DLL search directories for Windows users.
2023-12-22 15:29:56 -05:00
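A sketch of the lookup logic this adds (the "bin" subdirectory is an assumption):

```python
import os

# On Windows, make the ROCm HIP runtime DLLs discoverable before the
# shared library is loaded.
if os.name == "nt" and "HIP_PATH" in os.environ:
    os.add_dll_directory(os.path.join(os.environ["HIP_PATH"], "bin"))
```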
Andrei Betlen
2b0d3f36fa
set llama_max_devices using library function
2023-12-22 15:19:28 -05:00
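Instead of hardcoding the device count, the binding can now be queried directly (binding name follows llama.cpp's C API):

```python
import llama_cpp

# Maximum number of devices supported by the loaded llama.cpp build.
print(llama_cpp.llama_max_devices())
```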
Andrei Betlen
d9a1d90fd7
Fix typo
2023-12-22 15:12:27 -05:00
Andrei Betlen
37556bf9c4
Bump version
2023-12-22 14:55:58 -05:00
Andrei Betlen
6d8bc090f9
fix: incorrect bindings for kv override. Based on #1011
2023-12-22 14:52:20 -05:00
Andrei Betlen
f4be84c122
Fix typo
2023-12-22 14:40:44 -05:00
Andrei Betlen
9b3a5939f3
docs: Add multi-model link to README
2023-12-22 14:40:13 -05:00
Andrei Betlen
522aecb868
docs: add server config docs
2023-12-22 14:37:24 -05:00
Andrei Betlen
6473796343
Update llama.cpp
2023-12-22 14:10:34 -05:00
Andrei Betlen
15ee2106f6
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
2023-12-22 14:05:26 -05:00
swg
4b01a873ef
server: Support None defaulting to infinity for completions ( #111 )
* Support defaulting to infinity or -1 for chat completions
* Check if completion_tokens is none in error handler.
* fix: max_tokens in create_completion should match OpenAI spec
* Fix __call__
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2023-12-22 14:05:13 -05:00
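A sketch of the resulting behavior, assuming max_tokens=None maps to "no limit" (model path is a placeholder):

```python
from llama_cpp import Llama

llm = Llama(model_path="./models/model.gguf")
# None (or omitting max_tokens over HTTP) now means "generate until the model
# stops or the context fills", matching the OpenAI spec; -1 behaves the same.
out = llm.create_completion("Once upon a time", max_tokens=None)
```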
Andrei Betlen
99ff175562
Check if completion_tokens is none in error handler.
2023-12-22 13:41:06 -05:00
Dave
12b7f2f4e9
[Feat] Multi model support ( #931 )
* Update Llama class to handle chat_format & caching
* Add settings.py
* Add util.py & update __main__.py
* multimodel
* update settings.py
* cleanup
* delete util.py
* Fix /v1/models endpoint
* MultiLlama now iterable, app check-alive on "/"
* instant model init if file is given
* backward compatibility
* revert model param mandatory
* fix error
* handle individual model config json
* refactor
* revert chathandler/clip_model changes
* handle chat_handler in MultiLlama()
* split settings into server/llama
* reduce global vars
* Update LlamaProxy to handle config files
* Add free method to LlamaProxy
* update arg parsers & install server alias
* refactor cache settings
* change server executable name
* better var name
* whitespace
* Revert "whitespace"
This reverts commit bc5cf51c64a95bfc9926e1bc58166059711a1cd8.
* remove exe_name
* Fix merge bugs
* Fix type annotations
* Fix type annotations
* Fix uvicorn app factory
* Fix settings
* Refactor server
* Remove formatting fix
* Format
* Use default model if not found in model settings
* Fix
* Cleanup
* Fix
* Fix
* Remove unused CommandLineSettings
* Cleanup
* Support default name for copilot-codex models
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2023-12-22 05:51:25 -05:00
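A sketch of a multi-model config file; field names mirror the server settings introduced here but should be treated as assumptions:

```python
# config.json (shown as a Python dict for brevity)
config = {
    "host": "0.0.0.0",
    "port": 8000,
    "models": [
        {"model": "./models/mistral-7b.gguf", "model_alias": "mistral"},
        {"model": "./models/llama-2-7b.gguf", "model_alias": "gpt-3.5-turbo"},
    ],
}
# Launch (illustrative): python -m llama_cpp.server --config_file config.json
# Clients then pick a model via the standard "model" field in their requests.
```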
Andrei Betlen
4a85442c35
Update llama.cpp
2023-12-22 00:12:37 -05:00
twaka
2f03fb0231
fix text_offset of multi-token characters ( #1037 )
* fix text_offsets for bytes tokens
* fix
2023-12-22 00:03:29 -05:00
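The gist of the problem, sketched: one character can span several tokens, so offsets must be computed on the accumulated output rather than per-token strings:

```python
# Two byte-level tokens that only form a valid character together.
tokens = [b"\xf0\x9f", b"\x98\x80"]  # UTF-8 bytes of a single emoji
text = b""
offsets = []
for tok in tokens:
    offsets.append(len(text))  # offset into the accumulated output
    text += tok
print(text.decode("utf-8"), offsets)  # 😀 [0, 2]
```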
docmeth02
33cc623346
Implement openai api compatible authentication ( #1010 )
2023-12-21 13:44:49 -05:00
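A sketch, assuming the server gains an api_key setting and checks the standard Bearer header (flag name and key are placeholders):

```python
# Server (illustrative): python -m llama_cpp.server --model ./models/model.gguf --api_key sk-xxxx
import requests

resp = requests.get(
    "http://localhost:8000/v1/models",
    headers={"Authorization": "Bearer sk-xxxx"},  # placeholder key
)
print(resp.status_code)  # a wrong or missing key should be rejected
```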
Andrei Betlen
788394c096
Update llama.cpp
2023-12-21 13:16:46 -05:00
Andrei Betlen
ffceb772d1
Update llama.cpp
2023-12-19 17:05:40 -05:00
Andrei Betlen
a05b4da80a
fix: float32 is not JSON serializable when streaming logits.
2023-12-18 18:40:36 -05:00
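The failure mode in miniature:

```python
import json

import numpy as np

logprob = np.float32(-0.42)
# json.dumps({"logprob": logprob})  # TypeError: float32 is not JSON serializable
print(json.dumps({"logprob": float(logprob)}))  # cast to a built-in float first
```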
Andrei Betlen
abda047284
Update changelog
2023-12-18 18:16:17 -05:00