Andrei Betlen
b047b3034e
Remove confusing helpstring from server cli args. Closes #719
2023-09-15 14:09:43 -04:00
Andrei Betlen
0449d29b9f
Fix boolean env vars and cli arguments
2023-09-13 23:09:57 -04:00
earonesty
58a6e42cc0
Update app.py ( #705 )
2023-09-13 23:01:34 -04:00
Andrei Betlen
f4090a0bb2
Add numa support, low level api users must now explicitly call llama_backend_init at the start of their programs.
2023-09-13 23:00:43 -04:00
Andrei Betlen
c999325e8e
Fix boolean cli flags
2023-09-13 22:56:10 -04:00
Andrei Betlen
4daf77e546
Format
2023-09-13 21:23:23 -04:00
Andrei Betlen
2920c4bf7e
Update server params. Added lora_base, lora_path, low_vram, and main_gpu. Removed rms_norm_eps and n_gqa (deprecated in llama.cpp)
2023-09-13 21:23:13 -04:00
Andrei Betlen
c4c440ba2d
Fix tensor_split cli option
2023-09-13 20:00:42 -04:00
Andrei Betlen
759405c84b
Fix issue with Literal and Optional cli arguments not working. Closes #702
2023-09-13 18:06:12 -04:00
Devrim
da9df78db0
Add X-Request-ID request header for mirroring custom IDs. ( #703 )
2023-09-13 16:18:31 -04:00
Andrei Betlen
1910793f56
Merge branch 'main' into v0.2-wip
2023-09-12 16:43:32 -04:00
Andrei Betlen
5de8009706
Add copilot-codex completions endpoint for drop-in copilot usage
2023-08-25 17:49:14 -04:00
Andrei Betlen
cf405f6764
Merge branch 'main' into v0.2-wip
2023-08-24 00:30:51 -04:00
Andrei Betlen
d015bdb4f8
Add mul_mat_q option
2023-08-08 14:35:06 -04:00
Andrei Betlen
343480364f
Merge branch 'main' into v0.2-wip
2023-07-24 15:26:08 -04:00
Andrei Betlen
11dd2bf382
Add temporary rms_norm_eps parameter
2023-07-24 14:09:24 -04:00
Andrei Betlen
0538ba1dab
Merge branch 'main' into v0.2-wip
2023-07-20 19:06:26 -04:00
Andrei Betlen
28a111704b
Fix compatibility with older python versions
2023-07-20 18:52:10 -04:00
Andrei
365d9a4367
Merge pull request #481 from c0sogi/main
...
Added `RouteErrorHandler` for server
2023-07-20 17:41:42 -04:00
Andrei Betlen
0b121a7456
Format
2023-07-19 03:48:27 -04:00
Andrei Betlen
b43917c144
Add functions parameters
2023-07-19 03:48:20 -04:00
Andrei Betlen
19ba9d3845
Use numpy arrays for logits_processors and stopping_criteria. Closes #491
2023-07-18 19:27:41 -04:00
shutup
5ed8bf132f
expose RoPE param to server start
2023-07-18 16:34:36 +08:00
c0sogi
1551ba10bd
Added RouteErrorHandler
for server
2023-07-16 14:57:39 +09:00
Andrei Betlen
118b7f6d5c
fix: tensor_split should be optional list
2023-07-14 16:52:48 -04:00
Shouyi Wang
579f526246
Resolve merge conflicts
2023-07-14 14:37:01 +10:00
Andrei Betlen
de4cc5a233
bugfix: pydantic v2 fields
2023-07-13 23:25:12 -04:00
Andrei Betlen
6f70cc4b7d
bugfix: pydantic settings missing / changed fields
2023-07-09 18:03:31 -04:00
Shouyi Wang
9f21f548a5
Add tensor split
2023-07-09 23:00:59 +10:00
Andrei Betlen
52753b77f5
Upgrade fastapi to 0.100.0 and pydantic v2
2023-07-07 21:38:46 -04:00
Andrei Betlen
57d8ec3899
Add setting to control request interruption
2023-07-07 03:37:23 -04:00
Andrei Betlen
4c7cdcca00
Add interruptible streaming requests for llama-cpp-python server. Closes #183
2023-07-07 03:04:17 -04:00
Alexey
282698b6d3
server: pass seed param from command line to llama
2023-06-23 00:19:24 +04:00
Andrei Betlen
1e20be6d0c
Add low_vram to server settings
2023-06-14 22:13:42 -04:00
Andrei Betlen
f7c5cfaf50
Format server options
2023-06-14 22:08:28 -04:00
Andrei Betlen
9c41a3e990
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
2023-06-14 21:50:43 -04:00
Andrei
f568baeef1
Merge pull request #351 from player1537-forks/th/add-logits-bias-parameter
...
Add support for `logit_bias` and `logit_bias_type` parameters
2023-06-14 21:49:56 -04:00
Andrei Betlen
f27393ab7e
Add additional verbose logs for cache
2023-06-14 21:46:48 -04:00
Gabor
3129a0e7e5
correction to add back environment variable support <3 docker
2023-06-11 01:11:24 +01:00
Gabor
3ea31930e5
fixes abetlen/llama-cpp-python #358
2023-06-11 00:58:08 +01:00
Tanner Hobson
eb7645b3ba
Add support for logit_bias and logit_bias_type parameters
2023-06-09 13:13:08 -04:00
Andrei Betlen
0c42168508
Fix cache implementation breaking changes
2023-06-08 13:19:23 -04:00
Eric B
9b1c9e902c
Added mirostat support for completions, chat completions API
2023-06-05 22:37:11 -04:00
Andrei Betlen
80066f0b80
Use async routes
2023-05-27 09:12:58 -04:00
Andrei Betlen
c2b59a5f59
Import unnused import
2023-05-26 22:59:29 -04:00
Simon Chabot
e783f1c191
feat: make embedding support list of string as input
...
makes the /v1/embedding route similar to OpenAI api.
2023-05-20 01:23:32 +02:00
Andrei Betlen
a8cd169251
Bugfix: Stop sequences can be strings
2023-05-19 03:15:08 -04:00
Andrei Betlen
dc39cc0fa4
Use server sent events function for streaming completion
2023-05-19 02:04:30 -04:00
Andrei Betlen
a3352923c7
Add model_alias option to override model_path in completions. Closes #39
2023-05-16 17:22:00 -04:00
Andrei Betlen
cdf59768f5
Update llama.cpp
2023-05-14 00:04:22 -04:00
Andrei Betlen
8740ddc58e
Only support generating one prompt at a time.
2023-05-12 07:21:46 -04:00
Andrei Betlen
8895b9002a
Revert "llama_cpp server: prompt is a string". Closes #187
...
This reverts commit b9098b0ef7
.
2023-05-12 07:16:57 -04:00
Lucas Doyle
02e8a018ae
llama_cpp server: document presence_penalty and frequency_penalty, mark as supported
2023-05-09 16:25:00 -07:00
Andrei Betlen
82d138fe54
Fix: default repeat_penalty
2023-05-08 18:49:11 -04:00
Andrei Betlen
29f094bbcf
Bugfix: not falling back to environment variables when default is value is set.
2023-05-08 14:46:25 -04:00
Andrei Betlen
0d6c60097a
Show default value when --help is called
2023-05-08 14:21:15 -04:00
Andrei Betlen
022e9ebcb8
Use environment variable if parsed cli arg is None
2023-05-08 14:20:53 -04:00
Andrei Betlen
0d751a69a7
Set repeat_penalty to 0 by default
2023-05-08 01:50:43 -04:00
Andrei Betlen
65d9cc050c
Add openai frequency and presence penalty parameters. Closes #169
2023-05-08 01:30:18 -04:00
Andrei Betlen
a0b61ea2a7
Bugfix for models endpoint
2023-05-07 20:17:52 -04:00
Andrei Betlen
14da46f16e
Added cache size to settins object.
2023-05-07 19:33:17 -04:00
Andrei Betlen
627811ea83
Add verbose flag to server
2023-05-07 05:09:10 -04:00
Andrei Betlen
3fbda71790
Fix mlock_supported and mmap_supported return type
2023-05-07 03:04:22 -04:00
Andrei Betlen
5a3413eee3
Update cpu_count
2023-05-07 03:03:57 -04:00
Andrei Betlen
1a00e452ea
Update settings fields and defaults
2023-05-07 02:52:20 -04:00
Andrei Betlen
86753976c4
Revert "llama_cpp server: delete some ignored / unused parameters"
...
This reverts commit b47b9549d5
.
2023-05-07 02:02:34 -04:00
Andrei Betlen
c382d8f86a
Revert "llama_cpp server: mark model as required"
...
This reverts commit e40fcb0575
.
2023-05-07 02:00:22 -04:00
Andrei Betlen
d8fddcce73
Merge branch 'main' of github.com:abetlen/llama_cpp_python into better-server-params-and-fields
2023-05-07 01:54:00 -04:00
Andrei Betlen
24fc38754b
Add cli options to server. Closes #37
2023-05-05 12:08:28 -04:00
Lucas Doyle
b9098b0ef7
llama_cpp server: prompt is a string
...
Not sure why this union type was here but taking a look at llama.py, prompt is only ever processed as a string for completion
This was breaking types when generating an openapi client
2023-05-02 14:47:07 -07:00
Andrei
7ab08b8d10
Merge branch 'main' into better-server-params-and-fields
2023-05-01 22:45:57 -04:00
Andrei Betlen
9eafc4c49a
Refactor server to use factory
2023-05-01 22:38:46 -04:00
Lucas Doyle
dbbfc4ba2f
llama_cpp server: fix to ChatCompletionRequestMessage
...
When I generate a client, it breaks because it fails to process the schema of ChatCompletionRequestMessage
These fix that:
- I think `Union[Literal["user"], Literal["channel"], ...]` is the same as Literal["user", "channel", ...]
- Turns out default value `Literal["user"]` isn't JSON serializable, so replace with "user"
2023-05-01 15:38:19 -07:00
Lucas Doyle
fa2a61e065
llama_cpp server: fields for the embedding endpoint
2023-05-01 15:38:19 -07:00
Lucas Doyle
8dcbf65a45
llama_cpp server: define fields for chat completions
...
Slight refactor for common fields shared between completion and chat completion
2023-05-01 15:38:19 -07:00
Lucas Doyle
978b6daf93
llama_cpp server: add some more information to fields for completions
2023-05-01 15:38:19 -07:00
Lucas Doyle
a5aa6c1478
llama_cpp server: add missing top_k param to CreateChatCompletionRequest
...
`llama.create_chat_completion` definitely has a `top_k` argument, but its missing from `CreateChatCompletionRequest`. decision: add it
2023-05-01 15:38:19 -07:00
Lucas Doyle
1e42913599
llama_cpp server: move logprobs to supported
...
I think this is actually supported (its in the arguments of `LLama.__call__`, which is how the completion is invoked). decision: mark as supported
2023-05-01 15:38:19 -07:00
Lucas Doyle
b47b9549d5
llama_cpp server: delete some ignored / unused parameters
...
`n`, `presence_penalty`, `frequency_penalty`, `best_of`, `logit_bias`, `user`: not supported, excluded from the calls into llama. decision: delete it
2023-05-01 15:38:19 -07:00
Lucas Doyle
e40fcb0575
llama_cpp server: mark model as required
...
`model` is ignored, but currently marked "optional"... on the one hand could mark "required" to make it explicit in case the server supports multiple llama's at the same time, but also could delete it since its ignored. decision: mark it required for the sake of openai api compatibility.
I think out of all parameters, `model` is probably the most important one for people to keep using even if its ignored for now.
2023-05-01 15:38:19 -07:00
Andrei Betlen
9ff9cdd7fc
Fix import error
2023-05-01 15:11:15 -04:00
Lucas Doyle
efe8e6f879
llama_cpp server: slight refactor to init_llama function
...
Define an init_llama function that starts llama with supplied settings instead of just doing it in the global context of app.py
This allows the test to be less brittle by not needing to mess with os.environ, then importing the app
2023-04-29 11:42:23 -07:00
Lucas Doyle
6d8db9d017
tests: simple test for server module
2023-04-29 11:42:20 -07:00
Lucas Doyle
468377b0e2
llama_cpp server: app is now importable, still runnable as a module
2023-04-29 11:41:25 -07:00
Andrei Betlen
3cab3ef4cb
Update n_batch for server
2023-04-25 09:11:32 -04:00
Andrei Betlen
e4647c75ec
Add use_mmap flag to server
2023-04-19 15:57:46 -04:00
Andrei Betlen
92c077136d
Add experimental cache
2023-04-15 12:03:09 -04:00
Andrei Betlen
6c7cec0c65
Fix completion request
2023-04-14 10:01:15 -04:00
Andrei Betlen
4f5f99ef2a
Formatting
2023-04-12 22:40:12 -04:00
Andrei Betlen
0daf16defc
Enable logprobs on completion endpoint
2023-04-12 19:08:11 -04:00
Andrei Betlen
19598ac4e8
Fix threading bug. Closes #62
2023-04-12 19:07:53 -04:00
Andrei Betlen
b3805bb9cc
Implement logprobs parameter for text completion. Closes #2
2023-04-12 14:05:11 -04:00
Andrei Betlen
213cc5c340
Remove async from function signature to avoid blocking the server
2023-04-11 11:54:31 -04:00
Andrei Betlen
0067c1a588
Formatting
2023-04-08 16:01:18 -04:00
Andrei Betlen
da539cc2ee
Safer calculation of default n_threads
2023-04-06 21:22:19 -04:00
Andrei Betlen
930db37dd2
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
2023-04-06 21:07:38 -04:00
Andrei Betlen
55279b679d
Handle prompt list
2023-04-06 21:07:35 -04:00
MillionthOdin16
c283edd7f2
Set n_batch to default values and reduce thread count:
...
Change batch size to the llama.cpp default of 8. I've seen issues in llama.cpp where batch size affects quality of generations. (It shouldn't) But in case that's still an issue I changed to default.
Set auto-determined num of threads to 1/2 system count. ggml will sometimes lock cores at 100% while doing nothing. This is being addressed, but can cause bad experience for user if pegged at 100%
2023-04-05 18:17:29 -04:00
MillionthOdin16
76a82babef
Set n_batch to the default value of 8. I think this is leftover from when n_ctx was missing and n_batch was 2048.
2023-04-05 17:44:53 -04:00
Andrei Betlen
44448fb3a8
Add server as a subpackage
2023-04-05 16:23:25 -04:00