Lucas Doyle
0fcc25cdac
examples fastapi_server: deprecate
This commit "deprecates" the example fastapi server: it remains runnable but points folks at the module if they want to learn more.
Rationale:
Currently there exist two server implementations in this repo:
- `llama_cpp/server/__main__.py`, the module that's runnable by consumers of the library with `python3 -m llama_cpp.server`
- `examples/high_level_api/fastapi_server.py`, which is probably a copy-pasted example by folks hacking around
IMO this is confusing. As a new user of the library, I see they've both been updated relatively recently, but looking at them side by side there's a diff.
The one in the module seems better:
- supports logits_all
- supports use_mmap
- has experimental cache support (with some mutex thing going on)
- some stuff with streaming support was moved around more recently than fastapi_server.py
2023-05-01 22:34:23 -07:00
Andrei Betlen
c2e31eecee
Update permissions
2023-05-02 01:23:17 -04:00
Andrei Betlen
63f8d3a6fb
Update context
2023-05-02 01:16:44 -04:00
Andrei Betlen
c21a34506e
Update permissions
2023-05-02 01:13:43 -04:00
Andrei Betlen
872b2ec33f
Clone submodules
2023-05-02 01:11:34 -04:00
Andrei Betlen
62de4692f2
Fix missing dependency
2023-05-02 01:09:27 -04:00
Andrei
25062cecd3
Merge pull request #140 from abetlen/Niek/main
Add Dockerfile
2023-05-02 01:06:00 -04:00
Andrei Betlen
36c81489e7
Remove docker section of publish
2023-05-02 01:04:36 -04:00
Andrei Betlen
5d5421b29d
Add build docker
2023-05-02 01:04:02 -04:00
Andrei Betlen
81631afc48
Install from local directory
2023-05-02 00:55:51 -04:00
Andrei Betlen
d605408f99
Add dockerignore
2023-05-02 00:55:34 -04:00
Andrei
e644e75915
Merge pull request #139 from matthoffner/patch-1
Fix FTYPE typo
2023-05-02 00:33:45 -04:00
Matt Hoffner
f97ff3c5bb
Update llama_cpp.py
2023-05-01 20:40:06 -07:00
Andrei Betlen
e9e0654aed
Bump version
2023-05-01 22:52:25 -04:00
Andrei Betlen
46e3c4b84a
Fix
2023-05-01 22:41:54 -04:00
Andrei Betlen
9eafc4c49a
Refactor server to use factory
2023-05-01 22:38:46 -04:00
Andrei Betlen
dd9ad1c759
Formatting
2023-05-01 21:51:16 -04:00
Andrei Betlen
9d60ae56f2
Fix whitespace
2023-05-01 18:07:45 -04:00
Andrei Betlen
53c0129eb6
Update submodule clone instructions
2023-05-01 18:07:15 -04:00
Andrei Betlen
b6747f722e
Fix logprob calculation. Fixes #134
2023-05-01 17:45:08 -04:00
Andrei Betlen
c088a2b3a7
Un-skip tests
2023-05-01 15:46:03 -04:00
Andrei Betlen
bf3d0dcb2c
Fix tests
2023-05-01 15:28:46 -04:00
Andrei Betlen
5034bbf499
Bump version
2023-05-01 15:23:59 -04:00
Andrei Betlen
f073ef0571
Update llama.cpp
2023-05-01 15:23:01 -04:00
Andrei Betlen
9ff9cdd7fc
Fix import error
2023-05-01 15:11:15 -04:00
Andrei Betlen
2f8a3adaa4
Temporarily skip sampling tests.
2023-05-01 15:01:49 -04:00
Andrei Betlen
dbe0ad86c8
Update test dependencies
2023-05-01 14:50:01 -04:00
Andrei Betlen
350a1769e1
Update sampling api
2023-05-01 14:47:55 -04:00
Andrei Betlen
7837c3fdc7
Fix return types and import comments
2023-05-01 14:02:06 -04:00
Andrei Betlen
55d6308537
Fix test dependencies
2023-05-01 11:39:18 -04:00
Andrei Betlen
ccf1ed54ae
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
2023-05-01 11:35:14 -04:00
Andrei
79ba9ed98d
Merge pull request #125 from Stonelinks/app-server-module-importable
Make app server module importable
2023-05-01 11:31:08 -04:00
Andrei Betlen
80184a286c
Update llama.cpp
2023-05-01 10:44:28 -04:00
Lucas Doyle
efe8e6f879
llama_cpp server: slight refactor to init_llama function
Define an init_llama function that starts llama with the supplied settings instead of doing it in the global context of app.py.
This makes the test less brittle: it no longer needs to mess with os.environ before importing the app.
2023-04-29 11:42:23 -07:00
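The init_llama refactor in efe8e6f879 can be illustrated with a minimal sketch. This is not the code from the commit: `Settings`, its fields, and the `FakeLlama` stand-in are hypothetical names chosen so the example runs without a model file. It only shows the general idea of building the model from explicit settings rather than at import time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Settings:
    # Hypothetical settings object; the field names are illustrative only.
    model: str
    n_ctx: int = 2048


class FakeLlama:
    """Stand-in for the real model class so the sketch runs without a model file."""

    def __init__(self, model_path: str, n_ctx: int):
        self.model_path = model_path
        self.n_ctx = n_ctx


# Module-level handle; nothing is loaded when the module is imported.
llama: Optional[FakeLlama] = None


def init_llama(settings: Settings) -> FakeLlama:
    """Build the model from explicit settings instead of reading os.environ at import."""
    global llama
    llama = FakeLlama(model_path=settings.model, n_ctx=settings.n_ctx)
    return llama


# A test can pass settings directly, with no environment mutation beforehand.
model = init_llama(Settings(model="/tmp/fake-model.bin", n_ctx=512))
print(model.model_path, model.n_ctx)
```

With this shape, importing the module has no side effects, so a test can import the app first and decide afterwards which settings to initialise it with.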
Lucas Doyle
6d8db9d017
tests: simple test for server module
2023-04-29 11:42:20 -07:00
Lucas Doyle
468377b0e2
llama_cpp server: app is now importable, still runnable as a module
2023-04-29 11:41:25 -07:00
Andrei
755f9fa455
Merge pull request #118 from SagsMug/main
Fix UnicodeDecodeError permanently
2023-04-29 07:19:01 -04:00
Mug
18a0c10032
Remove excessive errors="ignore" and add utf8 test
2023-04-29 12:19:22 +02:00
Andrei Betlen
523825e91d
Update README
2023-04-28 17:12:03 -04:00
Andrei Betlen
e00beb13b5
Update README
2023-04-28 17:08:18 -04:00
Andrei Betlen
5423d047c7
Bump version
2023-04-28 15:33:08 -04:00
Andrei Betlen
ea0faabae1
Update llama.cpp
2023-04-28 15:32:43 -04:00
Mug
b7d14efc8b
Python weirdness
2023-04-28 13:20:31 +02:00
Mug
eed61289b6
Don't detect off tokens, detect off detokenized utf8
2023-04-28 13:16:18 +02:00
Mug
3a98747026
One day, I'll fix off by 1 errors permanently too
2023-04-28 12:54:28 +02:00
Mug
c39547a986
Detect multi-byte responses and wait
2023-04-28 12:50:30 +02:00
Andrei Betlen
9339929f56
Update llama.cpp
2023-04-26 20:00:54 -04:00
Mug
5f81400fcb
Also ignore errors on input prompts
2023-04-26 14:45:51 +02:00
Mug
3c130f00ca
Remove try catch from chat
2023-04-26 14:38:53 +02:00
Mug
be2c961bc9
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python
2023-04-26 14:38:09 +02:00