Andrei
7499fc1cbb
Merge pull request #126 from Stonelinks/deprecate-example-server
...
Deprecate example server
2023-05-08 19:29:04 -04:00
Mug
eaf9f19aa9
Fix lora
2023-05-08 15:27:42 +02:00
Mug
2c0d9b182c
Fix session loading and saving in low level example chat
2023-05-08 15:27:03 +02:00
Mug
fd80ddf703
Fix a bug with wrong type
2023-05-06 22:22:28 +02:00
Mug
996f63e9e1
Add utf8 to chat example
2023-05-06 15:16:58 +02:00
Mug
3ceb47b597
Fix mirostat requiring c_float
2023-05-06 13:35:50 +02:00
Mug
9797394c81
Fix wrong parsed type for logit_bias
2023-05-06 13:27:52 +02:00
Mug
1895c11033
Rename postfix to suffix to match upstream
2023-05-06 13:18:25 +02:00
Mug
0e9f227afd
Update low level examples
2023-05-04 18:33:08 +02:00
Lucas Doyle
0fcc25cdac
examples fastapi_server: deprecate
...
This commit "deprecates" the example fastapi server: it remains runnable, but it points folks at the module if they want to learn more.
Rationale:
Currently there exist two server implementations in this repo:
- `llama_cpp/server/__main__.py`, the module that's runnable by consumers of the library with `python3 -m llama_cpp.server`
- `examples/high_level_api/fastapi_server.py`, which is probably a copy-pasted example by folks hacking around
IMO this is confusing. As a new user of the library, I see that both have been updated relatively recently, but comparing them side by side there is a diff.
The one in the module seems better:
- supports logits_all
- supports use_mmap
- has experimental cache support (with some mutex thing going on)
- streaming support was reworked more recently than in fastapi_server.py
2023-05-01 22:34:23 -07:00
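The entry above describes keeping the example runnable while steering users to the server module. A minimal sketch of what that can look like, assuming the module exposes a `create_app()` factory that reads its settings (e.g. `MODEL`) from the environment; this is an illustration, not the commit's exact code:

```python
# examples/high_level_api/fastapi_server.py -- illustrative sketch only
import os

import uvicorn

from llama_cpp.server.app import create_app  # assumption: the module exposes this factory

print(
    "NOTE: this example is deprecated; the maintained server lives in the "
    "llama_cpp.server module (run it with `python3 -m llama_cpp.server`)."
)

if __name__ == "__main__":
    # Reuse the module's app instead of keeping a duplicate FastAPI implementation here.
    app = create_app()
    uvicorn.run(app, host=os.getenv("HOST", "localhost"), port=int(os.getenv("PORT", "8000")))
```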
Mug
c39547a986
Detect multi-byte responses and wait
2023-04-28 12:50:30 +02:00
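"Detect multi-byte responses and wait" refers to a common streaming pitfall: a multi-byte UTF-8 character can be split across tokens, so decoding each token's bytes in isolation produces garbage. A generic sketch of the technique (not the commit's exact code): buffer the bytes and only emit text once it decodes cleanly.

```python
class Utf8Buffer:
    """Accumulate raw token bytes; emit text only once it decodes cleanly."""

    def __init__(self) -> None:
        self._buf = b""

    def push(self, token_bytes: bytes) -> str:
        self._buf += token_bytes
        try:
            text = self._buf.decode("utf-8")
        except UnicodeDecodeError:
            # Incomplete multi-byte sequence: wait for the next token.
            return ""
        self._buf = b""
        return text
```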
Mug
5f81400fcb
Also ignore errors on input prompts
2023-04-26 14:45:51 +02:00
Mug
3c130f00ca
Remove try catch from chat
2023-04-26 14:38:53 +02:00
Mug
c4a8491d42
Fix decode errors permanently
2023-04-26 14:37:06 +02:00
Mug
53d17ad003
Fix wrong type for end of text, and fix n_predict behaviour
2023-04-17 14:45:28 +02:00
Mug
3bb45f1658
More reasonable defaults
2023-04-10 16:38:45 +02:00
Mug
0cccb41a8f
Added iterative search to prevent instructions from being echoed, added ignore-eos and no-mmap options, and fixed a bug that echoed one character too many
2023-04-10 16:35:38 +02:00
Andrei Betlen
196650ccb2
Update model paths to make it clearer that they should point to a file
2023-04-09 22:45:55 -04:00
Andrei Betlen
6d1bda443e
Add clients example. Closes #46
2023-04-08 09:35:32 -04:00
Andrei
41365b0456
Merge pull request #15 from SagsMug/main
...
llama.cpp chat example implementation
2023-04-07 20:43:33 -04:00
Mug
16fc5b5d23
More interoperability with the original llama.cpp, and arguments now work
2023-04-07 13:32:19 +02:00
Mug
10c7571117
Fixed too many newlines, now onto args.
...
Still needs shipping work so you could do "python -m llama_cpp.examples." etc.
2023-04-06 15:33:22 +02:00
Mug
085cc92b1f
Better llama.cpp interoperability
...
Still has some issues with too many newlines, so WIP
2023-04-06 15:30:57 +02:00
MillionthOdin16
c283edd7f2
Set n_batch to default values and reduce thread count:
...
Change the batch size to the llama.cpp default of 8. I've seen issues in llama.cpp where batch size affects the quality of generations (it shouldn't), but in case that's still an issue I changed it to the default.
Set the auto-determined number of threads to half the system core count. ggml will sometimes lock cores at 100% while doing nothing; this is being addressed, but it can make for a bad user experience if cores are pegged at 100%.
2023-04-05 18:17:29 -04:00
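A short sketch of the defaults described in the commit above, assuming the llama.cpp-style parameter names `n_batch` and `n_threads`:

```python
import multiprocessing

# llama.cpp's default batch size at the time, per the commit message.
n_batch = 8

# Use roughly half the logical cores so ggml's busy threads don't peg
# every core at 100% while effectively idle.
n_threads = max(multiprocessing.cpu_count() // 2, 1)
```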
Andrei Betlen
e1b5b9bb04
Update fastapi server example
2023-04-05 14:44:26 -04:00
Mug
283e59c5e9
Fix a bug where init_break was not being set when exiting via antiprompt, and other fixes.
2023-04-05 14:47:24 +02:00
Mug
99ceecfccd
Move to new examples directory
2023-04-05 14:28:02 +02:00
Mug
e4c6f34d95
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python
2023-04-05 14:18:27 +02:00
Andrei Betlen
b1babcf56c
Add quantize example
2023-04-05 04:17:26 -04:00
Andrei Betlen
c8e13a78d0
Re-organize examples folder
2023-04-05 04:10:13 -04:00
Andrei Betlen
c16bda5fb9
Add performance tuning notebook
2023-04-05 04:09:19 -04:00
Mug
c862e8bac5
Fix repeating instructions and an antiprompt bug
2023-04-04 17:54:47 +02:00
Mug
9cde7973cc
Fix stripping instruction prompt
2023-04-04 16:20:27 +02:00
Mug
da5a6a7089
Added instruction mode, fixed infinite generation, and made various other fixes
2023-04-04 16:18:26 +02:00
Mug
0b32bb3d43
Add instruction mode
2023-04-04 11:48:48 +02:00
Andrei Betlen
ffe34cf64d
Allow user to set llama config from env vars
2023-04-04 00:52:44 -04:00
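A hedged sketch of configuring the llama model from environment variables; the variable names (`MODEL`, `N_CTX`) are illustrative and not necessarily the ones this commit introduced:

```python
import os

from llama_cpp import Llama

# Illustrative environment variable names; the commit's actual names may differ.
llm = Llama(
    model_path=os.environ["MODEL"],
    n_ctx=int(os.environ.get("N_CTX", "2048")),
)
```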
Andrei Betlen
05eb2087d8
Small fixes for examples
2023-04-03 20:33:07 -04:00
Andrei Betlen
7fedf16531
Add support for chat completion
2023-04-03 20:12:44 -04:00
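Typical usage of the chat completion support added here, as a sketch against the library's public API (the model path is a placeholder, and the response follows the OpenAI-style chat format):

```python
from llama_cpp import Llama

llm = Llama(model_path="./models/7B/ggml-model.bin")  # placeholder path
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Name the planets in the solar system."}],
)
print(response["choices"][0]["message"]["content"])
```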
Andrei Betlen
f7ab8d55b2
Update context size defaults. Closes #11
2023-04-03 20:11:13 -04:00
Mug
f1615f05e6
Chat llama.cpp example implementation
2023-04-03 22:54:46 +02:00
Andrei Betlen
caff127836
Remove commented out code
2023-04-01 15:13:01 -04:00
Andrei Betlen
f28bf3f13d
Bugfix: enable embeddings for fastapi server
2023-04-01 15:12:25 -04:00
Andrei Betlen
ed6f2a049e
Add streaming and embedding endpoints to fastapi example
2023-04-01 13:05:20 -04:00
Andrei Betlen
9fac0334b2
Update embedding example to new api
2023-04-01 13:02:51 -04:00
Andrei Betlen
5e011145c5
Update low level api example
2023-04-01 13:02:10 -04:00
Andrei Betlen
5f2e822b59
Rename inference example
2023-04-01 13:01:45 -04:00
Andrei Betlen
70b8a1ef75
Add support to get embeddings from high-level api. Closes #4
2023-03-28 04:59:54 -04:00
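A usage sketch for getting embeddings from the high-level API (the model path is a placeholder; `embedding=True` enables embedding mode on the constructor):

```python
from llama_cpp import Llama

llm = Llama(model_path="./models/7B/ggml-model.bin", embedding=True)  # placeholder path
result = llm.create_embedding("Hello world")
vector = result["data"][0]["embedding"]
print(len(vector))
```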
Andrei Betlen
3dbb3fd3f6
Add support for stream parameter. Closes #1
2023-03-28 04:03:57 -04:00
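A sketch of the `stream` parameter on completions: with `stream=True` the call returns an iterator of chunks instead of a single response (the model path is a placeholder, shown against the library's later stable API):

```python
from llama_cpp import Llama

llm = Llama(model_path="./models/7B/ggml-model.bin")  # placeholder path
for chunk in llm("Q: What is the capital of France? A:", max_tokens=32, stream=True):
    print(chunk["choices"][0]["text"], end="", flush=True)
print()
```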
Andrei Betlen
dfe8608096
Update examples
2023-03-24 19:10:31 -04:00
Andrei Betlen
a61fd3b509
Add example based on stripped down version of main.cpp from llama.cpp
2023-03-24 18:57:25 -04:00