Commit graph

72 commits

Author SHA1 Message Date
Andrei Betlen
85ead98a3e Update Functions notebook example 2023-11-10 12:49:14 -05:00
Andrei Betlen
1b376c62b7 Update functionary for new OpenAI API 2023-11-10 02:51:58 -05:00
Andrei Betlen
598780fde8 Update Multimodal notebook 2023-11-08 00:48:25 -05:00
Damian Stewart
aab74f0b2b
Multimodal Support (Llava 1.5) (#821)
* llava v1.5 integration

* Point llama.cpp to fork

* Add llava shared library target

* Fix type

* Update llama.cpp

* Add llava api

* Revert changes to llama and llama_cpp

* Update llava example

* Add types for new gpt-4-vision-preview api

* Fix typo

* Update llama.cpp

* Update llama_types to match OpenAI v1 API

* Update ChatCompletionFunction type

* Reorder request parameters

* More API type fixes

* Even More Type Updates

* Add parameter for custom chat_handler to Llama class

* Fix circular import

* Convert to absolute imports

* Fix

* Fix pydantic Jsontype bug

* Accept list of prompt tokens in create_completion

* Add llava1.5 chat handler

* Add Multimodal notebook

* Clean up examples

* Add server docs

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2023-11-07 22:48:51 -05:00
Andrei
3af7b21ff1
Add functionary support (#784)
* Add common grammars and json-schema-to-grammar utility function from llama.cpp

* Pass functions to format function

* Add basic functionary formatting

* Add LlamaChatHandler for more complex chat use cases

* Add function calling example notebook

* Add support for regular chat completions alongside function calling
2023-11-03 02:12:14 -04:00
Andrei
ab028cb878
Migrate inference to llama_batch and llama_decode api (#795)
* Add low-level batching notebook

* fix: tokenization of special characters: (#850)

It should behave like llama.cpp, where most out-of-the-box usages
treat special characters accordingly

* Update CHANGELOG

* Cleanup

* Fix runner label

* Update notebook

* Use llama_decode and batch api

* Support logits_all parameter

---------

Co-authored-by: Antoine Lizee <antoine.lizee@gmail.com>
2023-11-02 20:13:57 -04:00
Andrei Betlen
f4090a0bb2 Add NUMA support; low-level API users must now explicitly call llama_backend_init at the start of their programs. 2023-09-13 23:00:43 -04:00
Juarez Bochi
20ac434d0f
Fix low level api examples 2023-09-07 17:50:47 -04:00
Andrei
2adf6f3f9a
Merge pull request #265 from dmahurin/fix-from-bytes-byteorder
fix "from_bytes() missing required argument 'byteorder'"
2023-05-26 12:53:06 -04:00
Andrei
34ad71f448
Merge pull request #274 from dmahurin/fix-missing-antiprompt
low_level_api_chat_cpp.py: Fix missing antiprompt output in chat.
2023-05-26 12:52:34 -04:00
Don Mahurin
0fa2ec4903 low_level_api_chat_cpp.py: Fix missing antiprompt output in chat. 2023-05-26 06:54:28 -07:00
Don Mahurin
d6a7adb17a fix "missing 1 required positional argument: 'min_keep'" 2023-05-23 06:42:22 -07:00
Don Mahurin
327eedbfe1 fix "from_bytes() missing required argument 'byteorder'" 2023-05-23 00:20:34 -07:00
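For context, the byteorder fix above reflects a Python stdlib requirement: `int.from_bytes()` had no default for `byteorder` before Python 3.11, so calls that omitted it raised a `TypeError` on the versions current at the time. A minimal illustration:

```python
# Before Python 3.11, byteorder was a required argument of
# int.from_bytes(); omitting it raised a TypeError.
value_le = int.from_bytes(b"\x01\x00", byteorder="little")  # 1
value_be = int.from_bytes(b"\x01\x00", byteorder="big")     # 256
print(value_le, value_be)
```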
Andrei Betlen
c7788c85ab Add Guidance example 2023-05-19 03:16:58 -04:00
Andrei
7499fc1cbb
Merge pull request #126 from Stonelinks/deprecate-example-server
Deprecate example server
2023-05-08 19:29:04 -04:00
Mug
eaf9f19aa9 Fix lora 2023-05-08 15:27:42 +02:00
Mug
2c0d9b182c Fix session loading and saving in low level example chat 2023-05-08 15:27:03 +02:00
Mug
fd80ddf703 Fix a bug with wrong type 2023-05-06 22:22:28 +02:00
Mug
996f63e9e1 Add utf8 to chat example 2023-05-06 15:16:58 +02:00
Mug
3ceb47b597 Fix mirostat requiring c_float 2023-05-06 13:35:50 +02:00
Mug
9797394c81 Fix wrong logit_bias parsed type 2023-05-06 13:27:52 +02:00
Mug
1895c11033 Rename postfix to suffix to match upstream 2023-05-06 13:18:25 +02:00
Mug
0e9f227afd Update low level examples 2023-05-04 18:33:08 +02:00
Lucas Doyle
0fcc25cdac examples fastapi_server: deprecate
This commit "deprecates" the example FastAPI server: it remains runnable but points folks at the module if they want to learn more.

Rationale:

Currently there exist two server implementations in this repo:

- `llama_cpp/server/__main__.py`, the module that's runnable by consumers of the library with `python3 -m llama_cpp.server`
- `examples/high_level_api/fastapi_server.py`, which is probably a copy-pasted example by folks hacking around

IMO this is confusing. As a new user of the library, I see both have been updated relatively recently, but looking at them side by side there's a diff.

The one in the module seems better:
- supports logits_all
- supports use_mmap
- has experimental cache support (with some mutex thing going on)
- some stuff with streaming support was moved around more recently than fastapi_server.py
2023-05-01 22:34:23 -07:00
Mug
c39547a986 Detect multi-byte responses and wait 2023-04-28 12:50:30 +02:00
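The "detect multi-byte responses and wait" fix above concerns streaming UTF-8 output token by token: a token boundary can land in the middle of a multi-byte character, so the chat example must hold those bytes back until the character is complete. A self-contained sketch of the idea (the helper name is illustrative, not the example's actual code):

```python
def incomplete_utf8_tail(buf: bytes) -> int:
    """Return how many trailing bytes of `buf` form an incomplete
    UTF-8 sequence (0 if the buffer ends on a character boundary)."""
    # Scan back at most 3 bytes looking for the lead byte of a sequence.
    for i in range(1, min(4, len(buf) + 1)):
        b = buf[-i]
        if b & 0b1100_0000 == 0b1000_0000:
            continue  # continuation byte; keep looking for the lead
        # Lead byte found: how many bytes should this sequence have?
        if b & 0b1000_0000 == 0:
            expected = 1  # ASCII
        elif b & 0b1110_0000 == 0b1100_0000:
            expected = 2
        elif b & 0b1111_0000 == 0b1110_0000:
            expected = 3
        else:
            expected = 4
        # Incomplete only if fewer bytes have arrived than expected.
        return i if i < expected else 0
    return 0

# "é" is two bytes in UTF-8; printing only the first byte mid-stream
# would emit a replacement character, so the consumer should wait.
print(incomplete_utf8_tail("é".encode("utf-8")[:1]))  # 1 byte still pending
print(incomplete_utf8_tail("é".encode("utf-8")))      # complete, nothing pending
```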
Mug
5f81400fcb Also ignore errors on input prompts 2023-04-26 14:45:51 +02:00
Mug
3c130f00ca Remove try catch from chat 2023-04-26 14:38:53 +02:00
Mug
c4a8491d42 Fix decode errors permanently 2023-04-26 14:37:06 +02:00
Mug
53d17ad003 Fixed end of text wrong type, and fix n_predict behaviour 2023-04-17 14:45:28 +02:00
Mug
3bb45f1658 More reasonable defaults 2023-04-10 16:38:45 +02:00
Mug
0cccb41a8f Added iterative search to prevent instructions from being echoed, added ignore-eos and no-mmap options, and fixed a bug where one character too many was echoed 2023-04-10 16:35:38 +02:00
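The iterative search mentioned in the commit above boils down to holding back any trailing output that could still turn out to be the start of the instruction/antiprompt marker, only releasing it once more tokens disambiguate. A minimal sketch of that check (function name and antiprompt string are illustrative assumptions, not the example's actual code):

```python
def matching_tail_len(output: str, antiprompt: str) -> int:
    """Length of the longest suffix of `output` that is a prefix of
    `antiprompt` -- i.e., how many characters must be held back because
    they might be the start of an echoed instruction marker."""
    for n in range(min(len(output), len(antiprompt)), 0, -1):
        if antiprompt.startswith(output[-n:]):
            return n
    return 0

# With antiprompt "### Instruction:", the trailing "### Inst" is held
# back until further tokens decide whether it is the marker or real text.
held = matching_tail_len("Sure, here you go.\n### Inst", "### Instruction:")
print(held)  # 8 characters withheld from display
```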
Andrei Betlen
196650ccb2 Update model paths to be more clear they should point to file 2023-04-09 22:45:55 -04:00
Andrei Betlen
6d1bda443e Add clients example. Closes #46 2023-04-08 09:35:32 -04:00
Andrei
41365b0456
Merge pull request #15 from SagsMug/main
llama.cpp chat example implementation
2023-04-07 20:43:33 -04:00
Mug
16fc5b5d23 More interoperability to the original llama.cpp, and arguments now work 2023-04-07 13:32:19 +02:00
Mug
10c7571117 Fixed too many newlines, now onto args.
Still needs shipping work so you could do "python -m llama_cpp.examples." etc.
2023-04-06 15:33:22 +02:00
Mug
085cc92b1f Better llama.cpp interoperability
Still has some issues with too many newlines, so WIP
2023-04-06 15:30:57 +02:00
MillionthOdin16
c283edd7f2 Set n_batch to the default value and reduce thread count:
Change batch size to the llama.cpp default of 8. I've seen issues in llama.cpp where batch size affects the quality of generations (it shouldn't), but in case that's still an issue I changed it to the default.

Set the auto-determined number of threads to half the system count. ggml will sometimes lock cores at 100% while doing nothing; this is being addressed, but it can make for a bad user experience if cores are pegged at 100%.
2023-04-05 18:17:29 -04:00
Andrei Betlen
e1b5b9bb04 Update fastapi server example 2023-04-05 14:44:26 -04:00
Mug
283e59c5e9 Fix bug where init_break was not set when exiting via antiprompt, among other fixes. 2023-04-05 14:47:24 +02:00
Mug
99ceecfccd Move to new examples directory 2023-04-05 14:28:02 +02:00
Mug
e4c6f34d95 Merge branch 'main' of https://github.com/abetlen/llama-cpp-python 2023-04-05 14:18:27 +02:00
Andrei Betlen
b1babcf56c Add quantize example 2023-04-05 04:17:26 -04:00
Andrei Betlen
c8e13a78d0 Re-organize examples folder 2023-04-05 04:10:13 -04:00
Andrei Betlen
c16bda5fb9 Add performance tuning notebook 2023-04-05 04:09:19 -04:00
Mug
c862e8bac5 Fix repeating instructions and an antiprompt bug 2023-04-04 17:54:47 +02:00
Mug
9cde7973cc Fix stripping instruction prompt 2023-04-04 16:20:27 +02:00
Mug
da5a6a7089 Added instruction mode, fixed infinite generation, and various other fixes 2023-04-04 16:18:26 +02:00
Mug
0b32bb3d43 Add instruction mode 2023-04-04 11:48:48 +02:00
Andrei Betlen
ffe34cf64d Allow user to set llama config from env vars 2023-04-04 00:52:44 -04:00
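The env-var configuration in this last commit can be sketched with the stdlib alone. Variable names and defaults below are illustrative assumptions, not the project's actual settings:

```python
import os

def load_llama_config() -> dict:
    """Build a config dict, letting environment variables override
    defaults. Names and defaults here are illustrative only."""
    return {
        "model": os.environ.get("MODEL", "./models/ggml-model.bin"),
        "n_ctx": int(os.environ.get("N_CTX", "512")),
        "n_threads": int(os.environ.get("N_THREADS", str(os.cpu_count() or 1))),
        "use_mlock": os.environ.get("USE_MLOCK", "0") == "1",
    }

# Overriding a setting from the environment, as a launcher script might:
os.environ["N_CTX"] = "2048"
config = load_llama_config()
print(config["n_ctx"])  # 2048
```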