Commit graph

40 commits

Author SHA1 Message Date
Andrei Betlen
cbbcd888af feat: Update llama.cpp 2024-02-25 20:52:14 -05:00
Andrei
7f51b6071f
feat(low-level-api): Improve API static type-safety and performance (#1205) 2024-02-21 16:25:38 -05:00
Andrei Betlen
2ef7ba3aed misc: rename grammar test 2024-02-08 01:07:44 -05:00
Andrei
fb762a6041
Add speculative decoding (#1120)
* Add draft model param to llama class, implement basic prompt lookup decoding draft model

* Use samplingcontext for sampling

* Use 1d array

* Use draft model for sampling

* Fix dumb mistake

* Allow for later extensions to the LlamaDraftModel api

* Cleanup

* Adaptive candidate prediction

* Update implementation to match hf transformers

* Tuning

* Fix bug where last token was not used for ngram prediction

* Remove heuristic for num_pred_tokens (no benefit)

* fix: n_candidates bug.

* Add draft_model_num_pred_tokens server setting

* Cleanup

* Update README
2024-01-31 14:08:14 -05:00
Andrei Betlen
9ae5819ee4 Add chat format test. 2024-01-29 00:59:01 -05:00
Andrei Betlen
d3f5528ca8 fix: from_json_schema oneof/anyof bug. Closes #1097 2024-01-21 19:06:53 -05:00
Andrei Betlen
b8fc1c7d83 feat: Add ability to load chat format from huggingface autotokenizer or tokenizer_config.json files. 2024-01-18 21:21:37 -05:00
Austin
6bfe98bd80
Integration of Jinja2 Templating (#875)
* feat: Add support for jinja templating

Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>

* fix: Refactor chat formatter and update interface for jinja templates

- Simplify the `llama2_template` in `llama_jinja_format.py` by removing unnecessary line breaks for readability without affecting functionality.
- Update `ChatFormatterInterface` constructor to accept a more generic `Optional[object]` type for the template parameter, enhancing flexibility.
- Introduce a `template` property to `ChatFormatterInterface` for standardized access to the template string.
- Replace `MetaSingleton` metaclass with `Singleton` for the `ChatFormatterFactory` to streamline the singleton implementation.

These changes enhance code readability, maintain usability, and ensure consistency in the chat formatter's design pattern usage.

* Add outline for Jinja2 templating integration documentation

Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>

* Add jinja2 as a dependency with version range for Hugging Face transformers compatibility

Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>

* Update jinja2 version constraint for mkdocs-material compatibility

Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>

* Fix attribute name in AutoChatFormatter

- Changed attribute name from `self._renderer` to `self._environment`

---------

Signed-off-by: teleprint-me <77757836+teleprint-me@users.noreply.github.com>
2024-01-17 09:47:52 -05:00
Mark Neumann
c689ccc728
Fix Pydantic model parsing (#1087) 2024-01-15 10:45:57 -05:00
kddubey
5a8944672f
Fix logits_to_logprobs for 2-D and 3-D logits (#1002)
* Fix logits_to_logprobs for 2-D and 3-D logits

* Set dtype to single

* Test size
2023-12-16 18:59:26 -05:00
Andrei Betlen
9515467439 tests: add mock_kv_cache placeholder functions 2023-11-22 06:02:21 -05:00
Andrei Betlen
0ea244499e tests: avoid constantly reallocating logits 2023-11-22 04:31:05 -05:00
Andrei Betlen
0a7e05bc10 tests: don't mock sampling functions 2023-11-22 04:12:32 -05:00
Andrei Betlen
d7388f1ffb Use mock_llama for all tests 2023-11-21 18:13:19 -05:00
Maarten ter Huurne
c21edb6908
Do not set grammar to None for new LlamaGrammar objects (#834)
* Do not set `grammar` to `None` for new `LlamaGrammar` objects

The `grammar` attribute is written by `init()`, but that method always
returns `None`, so `__init__()` would then discard the previously
written object.

* Add minimal test for grammar parsing
2023-11-21 00:23:18 -05:00
Andrei Betlen
3dc21b2557 tests: Improve llama.cpp mock 2023-11-20 23:23:18 -05:00
Andrei Betlen
2c2afa320f Update llama.cpp 2023-11-20 14:11:33 -05:00
Andrei Betlen
e32ecb0516 Fix tests 2023-11-10 05:39:42 -05:00
Andrei Betlen
e214a58422 Refactor Llama class internals 2023-11-06 09:16:36 -05:00
Andrei
ab028cb878
Migrate inference to llama_batch and llama_decode api (#795)
* Add low-level batching notebook

* fix: tokenization of special characters: (#850)

It should behave like llama.cpp, where most out of the box usages
treat special characters accordingly

* Update CHANGELOG

* Cleanup

* Fix runner label

* Update notebook

* Use llama_decode and batch api

* Support logits_all parameter

---------

Co-authored-by: Antoine Lizee <antoine.lizee@gmail.com>
2023-11-02 20:13:57 -04:00
Antoine Lizee
4d4e0f11e2 fix: tokenization of special characters: (#850)
It should behave like llama.cpp, where most out of the box usages
treat special characters accordingly
2023-11-02 14:28:14 -04:00
Andrei Betlen
ef03d77b59 Enable finish reason tests 2023-10-19 02:56:45 -04:00
Andrei Betlen
cbeef36510 Re-enable tests completion function 2023-10-19 02:55:29 -04:00
Andrei Betlen
1a1c3dc418 Update llama.cpp 2023-09-28 22:42:03 -04:00
janvdp
f49b6d7c67 add test to see if llama_cpp.__version__ exists 2023-09-05 21:10:05 +02:00
Andrei Betlen
4887973c22 Update llama.cpp 2023-08-27 12:59:20 -04:00
Andrei Betlen
3a29d65f45 Update llama.cpp 2023-08-26 23:36:24 -04:00
Andrei Betlen
8ac59465b9 Strip leading space when de-tokenizing. 2023-08-25 04:56:48 -04:00
Andrei Betlen
3674e5ed4e Update model path 2023-08-24 01:01:20 -04:00
Andrei Betlen
01a010be52 Fix llama_cpp and Llama type signatures. Closes #221 2023-05-19 11:59:33 -04:00
Andrei Betlen
46e3c4b84a Fix 2023-05-01 22:41:54 -04:00
Andrei Betlen
9eafc4c49a Refactor server to use factory 2023-05-01 22:38:46 -04:00
Andrei Betlen
c088a2b3a7 Un-skip tests 2023-05-01 15:46:03 -04:00
Andrei Betlen
2f8a3adaa4 Temporarily skip sampling tests. 2023-05-01 15:01:49 -04:00
Lucas Doyle
efe8e6f879 llama_cpp server: slight refactor to init_llama function
Define an init_llama function that starts llama with supplied settings instead of just doing it in the global context of app.py

This allows the test to be less brittle by not needing to mess with os.environ, then importing the app
2023-04-29 11:42:23 -07:00
Lucas Doyle
6d8db9d017 tests: simple test for server module 2023-04-29 11:42:20 -07:00
Mug
18a0c10032 Remove excessive errors="ignore" and add utf8 test 2023-04-29 12:19:22 +02:00
Mug
5f81400fcb Also ignore errors on input prompts 2023-04-26 14:45:51 +02:00
Andrei Betlen
e96a5c5722 Make Llama instance pickleable. Closes #27 2023-04-05 06:52:17 -04:00
Andrei Betlen
c3972b61ae Add basic tests. Closes #24 2023-04-05 03:23:15 -04:00