19 commits

Author SHA1 Message Date
Andrei Betlen
255e1b4495 feat: Update llama.cpp 2024-06-07 02:02:12 -04:00
Sigbjørn Skjæret
027f7bc678
fix: Avoid duplicate special tokens in chat formats (#1439)
* Templates sometimes have BOS in them, remove duplicate

* tokenize chat format prompts before completion

This is to ensure that we don't duplicate any special tokens.

Hopefully I amended the existing formats correctly?

* updated comment

* corrected a few

* add some missing internals

* proper bos/eos detection

* just let tokenizer do the job

* typo--

* align test with new response

* changed to a warning

* move to another PR

* Use python warnings module

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-06-04 10:15:41 -04:00
Andrei Betlen
af3ed503e9 fix: Use numpy recarray for candidates data, fixes bug with temp < 0 2024-06-01 18:09:24 -04:00
Noam Gat
e0d7674e62
fix: detokenization case where first token does not start with a leading space (#1375)
* Fix tokenization edge case where llama output does not start with a space

See this notebook:
https://colab.research.google.com/drive/1Ooz11nFPk19zyJdMDx42CeesU8aWZMdI#scrollTo=oKpHw5PZ30uC

* Update _internals.py

Fixing to compare to b' ' instead of (str)' '

---------

Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-04 10:14:59 -04:00
Andrei Betlen
f116175a5a fix: Suppress all logs when verbose=False, use hardcoded fileno's to work in colab notebooks. Closes #796 Closes #729 2024-04-30 15:45:34 -04:00
Douglas Hanley
f6ed21f9a2
feat: Allow for possibly non-pooled embeddings (#1380)
* allow for possibly non-pooled embeddings

* add more to embeddings section in README.md

---------

Co-authored-by: Andrei <abetlen@gmail.com>
2024-04-25 21:32:44 -04:00
Andrei Betlen
159cc4e5d9 feat: Update llama.cpp 2024-04-21 20:46:40 -04:00
Yuri Mikhailov
62aad610e1
fix: last tokens passing to sample_repetition_penalties function (#1295)
Co-authored-by: ymikhaylov <ymikhaylov@x5.ru>
Co-authored-by: Andrei <abetlen@gmail.com>
2024-04-01 15:25:43 -04:00
Andrei Betlen
8c71725d53 fix: Remove deprecated cfg sampling functions 2024-02-28 14:37:07 -05:00
Andrei Betlen
cbbcd888af feat: Update llama.cpp 2024-02-25 20:52:14 -05:00
Andrei Betlen
b9aca612af misc: use typesafe byref for internal classes 2024-02-23 03:40:07 -05:00
Andrei Betlen
dd22010e85 fix: Raise exceptions when llama model or context fails to load 2024-02-22 00:09:45 -05:00
Andrei
7f51b6071f
feat(low-level-api): Improve API static type-safety and performance (#1205) 2024-02-21 16:25:38 -05:00
Douglas Hanley
d7a67917ba
feat: Support batch embeddings (#1186)
* handle batched embeddings

* fix normalization issue

* fix type hints, ensure no breaking changes to embed

* Clear kv cache / reset internal state after embedding complete

---------

Co-authored-by: Andrei <abetlen@gmail.com>
2024-02-14 04:26:09 -05:00
Andrei Betlen
6943bab6d8 fix: destructor exception where internal classes are missing some uninitialized attributes 2024-02-14 03:38:41 -05:00
Andrei Betlen
59760c85ed fix: Use llama_log_callback to avoid suppress_stdout_stderr 2024-02-05 21:52:12 -05:00
Andrei
da003d8768
Automatically set chat format from gguf (#1110)
* Use jinja formatter to load chat format from gguf

* Fix off-by-one error in metadata loader

* Implement chat format auto-detection
2024-01-29 14:22:23 -05:00
Andrei Betlen
5a34c57e54 feat: Expose gguf model metadata in metadata property 2024-01-19 10:46:03 -05:00
Andrei Betlen
cc4630e66f Move helper classes to _internals submodule 2024-01-17 09:14:00 -05:00