1774 commits

Author SHA1 Message Date
Patrick Peng
b454f40a9a
Merge pull request from GHSA-56xg-wfcc-g829
Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-10 00:47:56 -04:00
Sigbjørn Skjæret
5ab40e6167
feat: Support multiple chat templates - step 1 (#1396)
* Support multiple chat templates - step 1

As a first step, allow the user to select a template from metadata with the chat_format parameter, in the form `chat_template.name`.

* register chat templates to self.chat_formats instead of globally

* Don't expose internal chat handlers yet

---------

Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-09 09:49:09 -04:00
Andrei Betlen
bf66a283e8 chore: Bump version 2024-05-09 03:02:52 -04:00
Andrei Betlen
3757328b70 fix: free last image embed in llava chat handler 2024-05-08 22:16:18 -04:00
Andrei Betlen
77122638b4 fix: Make leading bos_token optional for image chat formats, fix nanollava system message 2024-05-08 13:12:31 -04:00
Andrei Betlen
2a39b99575 feat: Update llama.cpp 2024-05-08 08:42:22 -04:00
Andrei Betlen
9ce5cb376a chore: Bump version 2024-05-08 02:36:42 -04:00
Sigbjørn Skjæret
4a7122d22f
feat: fill-in-middle support (#1386)
* Proper fill-in-middle support

Use prefix/middle/suffix tokens when the metadata is present in the GGUF, as for example in [this](https://huggingface.co/CISCai/CodeQwen1.5-7B-Chat-SOTA-GGUF) one.

* fall back to internal prefix/middle/suffix id

In some cases llama.cpp will guess the fim tokens; use them if there's no metadata.

* typo--

* don't insert special tokens that are not there in suffix

Note: add_bos is misnamed; it is actually add_special and can cause several special tokens to be added to the token list (the special parameter is actually parse_special).

* don't add/parse any special tokens when using fim

I've left original behavior when no fim tokens are found, but this should perhaps be re-evaluated.

* don't append suffix to prompt_tokens unless fim tokens are detected

* make sure we only do this for fim

---------

Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-08 02:26:22 -04:00
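The fill-in-middle behaviour described in this commit can be sketched roughly as follows. This is a minimal illustration, not the project's code: the function name `build_fim_tokens`, its parameters, and the prefix/suffix/middle ordering are assumptions made for the example.

```python
from typing import List, Optional


def build_fim_tokens(
    prefix_tokens: List[int],
    suffix_tokens: List[int],
    prefix_id: Optional[int],
    middle_id: Optional[int],
    suffix_id: Optional[int],
) -> List[int]:
    """Assemble a fill-in-middle prompt (hypothetical helper).

    The token ids would come from GGUF metadata or from llama.cpp's own
    guess; None means the model has no such token.
    """
    # Without a full set of FIM tokens, leave the prompt alone and do not
    # append the suffix to it.
    if prefix_id is None or middle_id is None or suffix_id is None:
        return list(prefix_tokens)
    # Assumed layout: <prefix> prompt <suffix> suffix-text <middle>.
    # No BOS or other special tokens are added around the pieces.
    return [prefix_id, *prefix_tokens, suffix_id, *suffix_tokens, middle_id]
```

With FIM tokens present the model is asked to generate the text that belongs between the prompt and the suffix; otherwise the call behaves like a plain completion of the prompt.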
Andrei Betlen
228949c1f7 feat: Update llama.cpp 2024-05-08 02:22:15 -04:00
Sarunas Kalade
903b28adf5
fix: adding missing args in create_completion for functionary chat handler (#1430) 2024-05-08 02:21:27 -04:00
Ikko Eltociear Ashimine
07966b9ba7
docs: update README.md (#1432)
accomodate -> accommodate
2024-05-08 02:20:20 -04:00
Bruno Alvisio
a50d24e3a7
fix: chat_format log where auto-detected format prints None (#1434) 2024-05-08 02:19:35 -04:00
Andrei Betlen
0318702cdc feat(server): Add support for setting root_path. Closes #1420 2024-05-05 12:49:31 -04:00
Olivier DEBAUCHE
3666833107
feat(ci): Add docker checks and check deps more frequently (#1426)
* Update dependabot.yml

Add github-actions update

* Update dependabot.yml

* Update dependabot.yml
2024-05-05 12:42:28 -04:00
Andrei Betlen
3e2597eac8 feat: Update llama.cpp 2024-05-05 12:12:27 -04:00
Noam Gat
e0d7674e62
fix: detokenization case where first token does not start with a leading space (#1375)
* Fix tokenization edge case where llama output does not start with a space

See this notebook:
https://colab.research.google.com/drive/1Ooz11nFPk19zyJdMDx42CeesU8aWZMdI#scrollTo=oKpHw5PZ30uC

* Update _internals.py

Fixing to compare to b' ' instead of (str)' '

---------

Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-04 10:14:59 -04:00
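The bytes-versus-str detail in this commit matters because, in Python 3, `b" " == " "` is always False and indexing a bytes object yields an int. A minimal sketch of the check, using a hypothetical helper name rather than the actual code in `_internals.py`:

```python
def strip_leading_space(output: bytes) -> bytes:
    """Drop a single leading space from detokenized bytes, if there is one."""
    # Slice instead of indexing: output[0] would be the int 32, not b" ".
    # Slicing is also safe on empty input.
    if output[0:1] == b" ":
        return output[1:]
    # The first token did not start with a space, so keep the bytes as-is.
    return output
```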
Jeffrey Fong
1f56c648c3
feat: Implement streaming for Functionary v2 + Bug fixes (#1419)
* set up streaming for v2

* assert v2 streaming, fix tool_call vs function_call

* fix streaming with tool_choice/function_call

* make functions return 1 function call only when 'auto'

* fix

---------

Co-authored-by: Andrei <abetlen@gmail.com>
2024-05-04 10:11:20 -04:00
Andrei Betlen
f9b7221c8f Merge branch 'main' of github.com:abetlen/llama_cpp_python into main 2024-05-03 19:07:54 -04:00
Andrei Betlen
9f7a85571a fix: Use memmove to copy str_value kv_override. Closes #1417 2024-05-03 19:07:50 -04:00
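For readers unfamiliar with the technique, here is a minimal sketch of copying a Python string into a fixed-size char field with `ctypes.memmove`. The `KVOverride` structure and the 128-byte field size are stand-ins assumed for the example, not the real llama.cpp definition.

```python
import ctypes

STR_VALUE_SIZE = 128  # assumed field size, for illustration only


class KVOverride(ctypes.Structure):
    # Hypothetical stand-in for a struct with a fixed-size string field.
    _fields_ = [("str_value", ctypes.c_char * STR_VALUE_SIZE)]


def set_str_value(override: KVOverride, value: str) -> None:
    """Copy value into override.str_value as a NUL-terminated string."""
    data = value.encode("utf-8")
    if len(data) >= STR_VALUE_SIZE:
        raise ValueError("value does not fit in str_value")
    # Zero-padded staging buffer of exactly the field size.
    staging = ctypes.create_string_buffer(data, STR_VALUE_SIZE)
    # Destination is the address of the field inside the struct.
    dest = ctypes.addressof(override) + KVOverride.str_value.offset
    ctypes.memmove(dest, staging, STR_VALUE_SIZE)
```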
Andrei Betlen
0a454bebe6 feat(server): Remove temperature bounds checks for server. Closes #1384 2024-05-03 15:23:06 -04:00
Daniel Thuerck
2138561fab
fix(server): Propagate flash_attn to model load. (#1424) 2024-05-03 12:17:07 -04:00
Andrei Betlen
2117122396 chore: Bump version 2024-05-02 12:07:09 -04:00
Andrei Betlen
d75dea18db feat: Update llama.cpp 2024-05-02 12:00:44 -04:00
Andrei Betlen
31b1d95a6c feat: Add llama-3-vision-alpha chat format 2024-05-02 11:32:18 -04:00
Andrei Betlen
4f01c452b6 fix: Change default value of verbose in image chat format handlers to True to match Llama 2024-04-30 15:50:30 -04:00
Andrei Betlen
946156fb6c feat: Update llama.cpp 2024-04-30 15:46:45 -04:00
Andrei Betlen
9286b5caac Merge branch 'main' of github.com:abetlen/llama_cpp_python into main 2024-04-30 15:45:36 -04:00
Andrei Betlen
f116175a5a fix: Suppress all logs when verbose=False, use hardcoded fileno's to work in colab notebooks. Closes #796 Closes #729 2024-04-30 15:45:34 -04:00
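One way to silence native-library output using hardcoded descriptor numbers is sketched below. It avoids calling `sys.stdout.fileno()`, which may be unavailable in notebook environments. The context manager name and structure are assumptions for illustration, not the project's implementation.

```python
import os
from contextlib import contextmanager

# Hardcoded POSIX descriptor numbers, so no fileno() lookup is needed.
STDOUT_FILENO = 1
STDERR_FILENO = 2


@contextmanager
def suppress_output(disable: bool = True):
    """Redirect fd 1 and fd 2 to the null device for the duration of the block."""
    if not disable:
        yield
        return
    saved_out = os.dup(STDOUT_FILENO)
    saved_err = os.dup(STDERR_FILENO)
    devnull = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull, STDOUT_FILENO)
        os.dup2(devnull, STDERR_FILENO)
        yield
    finally:
        # Restore the original descriptors, then release the temporaries.
        os.dup2(saved_out, STDOUT_FILENO)
        os.dup2(saved_err, STDERR_FILENO)
        for fd in (saved_out, saved_err, devnull):
            os.close(fd)
```

Because the redirection happens at the descriptor level, it also catches output written directly by C code, which reassigning `sys.stdout` would not.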
Jonathan Soma
3226b3c5ef
fix: UTF-8 handling with grammars (#1415)
Use Python's built-in UTF-8 handling to get code points
2024-04-30 14:33:23 -04:00
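The idea of relying on Python's built-in UTF-8 handling can be illustrated with a short sketch. The helper name is hypothetical and this is not the grammar parser's actual code.

```python
from typing import List


def code_points(raw: bytes) -> List[int]:
    """Decode UTF-8 bytes and return one integer code point per character."""
    # bytes.decode handles multi-byte sequences and rejects invalid input,
    # so no manual bit manipulation is needed.
    return [ord(ch) for ch in raw.decode("utf-8")]
```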
Andrei Betlen
945c62c567 docs: Change all examples from interpreter style to script style. 2024-04-30 10:15:04 -04:00
Andrei Betlen
26478ab293 docs: Update README.md 2024-04-30 10:11:38 -04:00
Andrei Betlen
b14dd98922 chore: Bump version 2024-04-30 09:39:56 -04:00
Andrei Betlen
29b6e9a5c8 fix: wrong parameter for flash attention in pickle __getstate__ 2024-04-30 09:32:47 -04:00
Andrei Betlen
22d77eefd2 feat: Add option to enable flash_attn to Llama params and ModelSettings 2024-04-30 09:29:16 -04:00
Andrei Betlen
8c2b24d5aa feat: Update llama.cpp 2024-04-30 09:27:55 -04:00
Olivier DEBAUCHE
6332527a69
fix(ci): Fix build-and-release.yaml (#1413)
* Update build-and-release.yaml

* Update build-and-release.yaml
2024-04-30 09:16:14 -04:00
Andrei Betlen
c8cd8c17c6 docs: Update README to include CUDA 12.4 wheels 2024-04-30 03:12:46 -04:00
Andrei Betlen
f417cce28a chore: Bump version 2024-04-30 03:11:02 -04:00
Andrei Betlen
3489ef09d3 fix: Ensure image renders before text in chat formats regardless of message content order. 2024-04-30 03:08:46 -04:00
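One simple way to get images ahead of text regardless of input order is a stable sort over the message content parts. The sketch below assumes OpenAI-style parts with a `type` of `text` or `image_url`; the helper name is hypothetical and the actual fix may instead live in the chat templates.

```python
from typing import Any, Dict, List


def images_first(content: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the content parts with image parts before all other parts."""
    # sorted() is stable: images keep their relative order, and so do the
    # remaining parts. False sorts before True, so images come first.
    return sorted(content, key=lambda part: part.get("type") != "image_url")
```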
Andrei Betlen
d03f15bb73 fix(ci): Fix bug in use of upload-artifact failing to merge multiple artifacts into a single release. 2024-04-30 02:58:55 -04:00
Andrei Betlen
26c7876ba0 chore: Bump version 2024-04-30 01:48:40 -04:00
Andrei
fe2da09538
feat: Generic Chat Formats, Tool Calling, and Huggingface Pull Support for Multimodal Models (Obsidian, LLaVA1.6, Moondream) (#1147)
* Test dummy image tags in chat templates

* Format and improve types for llava_cpp.py

* Add from_pretrained support to llava chat format.

* Refactor llava chat format to use a jinja2

* Revert chat format test

* Add moondream support (wip)

* Update moondream chat format

* Update moondream chat format

* Update moondream prompt

* Add function calling support

* Cache last image embed

* Add Llava1.6 support

* Add nanollava support

* Add obsidian support

* Remove unnecessary import

* Re-order multimodal chat formats

* Logits all no longer required for multi-modal models

* Update README.md

* Update docs

* Update README

* Fix typo

* Update README

* Fix typo
2024-04-30 01:35:38 -04:00
Andrei Betlen
97fb860eba feat: Update llama.cpp 2024-04-29 23:34:55 -04:00
dependabot[bot]
df2b5b5d44
chore(deps): bump actions/upload-artifact from 3 to 4 (#1412)
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 3 to 4.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](https://github.com/actions/upload-artifact/compare/v3...v4)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-04-29 22:53:42 -04:00
dependabot[bot]
be43018e09
chore(deps): bump actions/configure-pages from 4 to 5 (#1411)
Bumps [actions/configure-pages](https://github.com/actions/configure-pages) from 4 to 5.
- [Release notes](https://github.com/actions/configure-pages/releases)
- [Commits](https://github.com/actions/configure-pages/compare/v4...v5)

---
updated-dependencies:
- dependency-name: actions/configure-pages
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-04-29 22:53:21 -04:00
dependabot[bot]
32c000f3ec
chore(deps): bump softprops/action-gh-release from 1 to 2 (#1408)
Bumps [softprops/action-gh-release](https://github.com/softprops/action-gh-release) from 1 to 2.
- [Release notes](https://github.com/softprops/action-gh-release/releases)
- [Changelog](https://github.com/softprops/action-gh-release/blob/master/CHANGELOG.md)
- [Commits](https://github.com/softprops/action-gh-release/compare/v1...v2)

---
updated-dependencies:
- dependency-name: softprops/action-gh-release
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-04-29 22:52:58 -04:00
Olivier DEBAUCHE
03c654a3d9
ci(fix): Workflow actions updates and fix arm64 wheels not included in release (#1392)
* Update test.yaml

Bump actions/checkout@v3 to v4
Bump actions/setup-python@v4 to v5

* Update test-pypi.yaml

Bump actions/setup-python@v4 to v5

* Update build-and-release.yaml

Bump softprops/action-gh-release@v1 to v2
Bump actions/checkout@v3 to v4
Bump actions/setup-python@v3 to v5

* Update publish.yaml

Bump actions/checkout@v3 to v4
Bump actions/setup-python@v4 to v5

* Update publish-to-test.yaml

Bump actions/checkout@v3 to v4
Bump actions/setup-python@v4 to v5

* Update test-pypi.yaml

Add Python 3.12

* Update build-and-release.yaml

* Update build-docker.yaml

Bump docker/setup-qemu-action@v2 to v3
Bump docker/setup-buildx-action@v2 to v3

* Update build-and-release.yaml

* Update build-and-release.yaml
2024-04-29 22:52:23 -04:00
Andrei Betlen
0c3bc4b928 fix(ci): Update generate wheel index script to include cu12.3 and cu12.4 Closes #1406 2024-04-29 12:37:22 -04:00
Olivier DEBAUCHE
2355ce2227
ci: Add support for pre-built cuda 12.4.1 wheels (#1388)
* Add support for cuda 12.4.1

* Update build-wheels-cuda.yaml

* Update build-wheels-cuda.yaml

* Update build-wheels-cuda.yaml

* Update build-wheels-cuda.yaml

* Update build-wheels-cuda.yaml

* Update build-wheels-cuda.yaml

* Update build-wheels-cuda.yaml

* Update build-wheels-cuda.yaml

* Update build-wheels-cuda.yaml

Revert
2024-04-27 23:44:47 -04:00
Andrei Betlen
a411612b38 feat: Add support for str type kv_overrides 2024-04-27 23:42:19 -04:00