Commit graph

1734 commits

Author SHA1 Message Date
Andrei Betlen
26c7876ba0 chore: Bump version 2024-04-30 01:48:40 -04:00
Andrei
fe2da09538
feat: Generic Chat Formats, Tool Calling, and Huggingface Pull Support for Multimodal Models (Obsidian, LLaVA1.6, Moondream) (#1147)
* Test dummy image tags in chat templates

* Format and improve types for llava_cpp.py

* Add from_pretrained support to llava chat format.

* Refactor llava chat format to use a jinja2

* Revert chat format test

* Add moondream support (wip)

* Update moondream chat format

* Update moondream chat format

* Update moondream prompt

* Add function calling support

* Cache last image embed

* Add Llava1.6 support

* Add nanollava support

* Add Obsidian support

* Remove unnecessary import

* Re-order multimodal chat formats

* Logits all no longer required for multi-modal models

* Update README.md

* Update docs

* Update README

* Fix typo

* Update README

* Fix typo
2024-04-30 01:35:38 -04:00
Andrei Betlen
97fb860eba feat: Update llama.cpp 2024-04-29 23:34:55 -04:00
dependabot[bot]
df2b5b5d44
chore(deps): bump actions/upload-artifact from 3 to 4 (#1412)
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 3 to 4.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](https://github.com/actions/upload-artifact/compare/v3...v4)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-04-29 22:53:42 -04:00
dependabot[bot]
be43018e09
chore(deps): bump actions/configure-pages from 4 to 5 (#1411)
Bumps [actions/configure-pages](https://github.com/actions/configure-pages) from 4 to 5.
- [Release notes](https://github.com/actions/configure-pages/releases)
- [Commits](https://github.com/actions/configure-pages/compare/v4...v5)

---
updated-dependencies:
- dependency-name: actions/configure-pages
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-04-29 22:53:21 -04:00
dependabot[bot]
32c000f3ec
chore(deps): bump softprops/action-gh-release from 1 to 2 (#1408)
Bumps [softprops/action-gh-release](https://github.com/softprops/action-gh-release) from 1 to 2.
- [Release notes](https://github.com/softprops/action-gh-release/releases)
- [Changelog](https://github.com/softprops/action-gh-release/blob/master/CHANGELOG.md)
- [Commits](https://github.com/softprops/action-gh-release/compare/v1...v2)

---
updated-dependencies:
- dependency-name: softprops/action-gh-release
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-04-29 22:52:58 -04:00
Olivier DEBAUCHE
03c654a3d9
ci(fix): Workflow actions updates and fix arm64 wheels not included in release (#1392)
* Update test.yaml

Bump actions/checkout@v3 to v4
Bump actions/setup-python@v4 to v5

* Update test-pypi.yaml

Bump actions/setup-python@v4 to v5

* Update build-and-release.yaml

Bump softprops/action-gh-release@v1 to v2
Bump actions/checkout@v3 to v4
Bump actions/setup-python@v3 to v5

* Update publish.yaml

Bump actions/checkout@v3 to v4
Bump actions/setup-python@v4 to v5

* Update publish-to-test.yaml

Bump actions/checkout@v3 to v4
Bump actions/setup-python@v4 to v5

* Update test-pypi.yaml

Add Python 3.12

* Update build-and-release.yaml

* Update build-docker.yaml

Bump docker/setup-qemu-action@v2 to v3
Bump docker/setup-buildx-action@v2 to v3

* Update build-and-release.yaml

* Update build-and-release.yaml
2024-04-29 22:52:23 -04:00
Andrei Betlen
0c3bc4b928 fix(ci): Update generate wheel index script to include cu12.3 and cu12.4 Closes #1406 2024-04-29 12:37:22 -04:00
Olivier DEBAUCHE
2355ce2227
ci: Add support for pre-built cuda 12.4.1 wheels (#1388)
* Add support for cuda 12.4.1

* Update build-wheels-cuda.yaml

* Update build-wheels-cuda.yaml

* Update build-wheels-cuda.yaml

* Update build-wheels-cuda.yaml

* Update build-wheels-cuda.yaml

* Update build-wheels-cuda.yaml

* Update build-wheels-cuda.yaml

* Update build-wheels-cuda.yaml

* Update build-wheels-cuda.yaml

Revert
2024-04-27 23:44:47 -04:00
Andrei Betlen
a411612b38 feat: Add support for str type kv_overrides 2024-04-27 23:42:19 -04:00
Andrei Betlen
c9b85bf098 feat: Update llama.cpp 2024-04-27 23:41:54 -04:00
dependabot[bot]
c07db99e5b
chore(deps): bump pypa/cibuildwheel from 2.16.5 to 2.17.0 (#1401)
Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.16.5 to 2.17.0.
- [Release notes](https://github.com/pypa/cibuildwheel/releases)
- [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md)
- [Commits](https://github.com/pypa/cibuildwheel/compare/v2.16.5...v2.17.0)

---
updated-dependencies:
- dependency-name: pypa/cibuildwheel
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-04-27 20:51:13 -04:00
dependabot[bot]
7074c4d256
chore(deps): bump docker/build-push-action from 4 to 5 (#1400)
Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 4 to 5.
- [Release notes](https://github.com/docker/build-push-action/releases)
- [Commits](https://github.com/docker/build-push-action/compare/v4...v5)

---
updated-dependencies:
- dependency-name: docker/build-push-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-04-27 20:51:02 -04:00
dependabot[bot]
79318ba1d1
chore(deps): bump docker/login-action from 2 to 3 (#1399)
Bumps [docker/login-action](https://github.com/docker/login-action) from 2 to 3.
- [Release notes](https://github.com/docker/login-action/releases)
- [Commits](https://github.com/docker/login-action/compare/v2...v3)

---
updated-dependencies:
- dependency-name: docker/login-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-04-27 20:50:50 -04:00
dependabot[bot]
27038db3d6
chore(deps): bump actions/cache from 3.3.2 to 4.0.2 (#1398)
Bumps [actions/cache](https://github.com/actions/cache) from 3.3.2 to 4.0.2.
- [Release notes](https://github.com/actions/cache/releases)
- [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md)
- [Commits](https://github.com/actions/cache/compare/v3.3.2...v4.0.2)

---
updated-dependencies:
- dependency-name: actions/cache
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-04-27 20:50:39 -04:00
dependabot[bot]
17bdfc818f
chore(deps): bump conda-incubator/setup-miniconda from 2.2.0 to 3.0.4 (#1397)
Bumps [conda-incubator/setup-miniconda](https://github.com/conda-incubator/setup-miniconda) from 2.2.0 to 3.0.4.
- [Release notes](https://github.com/conda-incubator/setup-miniconda/releases)
- [Changelog](https://github.com/conda-incubator/setup-miniconda/blob/main/CHANGELOG.md)
- [Commits](https://github.com/conda-incubator/setup-miniconda/compare/v2.2.0...v3.0.4)

---
updated-dependencies:
- dependency-name: conda-incubator/setup-miniconda
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-04-27 20:50:28 -04:00
Jeffrey Fong
f178636e1b
fix: Functionary bug fixes (#1385)
* fix completion tokens tracking, prompt forming

* fix 'function_call' and 'tool_calls' depending on 'functions' and 'tools', incompatibility with python 3.8

* Updated README

* fix for openai server compatibility

---------

Co-authored-by: Andrei <abetlen@gmail.com>
2024-04-27 20:49:52 -04:00
iyubondyrev
e6bbfb863c
examples: fix quantize example (#1387)
@iyubondyrev thank you!
2024-04-27 20:48:47 -04:00
Olivier DEBAUCHE
c58b56123d
ci: Update action versions in build-wheels-metal.yaml (#1390)
* Bump actions/setup-python@v4 to v5

* Update build-wheels-metal.yaml

* Update build-wheels-metal.yaml

* Update build-wheels-metal.yaml
2024-04-27 20:47:49 -04:00
Olivier DEBAUCHE
9e7f738220
ci: Update dependabot.yml (#1391)
Add github-actions update
2024-04-27 20:47:07 -04:00
Andrei Betlen
65edc90671 chore: Bump version 2024-04-26 10:11:31 -04:00
Andrei Betlen
173ebc7878 fix: Remove duplicate pooling_type definition and add missing n_vocab definition in bindings 2024-04-25 21:36:09 -04:00
Douglas Hanley
f6ed21f9a2
feat: Allow for possibly non-pooled embeddings (#1380)
* allow for possibly non-pooled embeddings

* add more to embeddings section in README.md

---------

Co-authored-by: Andrei <abetlen@gmail.com>
2024-04-25 21:32:44 -04:00
Andrei Betlen
fcfea66857 fix: pydantic deprecation warning 2024-04-25 21:21:48 -04:00
Andrei Betlen
7f52335c50 feat: Update llama.cpp 2024-04-25 21:21:29 -04:00
Andrei Betlen
266abfc1a3 fix(ci): Fix metal tests as well 2024-04-25 03:09:46 -04:00
Andrei Betlen
de37420fcf fix(ci): Fix python macos test runners issue 2024-04-25 03:08:32 -04:00
Andrei Betlen
2a9979fce1 feat: Update llama.cpp 2024-04-25 02:48:26 -04:00
Andrei Betlen
c50d3300d2 chore: Bump version 2024-04-23 02:53:20 -04:00
Andrei Betlen
611781f531 ci: Build arm64 wheels. Closes #1342 2024-04-23 02:48:09 -04:00
Sean Bailey
53ebcc8bb5
feat(server): Provide ability to dynamically allocate all threads if desired using -1 (#1364) 2024-04-23 02:35:38 -04:00
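The "-1 means all threads" convention in the commit above can be illustrated with a minimal sketch. The helper name `resolve_n_threads` is hypothetical and is not taken from the project's code; it assumes -1 should map to the CPU count reported by the operating system and that any other value must be a positive integer.

```python
import os


def resolve_n_threads(requested: int) -> int:
    """Map a requested thread count to a concrete positive value.

    Assumption (not from the source): -1 means "use every CPU the OS
    reports"; any other value must be >= 1 and is passed through.
    """
    if requested == -1:
        # os.cpu_count() can return None if the count cannot be determined
        return os.cpu_count() or 1
    if requested < 1:
        raise ValueError(f"n_threads must be -1 or >= 1, got {requested}")
    return requested
```

With this sketch, `resolve_n_threads(4)` returns 4 unchanged, while `resolve_n_threads(-1)` returns a machine-dependent count.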
Geza Velkey
507c1da066
fix: Update scikit-build-core build dependency to avoid bug in 0.9.1 (#1370)
cmake [options] <path-to-source>
  cmake [options] <path-to-existing-build>
  cmake [options] -S <path-to-source> -B <path-to-build>

Specify a source directory to (re-)generate a build system for it in the
current working directory.  Specify an existing build directory to
re-generate its build system.

Run 'cmake --help' for more information. issue
2024-04-23 02:34:15 -04:00
abk16
8559e8ce88
feat: Add Llama-3 chat format (#1371)
* feat: Add Llama-3 chat format

* feat: Auto-detect Llama-3 chat format from gguf template

* feat: Update llama.cpp to b2715

Includes proper Llama-3 <|eot_id|> token handling.

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-04-23 02:33:29 -04:00
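For readers unfamiliar with the prompt layout that the Llama-3 chat format commit above refers to, here is a minimal sketch. It follows the publicly documented Llama 3 instruct template (header tokens around the role, `<|eot_id|>` closing each turn); the function name `format_llama3` is hypothetical, and this is not the code added in the commit.

```python
from typing import Dict, List


def format_llama3(messages: List[Dict[str, str]]) -> str:
    """Render chat messages into a Llama-3 style prompt string.

    Illustrative only: each message is a dict with "role" and "content".
    """
    prompt = "<|begin_of_text|>"
    for message in messages:
        prompt += (
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    # Open an assistant turn so the model generates the reply next
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt
```

Generation is expected to end when the model emits `<|eot_id|>`, which is why the commit notes proper handling of that token.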
Andrei Betlen
617d536e1c feat: Update llama.cpp 2024-04-23 02:31:40 -04:00
Andrei Betlen
d40a250ef3 feat: Use new llama_token_is_eog in create_completions 2024-04-22 00:35:47 -04:00
Andrei Betlen
b21ba0e2ac Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main 2024-04-21 20:46:42 -04:00
Andrei Betlen
159cc4e5d9 feat: Update llama.cpp 2024-04-21 20:46:40 -04:00
Andrei Betlen
0281214863 chore: Bump version 2024-04-20 00:09:37 -04:00
Andrei Betlen
cc81afebf0 feat: Add stopping_criteria to ChatFormatter, allow stopping on arbitrary token ids, fixes llama3 instruct 2024-04-20 00:00:53 -04:00
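Stopping on arbitrary token ids, as mentioned in the commit above, can be sketched as a small predicate factory. This assumes a stopping criterion is a callable that receives the token ids generated so far plus the latest logits and returns True to halt; the name `stop_on_token_ids` and the example id are hypothetical, not the project's API.

```python
from typing import Callable, Sequence, Set


def stop_on_token_ids(
    stop_ids: Set[int],
) -> Callable[[Sequence[int], Sequence[float]], bool]:
    """Build a criterion that halts when the last token is in stop_ids."""

    def criterion(input_ids: Sequence[int], logits: Sequence[float]) -> bool:
        # logits is unused here; it is accepted only to match the assumed signature
        return len(input_ids) > 0 and input_ids[-1] in stop_ids

    return criterion


# Hypothetical id standing in for an end-of-turn token
should_stop = stop_on_token_ids({128009})
```

Checking token ids rather than decoded text avoids depending on how a special token is rendered as a string.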
Andrei Betlen
d17c1887a3 feat: Update llama.cpp 2024-04-19 23:58:16 -04:00
Andrei Betlen
893a27a736 chore: Bump version 2024-04-18 01:43:39 -04:00
Andrei Betlen
a128c80500 feat: Update llama.cpp 2024-04-18 01:39:45 -04:00
Lucca Zenóbio
4f42664955
feat: update grammar schema converter to match llama.cpp (#1353)
* feat: improve function calling

* feat:grammar

* fix

* fix

* fix
2024-04-18 01:36:25 -04:00
Andrei Betlen
fa4bb0cf81 Revert "feat: Update json to grammar (#1350)"
This reverts commit 610a592f70.
2024-04-17 16:18:16 -04:00
Lucca Zenóbio
610a592f70
feat: Update json to grammar (#1350)
* feat: improve function calling

* feat:grammar
2024-04-17 10:10:21 -04:00
khimaros
b73c73c0c6
feat: add disable_ping_events flag (#1257)
for backward compatibility, this is false by default

it can be set to true to disable EventSource pings
which are not supported by some OpenAI clients.

fixes https://github.com/abetlen/llama-cpp-python/issues/1256
2024-04-17 10:08:19 -04:00
tc-wolf
4924455dec
feat: Make saved state more compact on-disk (#1296)
* State load/save changes

- Only store up to `n_tokens` logits instead of full `(n_ctx, n_vocab)`
  sized array.
  - Difference between ~350MB and ~1500MB for example prompt with ~300
    tokens (makes sense lol)
- Auto-formatting changes

* Back out formatting changes
2024-04-17 10:06:50 -04:00
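The saving described in the commit above comes from storing `n_tokens` rows of logits rather than `n_ctx` rows. A rough arithmetic sketch follows, assuming float32 logits and hypothetical values for `n_ctx` and `n_vocab`; it covers the logits array only, so it is not expected to reproduce the ~350MB and ~1500MB figures quoted in the message, which presumably include other saved state.

```python
def logits_bytes(rows: int, n_vocab: int, bytes_per_value: int = 4) -> int:
    """Size in bytes of a (rows, n_vocab) logits array (float32 assumed)."""
    return rows * n_vocab * bytes_per_value


# Hypothetical sizes for illustration only
n_ctx, n_vocab, n_tokens = 8192, 32000, 300

full = logits_bytes(n_ctx, n_vocab)        # full (n_ctx, n_vocab) array
compact = logits_bytes(n_tokens, n_vocab)  # only rows for evaluated tokens

print(f"full:    {full / 1e6:.1f} MB")
print(f"compact: {compact / 1e6:.1f} MB")
```

The shorter the prompt is relative to the context size, the larger the fraction of the array that no longer needs to be written.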
Andrei Betlen
9842cbf99d feat: Update llama.cpp 2024-04-17 10:06:15 -04:00
ddh0
c96b2daebf feat: Use all available CPUs for batch processing (#1345) 2024-04-17 10:05:54 -04:00
Andrei Betlen
a420f9608b feat: Update llama.cpp 2024-04-14 19:14:09 -04:00