Commit graph

130 commits

Author SHA1 Message Date
Andrei Betlen
909ef66951 docs: Rename cuBLAS section to CUDA 2024-04-04 03:08:47 -04:00
Andrei Betlen
1db3b58fdc docs: Add docs explaining how to install pre-built wheels. 2024-04-04 02:57:06 -04:00
Andrei Betlen
c50309e52a docs: LLAMA_CUBLAS -> LLAMA_CUDA 2024-04-04 02:49:19 -04:00
Andrei
5a930ee9a1
feat: Binary wheels for CPU, CUDA (12.1 - 12.3), Metal (#1247)
* Generate binary wheel index on release

* Add total release downloads badge

* Update download label

* Use official cibuildwheel action

* Add workflows to build CUDA and Metal wheels

* Update generate index workflow

* Update workflow name
2024-04-03 15:32:13 -04:00
lawfordp2017
a0f373e310
fix: Changed local API doc references to hosted (#1317) 2024-04-01 10:21:00 -04:00
Kenneth Hoste
663659f730
docs: fix small typo in README: 'model know how' -> 'model knows how' (#1244)
Co-authored-by: Andrei <abetlen@gmail.com>
2024-03-02 22:20:41 -05:00
Andrei Betlen
97aa3a153d docs: Add information re: auto chat formats. Closes #1236 2024-03-01 13:10:25 -05:00
Douglas Hanley
cf1fdd8a9a
docs: fix typo in README.md embeddings example. (#1232) 2024-02-29 13:55:50 -05:00
Andrei
4d574bd765
feat(server): Add support for pulling models from Huggingface Hub (#1222)
* Basic support for hf pull on server

* Add hf_model_repo_id setting

* Update README
2024-02-26 14:35:08 -05:00
Andrei Betlen
b3e358dee4 docs: Add example of local image loading to README 2024-02-26 11:58:33 -05:00
Andrei Betlen
b681674bf2 docs: Fix functionary repo_id 2024-02-23 12:36:13 -05:00
Andrei Betlen
702306b381 docs: Restore functionary docs in README 2024-02-23 12:34:02 -05:00
Aditya Purandare
52d9d70076
docs: Update README.md to fix pip install llama cpp server (#1187)
Without the single quotes, when running the command, an error is printed saying no matching packages found on pypi. Adding the quotes fixes it

```bash
$ pip install llama-cpp-python[server]
zsh: no matches found: llama-cpp-python[server]
```

Co-authored-by: Andrei <abetlen@gmail.com>
2024-02-23 04:41:22 -05:00
Andrei Betlen
410e02da51 docs: Fix typo 2024-02-23 00:43:31 -05:00
Andrei Betlen
eb56ce2e2a docs: fix low-level api example 2024-02-22 11:33:05 -05:00
Andrei Betlen
0f8cad6cb7 docs: Update README 2024-02-22 11:31:44 -05:00
Andrei Betlen
045cc12670 docs: Update README 2024-02-22 03:53:52 -05:00
Andrei Betlen
32efed7b07 docs: Update README 2024-02-22 03:25:11 -05:00
Andrei Betlen
d80c5cf29d docs: fix indentation for mkdocs-material 2024-02-22 02:30:24 -05:00
Andrei
0f8aa4ab5c
feat: Pull models directly from huggingface (#1206)
* Add from_pretrained method to Llama class

* Update docs

* Merge filename and pattern
2024-02-21 16:25:10 -05:00
Andrei Betlen
c2a234a086 docs: Add embeddings section 2024-02-15 23:15:50 -05:00
Andrei Betlen
4348a6cdf0 docs: Fix typo 2024-02-13 02:04:54 -05:00
Andrei Betlen
b82b0e1014 docs: Temporarily revert function calling docs 2024-02-12 16:27:43 -05:00
Akarshan Biswas
918ff27e50
docs: Set the correct command for compiling with syscl support (#1172) 2024-02-11 13:55:15 -05:00
Jeffrey Fong
901827013b
feat: Integrate functionary v1.4 and v2 models + add custom tokenizer support to Llama class (#1078)
* convert functionary-v1 chat handler to use hf autotokenizer

* add hf_tokenizer + inteegrate functionary-v1.4 prompt template

* integrate functionary v2 prompt template

* update readme

* set up parallel function calling wip

* set up parallel function calling

* Update README.md

* Update README.md

* refactor tokenizers

* include old functionary handler for backward compatibility

* add hf_tokenizer_path in server ModelSettings

* convert functionary-v1 chat handler to use hf autotokenizer

* add hf_tokenizer + inteegrate functionary-v1.4 prompt template

* integrate functionary v2 prompt template

* update readme

* set up parallel function calling wip

* resolve merge conflict

* Update README.md

* Update README.md

* refactor tokenizers

* include old functionary handler for backward compatibility

* add hf_tokenizer_path in server ModelSettings

* Cleanup PR, fix breaking changes

* Use hf_pretrained_model_name_or_path for tokenizer

* fix hf tokenizer in streaming

* update README

* refactor offset mapping

---------

Co-authored-by: Andrei <abetlen@gmail.com>
2024-02-07 20:07:03 -05:00
Andrei
fb762a6041
Add speculative decoding (#1120)
* Add draft model param to llama class, implement basic prompt lookup decoding draft model

* Use samplingcontext for sampling

* Use 1d array

* Use draft model for sampling

* Fix dumb mistake

* Allow for later extensions to the LlamaDraftModel api

* Cleanup

* Adaptive candidate prediction

* Update implementation to match hf transformers

* Tuning

* Fix bug where last token was not used for ngram prediction

* Remove heuristic for num_pred_tokens (no benefit)

* fix: n_candidates bug.

* Add draft_model_num_pred_tokens server setting

* Cleanup

* Update README
2024-01-31 14:08:14 -05:00
Andrei Betlen
247a16de66 docs: Update README 2024-01-30 12:23:07 -05:00
Andrei Betlen
059f6b3ac8 docs: fix typos 2024-01-29 11:02:25 -05:00
Andrei Betlen
843e77e3e2 docs: Add Vulkan build instructions 2024-01-29 11:01:26 -05:00
Andrei Betlen
8c59210062 docs: Fix typo 2024-01-27 19:37:59 -05:00
Andrei Betlen
399fa1e03b docs: Add JSON and JSON schema mode examples to README 2024-01-27 19:36:33 -05:00
Andrei Betlen
d6fb16e055 docs: Update README 2024-01-25 10:51:48 -05:00
Andrei Betlen
5b258bf840 docs: Update README with more param common examples 2024-01-24 10:51:15 -05:00
Andrei Betlen
88fbccaaa3 docs: Add macosx wrong arch fix to README 2024-01-21 18:38:44 -05:00
Jerry Liu
84380fe9a6
Add llamaindex integration to readme (#1092) 2024-01-16 19:10:50 -05:00
Caleb Hoff
f766b70c9a
Fix: Correct typo in README.md (#1058)
In Llama.create_chat_completion, the `tool_choice` property does not have an s on the end.
2024-01-04 18:12:32 -05:00
Andrei Betlen
f4be84c122 Fix typo 2023-12-22 14:40:44 -05:00
Andrei Betlen
9b3a5939f3 docs: Add multi-model link to readme 2023-12-22 14:40:13 -05:00
evelynmitchell
37da8e863a
Update README.md functionary demo typo (#996)
missing comma
2023-12-16 19:00:30 -05:00
zocainViken
6bbeea07ae
README.md multimodal params fix (#967)
multi modal params fix: add logits = True -> to make llava work
2023-12-11 20:41:38 -05:00
Aniket Maurya
c1d92ce680
fix minor typo (#958)
* fix minor typo

* Fix typo

---------

Co-authored-by: Andrei <abetlen@gmail.com>
2023-12-11 20:40:38 -05:00
Andrei Betlen
fb32f9d438 docs: Update README 2023-11-28 03:15:01 -05:00
Andrei Betlen
43e006a291 docs: Remove divider 2023-11-28 02:41:50 -05:00
Andrei Betlen
2cc6c9ae2f docs: Update README, add FAQ 2023-11-28 02:37:34 -05:00
Andrei Betlen
9c68b1804a docs: Add api reference links in README 2023-11-27 18:54:07 -05:00
Andrei Betlen
41428244f0 docs: Fix README indentation 2023-11-27 18:29:13 -05:00
Andrei Betlen
1539146a5e docs: Fix README docs link 2023-11-27 18:21:00 -05:00
Anton Vice
aa5a7a1880
Update README.md (#940)
.ccp >> .cpp
2023-11-26 15:39:38 -05:00
Andrei Betlen
abb1976ad7 docs: Add n_ctx not for multimodal models 2023-11-22 21:07:00 -05:00
Andrei Betlen
36679a58ef Merge branch 'main' of github.com:abetlen/llama_cpp_python into main 2023-11-22 19:49:59 -05:00