nullname
d634efcdd9
feat: adding rpc_servers parameter to Llama class ( #1477 )
...
* passthru rpc_servers params
wip
* enable llama rpc by default
* convert string to byte
* add rpc package
* Revert "enable llama rpc by default"
This reverts commit 832c6dd56c979514cec5df224bf2d2014dccd790.
* update readme
* Only set rpc_servers when provided
* Add rpc servers to server options
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-06-04 10:38:21 -04:00
Andrei Betlen
2d89964147
docs: Fix table formatting
2024-05-24 11:55:41 -04:00
Andrei Betlen
9e8d7d55bd
fix(docs): Fix link typo
2024-05-24 11:55:01 -04:00
Andrei Betlen
ec43e8920f
docs: Update multi-modal model section
2024-05-24 11:54:15 -04:00
Peng Yu
1547202b77
docs: Fix typo in README.md ( #1444 )
2024-05-10 10:35:51 -04:00
Ikko Eltociear Ashimine
07966b9ba7
docs: update README.md ( #1432 )
...
accomodate -> accommodate
2024-05-08 02:20:20 -04:00
Andrei Betlen
945c62c567
docs: Change all examples from interpreter style to script style.
2024-04-30 10:15:04 -04:00
Andrei Betlen
26478ab293
docs: Update README.md
2024-04-30 10:11:38 -04:00
Andrei Betlen
c8cd8c17c6
docs: Update README to include CUDA 12.4 wheels
2024-04-30 03:12:46 -04:00
Andrei
fe2da09538
feat: Generic Chat Formats, Tool Calling, and Huggingface Pull Support for Multimodal Models (Obsidian, LLaVA1.6, Moondream) ( #1147 )
...
* Test dummy image tags in chat templates
* Format and improve types for llava_cpp.py
* Add from_pretrained support to llava chat format.
* Refactor llava chat format to use a jinja2
* Revert chat format test
* Add moondream support (wip)
* Update moondream chat format
* Update moondream chat format
* Update moondream prompt
* Add function calling support
* Cache last image embed
* Add Llava1.6 support
* Add nanollava support
* Add obsidian support
* Remove unnecessary import
* Re-order multimodal chat formats
* Logits all no longer required for multi-modal models
* Update README.md
* Update docs
* Update README
* Fix typo
* Update README
* Fix typo
2024-04-30 01:35:38 -04:00
Jeffrey Fong
f178636e1b
fix: Functionary bug fixes ( #1385 )
...
* fix completion tokens tracking, prompt forming
* fix 'function_call' and 'tool_calls' depending on 'functions' and 'tools', incompatibility with python 3.8
* Updated README
* fix for openai server compatibility
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2024-04-27 20:49:52 -04:00
Douglas Hanley
f6ed21f9a2
feat: Allow for possibly non-pooled embeddings ( #1380 )
...
* allow for possibly non-pooled embeddings
* add more to embeddings section in README.md
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2024-04-25 21:32:44 -04:00
Sigbjørn Skjæret
7265a5dc0e
fix(docs): incorrect tool_choice example ( #1330 )
2024-04-05 09:14:03 -04:00
Andrei Betlen
909ef66951
docs: Rename cuBLAS section to CUDA
2024-04-04 03:08:47 -04:00
Andrei Betlen
1db3b58fdc
docs: Add docs explaining how to install pre-built wheels.
2024-04-04 02:57:06 -04:00
Andrei Betlen
c50309e52a
docs: LLAMA_CUBLAS -> LLAMA_CUDA
2024-04-04 02:49:19 -04:00
Andrei
5a930ee9a1
feat: Binary wheels for CPU, CUDA (12.1 - 12.3), Metal ( #1247 )
...
* Generate binary wheel index on release
* Add total release downloads badge
* Update download label
* Use official cibuildwheel action
* Add workflows to build CUDA and Metal wheels
* Update generate index workflow
* Update workflow name
2024-04-03 15:32:13 -04:00
lawfordp2017
a0f373e310
fix: Changed local API doc references to hosted ( #1317 )
2024-04-01 10:21:00 -04:00
Kenneth Hoste
663659f730
docs: fix small typo in README: 'model know how' -> 'model knows how' ( #1244 )
...
Co-authored-by: Andrei <abetlen@gmail.com>
2024-03-02 22:20:41 -05:00
Andrei Betlen
97aa3a153d
docs: Add information re: auto chat formats. Closes #1236
2024-03-01 13:10:25 -05:00
Douglas Hanley
cf1fdd8a9a
docs: fix typo in README.md embeddings example. ( #1232 )
2024-02-29 13:55:50 -05:00
Andrei
4d574bd765
feat(server): Add support for pulling models from Huggingface Hub ( #1222 )
...
* Basic support for hf pull on server
* Add hf_model_repo_id setting
* Update README
2024-02-26 14:35:08 -05:00
Andrei Betlen
b3e358dee4
docs: Add example of local image loading to README
2024-02-26 11:58:33 -05:00
Andrei Betlen
b681674bf2
docs: Fix functionary repo_id
2024-02-23 12:36:13 -05:00
Andrei Betlen
702306b381
docs: Restore functionary docs in README
2024-02-23 12:34:02 -05:00
Aditya Purandare
52d9d70076
docs: Update README.md to fix pip install llama cpp server ( #1187 )
...
Without the single quotes, running the command in zsh prints a "no matches found" error, because the shell treats the square brackets as a glob pattern. Adding the quotes fixes it.
```bash
$ pip install llama-cpp-python[server]
zsh: no matches found: llama-cpp-python[server]
```
Co-authored-by: Andrei <abetlen@gmail.com>
2024-02-23 04:41:22 -05:00
Andrei Betlen
410e02da51
docs: Fix typo
2024-02-23 00:43:31 -05:00
Andrei Betlen
eb56ce2e2a
docs: fix low-level api example
2024-02-22 11:33:05 -05:00
Andrei Betlen
0f8cad6cb7
docs: Update README
2024-02-22 11:31:44 -05:00
Andrei Betlen
045cc12670
docs: Update README
2024-02-22 03:53:52 -05:00
Andrei Betlen
32efed7b07
docs: Update README
2024-02-22 03:25:11 -05:00
Andrei Betlen
d80c5cf29d
docs: fix indentation for mkdocs-material
2024-02-22 02:30:24 -05:00
Andrei
0f8aa4ab5c
feat: Pull models directly from huggingface ( #1206 )
...
* Add from_pretrained method to Llama class
* Update docs
* Merge filename and pattern
2024-02-21 16:25:10 -05:00
Andrei Betlen
c2a234a086
docs: Add embeddings section
2024-02-15 23:15:50 -05:00
Andrei Betlen
4348a6cdf0
docs: Fix typo
2024-02-13 02:04:54 -05:00
Andrei Betlen
b82b0e1014
docs: Temporarily revert function calling docs
2024-02-12 16:27:43 -05:00
Akarshan Biswas
918ff27e50
docs: Set the correct command for compiling with SYCL support ( #1172 )
2024-02-11 13:55:15 -05:00
Jeffrey Fong
901827013b
feat: Integrate functionary v1.4 and v2 models + add custom tokenizer support to Llama class ( #1078 )
...
* convert functionary-v1 chat handler to use hf autotokenizer
* add hf_tokenizer + integrate functionary-v1.4 prompt template
* integrate functionary v2 prompt template
* update readme
* set up parallel function calling wip
* set up parallel function calling
* Update README.md
* Update README.md
* refactor tokenizers
* include old functionary handler for backward compatibility
* add hf_tokenizer_path in server ModelSettings
* convert functionary-v1 chat handler to use hf autotokenizer
* add hf_tokenizer + integrate functionary-v1.4 prompt template
* integrate functionary v2 prompt template
* update readme
* set up parallel function calling wip
* resolve merge conflict
* Update README.md
* Update README.md
* refactor tokenizers
* include old functionary handler for backward compatibility
* add hf_tokenizer_path in server ModelSettings
* Cleanup PR, fix breaking changes
* Use hf_pretrained_model_name_or_path for tokenizer
* fix hf tokenizer in streaming
* update README
* refactor offset mapping
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2024-02-07 20:07:03 -05:00
Andrei
fb762a6041
Add speculative decoding ( #1120 )
...
* Add draft model param to llama class, implement basic prompt lookup decoding draft model
* Use samplingcontext for sampling
* Use 1d array
* Use draft model for sampling
* Fix dumb mistake
* Allow for later extensions to the LlamaDraftModel api
* Cleanup
* Adaptive candidate prediction
* Update implementation to match hf transformers
* Tuning
* Fix bug where last token was not used for ngram prediction
* Remove heuristic for num_pred_tokens (no benefit)
* fix: n_candidates bug.
* Add draft_model_num_pred_tokens server setting
* Cleanup
* Update README
2024-01-31 14:08:14 -05:00
Andrei Betlen
247a16de66
docs: Update README
2024-01-30 12:23:07 -05:00
Andrei Betlen
059f6b3ac8
docs: fix typos
2024-01-29 11:02:25 -05:00
Andrei Betlen
843e77e3e2
docs: Add Vulkan build instructions
2024-01-29 11:01:26 -05:00
Andrei Betlen
8c59210062
docs: Fix typo
2024-01-27 19:37:59 -05:00
Andrei Betlen
399fa1e03b
docs: Add JSON and JSON schema mode examples to README
2024-01-27 19:36:33 -05:00
Andrei Betlen
d6fb16e055
docs: Update README
2024-01-25 10:51:48 -05:00
Andrei Betlen
5b258bf840
docs: Update README with more param common examples
2024-01-24 10:51:15 -05:00
Andrei Betlen
88fbccaaa3
docs: Add macosx wrong arch fix to README
2024-01-21 18:38:44 -05:00
Jerry Liu
84380fe9a6
Add llamaindex integration to readme ( #1092 )
2024-01-16 19:10:50 -05:00
Caleb Hoff
f766b70c9a
Fix: Correct typo in README.md ( #1058 )
...
In Llama.create_chat_completion, the `tool_choice` property does not have an s on the end.
2024-01-04 18:12:32 -05:00
Andrei Betlen
f4be84c122
Fix typo
2023-12-22 14:40:44 -05:00