Commit graph

105 commits

Author SHA1 Message Date
Andrei
fb762a6041 Add speculative decoding (#1120)
* Add draft model param to llama class, implement basic prompt lookup decoding draft model

* Use samplingcontext for sampling

* Use 1d array

* Use draft model for sampling

* Fix dumb mistake

* Allow for later extensions to the LlamaDraftModel api

* Cleanup

* Adaptive candidate prediction

* Update implementation to match hf transformers

* Tuning

* Fix bug where last token was not used for ngram prediction

* Remove heuristic for num_pred_tokens (no benefit)

* fix: n_candidates bug.

* Add draft_model_num_pred_tokens server setting

* Cleanup

* Update README
2024-01-31 14:08:14 -05:00
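The commit above names prompt lookup decoding as the draft model for speculative decoding and mentions n-gram prediction that must include the last token. The sketch below illustrates only that draft step. It is not the code from #1120. It takes the trailing n-gram of the context, searches for an earlier occurrence, and proposes the tokens that followed it, trying longer n-grams first. `find_draft_tokens` and `max_ngram_size` are hypothetical names. `num_pred_tokens` only echoes the `draft_model_num_pred_tokens` server setting named in the commit. In real speculative decoding the main model still verifies every proposed token.

```python
from typing import List, Sequence


def find_draft_tokens(
    input_ids: Sequence[int],
    max_ngram_size: int = 2,
    num_pred_tokens: int = 10,
) -> List[int]:
    """Propose draft tokens by prompt lookup (hypothetical helper, not the library API).

    The trailing n-gram of input_ids (which includes the last token) is searched
    for earlier in the sequence; the tokens that followed the earliest match are
    returned as the draft. Longer n-grams are tried before shorter ones.
    """
    n = len(input_ids)
    for ngram_size in range(max_ngram_size, 0, -1):
        if ngram_size > n:
            continue
        # The n-gram ending at the last token of the context.
        ngram = list(input_ids[n - ngram_size:])
        # Scan every earlier window of the same size, left to right.
        # The trailing n-gram itself (start == n - ngram_size) is excluded.
        for start in range(0, n - ngram_size):
            if list(input_ids[start:start + ngram_size]) == ngram:
                begin = start + ngram_size
                end = min(begin + num_pred_tokens, n)
                if begin < end:
                    return list(input_ids[begin:end])
    # No match: propose nothing, so the caller decodes normally.
    return []
```

For example, `find_draft_tokens([1, 2, 3, 4, 1, 2], num_pred_tokens=2)` returns `[3, 4]`: the trailing bigram `[1, 2]` first appears at the start of the sequence, and `[3, 4]` followed it there.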
Andrei Betlen
247a16de66 docs: Update README 2024-01-30 12:23:07 -05:00
Andrei Betlen
059f6b3ac8 docs: fix typos 2024-01-29 11:02:25 -05:00
Andrei Betlen
843e77e3e2 docs: Add Vulkan build instructions 2024-01-29 11:01:26 -05:00
Andrei Betlen
8c59210062 docs: Fix typo 2024-01-27 19:37:59 -05:00
Andrei Betlen
399fa1e03b docs: Add JSON and JSON schema mode examples to README 2024-01-27 19:36:33 -05:00
Andrei Betlen
d6fb16e055 docs: Update README 2024-01-25 10:51:48 -05:00
Andrei Betlen
5b258bf840 docs: Update README with more common param examples 2024-01-24 10:51:15 -05:00
Andrei Betlen
88fbccaaa3 docs: Add macosx wrong arch fix to README 2024-01-21 18:38:44 -05:00
Jerry Liu
84380fe9a6 Add llamaindex integration to readme (#1092) 2024-01-16 19:10:50 -05:00
Caleb Hoff
f766b70c9a Fix: Correct typo in README.md (#1058)
In Llama.create_chat_completion, the `tool_choice` property does not have an s on the end.
2024-01-04 18:12:32 -05:00
Andrei Betlen
f4be84c122 Fix typo 2023-12-22 14:40:44 -05:00
Andrei Betlen
9b3a5939f3 docs: Add multi-model link to readme 2023-12-22 14:40:13 -05:00
evelynmitchell
37da8e863a Update README.md functionary demo typo (#996)
missing comma
2023-12-16 19:00:30 -05:00
zocainViken
6bbeea07ae README.md multimodal params fix (#967)
multimodal params fix: add logits = True to make llava work
2023-12-11 20:41:38 -05:00
Aniket Maurya
c1d92ce680 fix minor typo (#958)
* fix minor typo

* Fix typo

---------

Co-authored-by: Andrei <abetlen@gmail.com>
2023-12-11 20:40:38 -05:00
Andrei Betlen
fb32f9d438 docs: Update README 2023-11-28 03:15:01 -05:00
Andrei Betlen
43e006a291 docs: Remove divider 2023-11-28 02:41:50 -05:00
Andrei Betlen
2cc6c9ae2f docs: Update README, add FAQ 2023-11-28 02:37:34 -05:00
Andrei Betlen
9c68b1804a docs: Add api reference links in README 2023-11-27 18:54:07 -05:00
Andrei Betlen
41428244f0 docs: Fix README indentation 2023-11-27 18:29:13 -05:00
Andrei Betlen
1539146a5e docs: Fix README docs link 2023-11-27 18:21:00 -05:00
Anton Vice
aa5a7a1880 Update README.md (#940)
.ccp >> .cpp
2023-11-26 15:39:38 -05:00
Andrei Betlen
abb1976ad7 docs: Add n_ctx note for multimodal models 2023-11-22 21:07:00 -05:00
Andrei Betlen
36679a58ef Merge branch 'main' of github.com:abetlen/llama_cpp_python into main 2023-11-22 19:49:59 -05:00
Andrei Betlen
bd43fb2bfe docs: Update high-level python api examples in README to include chat formats, function calling, and multi-modal models. 2023-11-22 19:49:56 -05:00
Andrei Betlen
d977b44d82 docs: Add links to server functionality 2023-11-22 18:21:02 -05:00
Andrei Betlen
aa815d580c docs: Link to langchain docs 2023-11-22 18:17:49 -05:00
Andrei Betlen
602ea64ddd docs: Fix whitespace 2023-11-22 18:09:31 -05:00
Andrei Betlen
f336eebb2f docs: fix 404 to macos installation guide. Closes #861 2023-11-22 18:07:30 -05:00
Andrei Betlen
1ff2c92720 docs: minor indentation fix 2023-11-22 18:04:18 -05:00
Andrei Betlen
68238b7883 docs: setting n_gqa is no longer required 2023-11-22 18:01:54 -05:00
Andrei Betlen
198178225c docs: Remove stale warning 2023-11-22 17:59:16 -05:00
Juraj Bednar
5a9770a56b Improve documentation for server chat formats (#934) 2023-11-22 06:10:03 -05:00
James Braza
23a221999f Documenting server usage (#768) 2023-11-21 00:24:22 -05:00
Sujeendran Menon
7b136bb5b1 Fix for shared library not found and compile issues in Windows (#848)
* fix windows library dll name issue

* Updated README.md Windows instructions

* Update llama_cpp.py to handle different windows dll file versions
2023-11-01 18:55:57 -04:00
Jason Cox
40b22909dc Update examples from ggml to gguf and add hw-accel note for Web Server (#688)
* Examples from ggml to gguf

* Use gguf file extension

Update examples to use filenames with gguf extension (e.g. llama-model.gguf).

---------

Co-authored-by: Andrei <abetlen@gmail.com>
2023-09-14 14:48:21 -04:00
Andrei Betlen
f4090a0bb2 Add NUMA support; low-level API users must now explicitly call llama_backend_init at the start of their programs. 2023-09-13 23:00:43 -04:00
Andrei Betlen
8ddf63b9c7 Remove reference to FORCE_CMAKE from docs 2023-09-12 23:56:10 -04:00
Andrei Betlen
bcef9ab2d9 Update title 2023-09-12 19:02:30 -04:00
Andrei Betlen
89ae347585 Remove references to force_cmake 2023-09-12 19:02:20 -04:00
Andrei Betlen
1dd3f473c0 Remove references to FORCE_CMAKE 2023-09-12 19:01:16 -04:00
Andrei Betlen
1910793f56 Merge branch 'main' into v0.2-wip 2023-09-12 16:43:32 -04:00
Juarez Bochi
20ac434d0f Fix low level api examples 2023-09-07 17:50:47 -04:00
Andrei Betlen
895f84f8fa Add ROCm / AMD instructions to docs 2023-08-25 17:19:23 -04:00
Andrei Betlen
ac47d55577 Merge branch 'main' into v0.2-wip 2023-08-25 15:45:22 -04:00
Andrei
915bbeacc5 Merge pull request #633 from abetlen/gguf
GGUF (Breaking Change to Model Files)
2023-08-25 15:13:12 -04:00
Andrei Betlen
ac37ea562b Add temporary docs for GGUF model conversion 2023-08-25 15:11:08 -04:00
Andrei Betlen
80389f71da Update README 2023-08-25 05:02:48 -04:00
Andrei Betlen
cf405f6764 Merge branch 'main' into v0.2-wip 2023-08-24 00:30:51 -04:00