baalajimaestro/llama.cpp

Author	SHA1	Message	Date
nullname	d634efcdd9	feat: adding `rpc_servers` parameter to `Llama` class (#1477) * passthru rpc_servers params wip * enable llama rpc by default * convert string to byte * add rpc package * Revert "enable llama rpc by default" This reverts commit 832c6dd56c979514cec5df224bf2d2014dccd790. * update readme * Only set rpc_servers when provided * Add rpc servers to server options --------- Co-authored-by: Andrei Betlen <abetlen@gmail.com>	2024-06-04 10:38:21 -04:00
Andrei Betlen	2d89964147	docs: Fix table formatting	2024-05-24 11:55:41 -04:00
Andrei Betlen	9e8d7d55bd	fix(docs): Fix link typo	2024-05-24 11:55:01 -04:00
Andrei Betlen	ec43e8920f	docs: Update multi-modal model section	2024-05-24 11:54:15 -04:00
Peng Yu	1547202b77	docs: Fix typo in README.md (#1444)	2024-05-10 10:35:51 -04:00
Ikko Eltociear Ashimine	07966b9ba7	docs: update README.md (#1432) accomodate -> accommodate	2024-05-08 02:20:20 -04:00
Andrei Betlen	945c62c567	docs: Change all examples from interpreter style to script style.	2024-04-30 10:15:04 -04:00
Andrei Betlen	26478ab293	docs: Update README.md	2024-04-30 10:11:38 -04:00
Andrei Betlen	c8cd8c17c6	docs: Update README to include CUDA 12.4 wheels	2024-04-30 03:12:46 -04:00
Andrei	fe2da09538	feat: Generic Chat Formats, Tool Calling, and Huggingface Pull Support for Multimodal Models (Obsidian, LLaVA1.6, Moondream) (#1147) * Test dummy image tags in chat templates * Format and improve types for llava_cpp.py * Add from_pretrained support to llava chat format. * Refactor llava chat format to use a jinja2 * Revert chat format test * Add moondream support (wip) * Update moondream chat format * Update moondream chat format * Update moondream prompt * Add function calling support * Cache last image embed * Add Llava1.6 support * Add nanollava support * Add obisidian support * Remove unnecessary import * Re-order multimodal chat formats * Logits all no longer required for multi-modal models * Update README.md * Update docs * Update README * Fix typo * Update README * Fix typo	2024-04-30 01:35:38 -04:00
Jeffrey Fong	f178636e1b	fix: Functionary bug fixes (#1385) * fix completion tokens tracking, prompt forming * fix 'function_call' and 'tool_calls' depending on 'functions' and 'tools', incompatibility with python 3.8 * Updated README * fix for openai server compatibility --------- Co-authored-by: Andrei <abetlen@gmail.com>	2024-04-27 20:49:52 -04:00
Douglas Hanley	f6ed21f9a2	feat: Allow for possibly non-pooled embeddings (#1380) * allow for possibly non-pooled embeddings * add more to embeddings section in README.md --------- Co-authored-by: Andrei <abetlen@gmail.com>	2024-04-25 21:32:44 -04:00
Sigbjørn Skjæret	7265a5dc0e	fix(docs): incorrect tool_choice example (#1330)	2024-04-05 09:14:03 -04:00
Andrei Betlen	909ef66951	docs: Rename cuBLAS section to CUDA	2024-04-04 03:08:47 -04:00
Andrei Betlen	1db3b58fdc	docs: Add docs explaining how to install pre-built wheels.	2024-04-04 02:57:06 -04:00
Andrei Betlen	c50309e52a	docs: LLAMA_CUBLAS -> LLAMA_CUDA	2024-04-04 02:49:19 -04:00
Andrei	5a930ee9a1	feat: Binary wheels for CPU, CUDA (12.1 - 12.3), Metal (#1247) * Generate binary wheel index on release * Add total release downloads badge * Update download label * Use official cibuildwheel action * Add workflows to build CUDA and Metal wheels * Update generate index workflow * Update workflow name	2024-04-03 15:32:13 -04:00
lawfordp2017	a0f373e310	fix: Changed local API doc references to hosted (#1317)	2024-04-01 10:21:00 -04:00
Kenneth Hoste	663659f730	docs: fix small typo in README: 'model know how' -> 'model knows how' (#1244) Co-authored-by: Andrei <abetlen@gmail.com>	2024-03-02 22:20:41 -05:00
Andrei Betlen	97aa3a153d	docs: Add information re: auto chat formats. Closes #1236	2024-03-01 13:10:25 -05:00
Douglas Hanley	cf1fdd8a9a	docs: fix typo in README.md embeddings example. (#1232)	2024-02-29 13:55:50 -05:00
Andrei	4d574bd765	feat(server): Add support for pulling models from Huggingface Hub (#1222) * Basic support for hf pull on server * Add hf_model_repo_id setting * Update README	2024-02-26 14:35:08 -05:00
Andrei Betlen	b3e358dee4	docs: Add example of local image loading to README	2024-02-26 11:58:33 -05:00
Andrei Betlen	b681674bf2	docs: Fix functionary repo_id	2024-02-23 12:36:13 -05:00
Andrei Betlen	702306b381	docs: Restore functionary docs in README	2024-02-23 12:34:02 -05:00
Aditya Purandare	52d9d70076	docs: Update README.md to fix pip install llama cpp server (#1187) Without the single quotes, when running the command, an error is printed saying no matching packages found on pypi. Adding the quotes fixes it ```bash $ pip install llama-cpp-python[server] zsh: no matches found: llama-cpp-python[server] ``` Co-authored-by: Andrei <abetlen@gmail.com>	2024-02-23 04:41:22 -05:00
Andrei Betlen	410e02da51	docs: Fix typo	2024-02-23 00:43:31 -05:00
Andrei Betlen	eb56ce2e2a	docs: fix low-level api example	2024-02-22 11:33:05 -05:00
Andrei Betlen	0f8cad6cb7	docs: Update README	2024-02-22 11:31:44 -05:00
Andrei Betlen	045cc12670	docs: Update README	2024-02-22 03:53:52 -05:00
Andrei Betlen	32efed7b07	docs: Update README	2024-02-22 03:25:11 -05:00
Andrei Betlen	d80c5cf29d	docs: fix indentation for mkdocs-material	2024-02-22 02:30:24 -05:00
Andrei	0f8aa4ab5c	feat: Pull models directly from huggingface (#1206) * Add from_pretrained method to Llama class * Update docs * Merge filename and pattern	2024-02-21 16:25:10 -05:00
Andrei Betlen	c2a234a086	docs: Add embeddings section	2024-02-15 23:15:50 -05:00
Andrei Betlen	4348a6cdf0	docs: Fix typo	2024-02-13 02:04:54 -05:00
Andrei Betlen	b82b0e1014	docs: Temporarily revert function calling docs	2024-02-12 16:27:43 -05:00
Akarshan Biswas	918ff27e50	docs: Set the correct command for compiling with syscl support (#1172)	2024-02-11 13:55:15 -05:00
Jeffrey Fong	901827013b	feat: Integrate functionary v1.4 and v2 models + add custom tokenizer support to Llama class (#1078) * convert functionary-v1 chat handler to use hf autotokenizer * add hf_tokenizer + inteegrate functionary-v1.4 prompt template * integrate functionary v2 prompt template * update readme * set up parallel function calling wip * set up parallel function calling * Update README.md * Update README.md * refactor tokenizers * include old functionary handler for backward compatibility * add hf_tokenizer_path in server ModelSettings * convert functionary-v1 chat handler to use hf autotokenizer * add hf_tokenizer + inteegrate functionary-v1.4 prompt template * integrate functionary v2 prompt template * update readme * set up parallel function calling wip * resolve merge conflict * Update README.md * Update README.md * refactor tokenizers * include old functionary handler for backward compatibility * add hf_tokenizer_path in server ModelSettings * Cleanup PR, fix breaking changes * Use hf_pretrained_model_name_or_path for tokenizer * fix hf tokenizer in streaming * update README * refactor offset mapping --------- Co-authored-by: Andrei <abetlen@gmail.com>	2024-02-07 20:07:03 -05:00
Andrei	fb762a6041	Add speculative decoding (#1120) * Add draft model param to llama class, implement basic prompt lookup decoding draft model * Use samplingcontext for sampling * Use 1d array * Use draft model for sampling * Fix dumb mistake * Allow for later extensions to the LlamaDraftModel api * Cleanup * Adaptive candidate prediction * Update implementation to match hf transformers * Tuning * Fix bug where last token was not used for ngram prediction * Remove heuristic for num_pred_tokens (no benefit) * fix: n_candidates bug. * Add draft_model_num_pred_tokens server setting * Cleanup * Update README	2024-01-31 14:08:14 -05:00
Andrei Betlen	247a16de66	docs: Update README	2024-01-30 12:23:07 -05:00
Andrei Betlen	059f6b3ac8	docs: fix typos	2024-01-29 11:02:25 -05:00
Andrei Betlen	843e77e3e2	docs: Add Vulkan build instructions	2024-01-29 11:01:26 -05:00
Andrei Betlen	8c59210062	docs: Fix typo	2024-01-27 19:37:59 -05:00
Andrei Betlen	399fa1e03b	docs: Add JSON and JSON schema mode examples to README	2024-01-27 19:36:33 -05:00
Andrei Betlen	d6fb16e055	docs: Update README	2024-01-25 10:51:48 -05:00
Andrei Betlen	5b258bf840	docs: Update README with more param common examples	2024-01-24 10:51:15 -05:00
Andrei Betlen	88fbccaaa3	docs: Add macosx wrong arch fix to README	2024-01-21 18:38:44 -05:00
Jerry Liu	84380fe9a6	Add llamaindex integration to readme (#1092)	2024-01-16 19:10:50 -05:00
Caleb Hoff	f766b70c9a	Fix: Correct typo in README.md (#1058) In Llama.create_chat_completion, the `tool_choice` property does not have an s on the end.	2024-01-04 18:12:32 -05:00
Andrei Betlen	f4be84c122	Fix typo	2023-12-22 14:40:44 -05:00

143 commits