baalajimaestro/llama.cpp

Author	SHA1	Message	Date
Andrei	fb762a6041	Add speculative decoding (#1120) * Add draft model param to llama class, implement basic prompt lookup decoding draft model * Use samplingcontext for sampling * Use 1d array * Use draft model for sampling * Fix dumb mistake * Allow for later extensions to the LlamaDraftModel api * Cleanup * Adaptive candidate prediction * Update implementation to match hf transformers * Tuning * Fix bug where last token was not used for ngram prediction * Remove heuristic for num_pred_tokens (no benefit) * fix: n_candidates bug. * Add draft_model_num_pred_tokens server setting * Cleanup * Update README	2024-01-31 14:08:14 -05:00
Andrei Betlen	247a16de66	docs: Update README	2024-01-30 12:23:07 -05:00
Andrei Betlen	059f6b3ac8	docs: fix typos	2024-01-29 11:02:25 -05:00
Andrei Betlen	843e77e3e2	docs: Add Vulkan build instructions	2024-01-29 11:01:26 -05:00
Andrei Betlen	8c59210062	docs: Fix typo	2024-01-27 19:37:59 -05:00
Andrei Betlen	399fa1e03b	docs: Add JSON and JSON schema mode examples to README	2024-01-27 19:36:33 -05:00
Andrei Betlen	d6fb16e055	docs: Update README	2024-01-25 10:51:48 -05:00
Andrei Betlen	5b258bf840	docs: Update README with more param common examples	2024-01-24 10:51:15 -05:00
Andrei Betlen	88fbccaaa3	docs: Add macosx wrong arch fix to README	2024-01-21 18:38:44 -05:00
Jerry Liu	84380fe9a6	Add llamaindex integration to readme (#1092)	2024-01-16 19:10:50 -05:00
Caleb Hoff	f766b70c9a	Fix: Correct typo in README.md (#1058) In Llama.create_chat_completion, the `tool_choice` property does not have an s on the end.	2024-01-04 18:12:32 -05:00
Andrei Betlen	f4be84c122	Fix typo	2023-12-22 14:40:44 -05:00
Andrei Betlen	9b3a5939f3	docs: Add multi-model link to readme	2023-12-22 14:40:13 -05:00
evelynmitchell	37da8e863a	Update README.md functionary demo typo (#996) missing comma	2023-12-16 19:00:30 -05:00
zocainViken	6bbeea07ae	README.md multimodal params fix (#967) multi modal params fix: add logits = True -> to make llava work	2023-12-11 20:41:38 -05:00
Aniket Maurya	c1d92ce680	fix minor typo (#958) * fix minor typo * Fix typo --------- Co-authored-by: Andrei <abetlen@gmail.com>	2023-12-11 20:40:38 -05:00
Andrei Betlen	fb32f9d438	docs: Update README	2023-11-28 03:15:01 -05:00
Andrei Betlen	43e006a291	docs: Remove divider	2023-11-28 02:41:50 -05:00
Andrei Betlen	2cc6c9ae2f	docs: Update README, add FAQ	2023-11-28 02:37:34 -05:00
Andrei Betlen	9c68b1804a	docs: Add api reference links in README	2023-11-27 18:54:07 -05:00
Andrei Betlen	41428244f0	docs: Fix README indentation	2023-11-27 18:29:13 -05:00
Andrei Betlen	1539146a5e	docs: Fix README docs link	2023-11-27 18:21:00 -05:00
Anton Vice	aa5a7a1880	Update README.md (#940) .ccp >> .cpp	2023-11-26 15:39:38 -05:00
Andrei Betlen	abb1976ad7	docs: Add n_ctx not for multimodal models	2023-11-22 21:07:00 -05:00
Andrei Betlen	36679a58ef	Merge branch 'main' of github.com:abetlen/llama_cpp_python into main	2023-11-22 19:49:59 -05:00
Andrei Betlen	bd43fb2bfe	docs: Update high-level python api examples in README to include chat formats, function calling, and multi-modal models.	2023-11-22 19:49:56 -05:00
Andrei Betlen	d977b44d82	docs: Add links to server functionality	2023-11-22 18:21:02 -05:00
Andrei Betlen	aa815d580c	docs: Link to langchain docs	2023-11-22 18:17:49 -05:00
Andrei Betlen	602ea64ddd	docs: Fix whitespace	2023-11-22 18:09:31 -05:00
Andrei Betlen	f336eebb2f	docs: fix 404 to macos installation guide. Closes #861	2023-11-22 18:07:30 -05:00
Andrei Betlen	1ff2c92720	docs: minor indentation fix	2023-11-22 18:04:18 -05:00
Andrei Betlen	68238b7883	docs: setting n_gqa is no longer required	2023-11-22 18:01:54 -05:00
Andrei Betlen	198178225c	docs: Remove stale warning	2023-11-22 17:59:16 -05:00
Juraj Bednar	5a9770a56b	Improve documentation for server chat formats (#934)	2023-11-22 06:10:03 -05:00
James Braza	23a221999f	Documenting server usage (#768)	2023-11-21 00:24:22 -05:00
Sujeendran Menon	7b136bb5b1	Fix for shared library not found and compile issues in Windows (#848) * fix windows library dll name issue * Updated README.md Windows instructions * Update llama_cpp.py to handle different windows dll file versions	2023-11-01 18:55:57 -04:00
Jason Cox	40b22909dc	Update examples from ggml to gguf and add hw-accel note for Web Server (#688) * Examples from ggml to gguf * Use gguf file extension Update examples to use filenames with gguf extension (e.g. llama-model.gguf). --------- Co-authored-by: Andrei <abetlen@gmail.com>	2023-09-14 14:48:21 -04:00
Andrei Betlen	f4090a0bb2	Add numa support, low level api users must now explicitly call llama_backend_init at the start of their programs.	2023-09-13 23:00:43 -04:00
Andrei Betlen	8ddf63b9c7	Remove reference to FORCE_CMAKE from docs	2023-09-12 23:56:10 -04:00
Andrei Betlen	bcef9ab2d9	Update title	2023-09-12 19:02:30 -04:00
Andrei Betlen	89ae347585	Remove references to force_cmake	2023-09-12 19:02:20 -04:00
Andrei Betlen	1dd3f473c0	Remove references to FORCE_CMAKE	2023-09-12 19:01:16 -04:00
Andrei Betlen	1910793f56	Merge branch 'main' into v0.2-wip	2023-09-12 16:43:32 -04:00
Juarez Bochi	20ac434d0f	Fix low level api examples	2023-09-07 17:50:47 -04:00
Andrei Betlen	895f84f8fa	Add ROCm / AMD instructions to docs	2023-08-25 17:19:23 -04:00
Andrei Betlen	ac47d55577	Merge branch 'main' into v0.2-wip	2023-08-25 15:45:22 -04:00
Andrei	915bbeacc5	Merge pull request #633 from abetlen/gguf GGUF (Breaking Change to Model Files)	2023-08-25 15:13:12 -04:00
Andrei Betlen	ac37ea562b	Add temporary docs for GGUF model conversion	2023-08-25 15:11:08 -04:00
Andrei Betlen	80389f71da	Update README	2023-08-25 05:02:48 -04:00
Andrei Betlen	cf405f6764	Merge branch 'main' into v0.2-wip	2023-08-24 00:30:51 -04:00

105 commits