baalajimaestro/llama.cpp

Author	SHA1	Message	Date
Andrei Betlen	34081ddc5b	chore: Bump version	2024-04-03 15:38:27 -04:00
Andrei Betlen	368061c04a	Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main	2024-04-03 15:35:30 -04:00
Andrei Betlen	5a5193636b	feat: Update llama.cpp	2024-04-03 15:35:28 -04:00
Andrei	5a930ee9a1	feat: Binary wheels for CPU, CUDA (12.1 - 12.3), Metal (#1247) * Generate binary wheel index on release * Add total release downloads badge * Update download label * Use official cibuildwheel action * Add workflows to build CUDA and Metal wheels * Update generate index workflow * Update workflow name	2024-04-03 15:32:13 -04:00
Andrei Betlen	8649d7671b	fix: segfault when logits_all=False. Closes #1319	2024-04-03 15:30:31 -04:00
Andrei Betlen	f96de6d920	Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main	2024-04-03 00:55:21 -04:00
Andrei Betlen	e465157804	feat: Update llama.cpp	2024-04-03 00:55:19 -04:00
Yuri Mikhailov	62aad610e1	fix: last tokens passing to sample_repetition_penalties function (#1295) Co-authored-by: ymikhaylov <ymikhaylov@x5.ru> Co-authored-by: Andrei <abetlen@gmail.com>	2024-04-01 15:25:43 -04:00
Andrei Betlen	45bf5ae582	chore: Bump version	2024-04-01 10:28:22 -04:00
lawfordp2017	a0f373e310	fix: Changed local API doc references to hosted (#1317)	2024-04-01 10:21:00 -04:00
Limour	f165048a69	feat: add support for KV cache quantization options (#1307) * add KV cache quantization options https://github.com/abetlen/llama-cpp-python/discussions/1220 https://github.com/abetlen/llama-cpp-python/issues/1305 * Add ggml_type * Use ggml_type instead of string for quantization * Add server support --------- Co-authored-by: Andrei Betlen <abetlen@gmail.com>	2024-04-01 10:19:28 -04:00
windspirit95	aa9f1ae011	feat: Add logprobs support to chat completions (#1311) * Add logprobs return in ChatCompletionResponse * Fix duplicate field * Set default to false * Simplify check * Add server example --------- Co-authored-by: Andrei Betlen <abetlen@gmail.com>	2024-03-31 13:30:13 -04:00
Andrei Betlen	1e60dba082	feat: Update llama.cpp	2024-03-29 13:34:23 -04:00
Andrei Betlen	dcbe57fcf8	feat: Update llama.cpp	2024-03-29 12:45:27 -04:00
Andrei Betlen	125b2358c9	feat: Update llama.cpp	2024-03-28 12:06:46 -04:00
Andrei Betlen	901fe02461	feat: Update llama.cpp	2024-03-26 22:58:53 -04:00
Andrei Betlen	b64fa4e2c0	feat: Update llama.cpp	2024-03-25 23:09:07 -04:00
Andrei Betlen	a93b9149f8	feat: Update llama.cpp	2024-03-25 11:10:14 -04:00
Andrei Betlen	364678bde5	feat: Update llama.cpp	2024-03-24 12:27:49 -04:00
Andrei Betlen	d11ccc3036	fix(server): minor type fixes	2024-03-23 17:14:15 -04:00
Andrei Betlen	c1325dcdfb	fix: tool_call missing first token.	2024-03-22 23:44:04 -04:00
Andrei Betlen	e325a831f0	feat: Update llama.cpp	2024-03-22 23:43:29 -04:00
Andrei Betlen	c89be28ef9	feat: Update llama.cpp	2024-03-20 20:50:47 -04:00
Andrei Betlen	3db03b7302	feat: Update llama.cpp	2024-03-20 13:27:43 -04:00
bretello	740f3f3812	fix: set LLAMA_METAL_EMBED_LIBRARY=on on MacOS arm64 (#1289)	2024-03-20 12:46:09 -04:00
Andrei Betlen	f7decc9562	docs: Add chat examples to openapi ui	2024-03-19 10:52:53 -04:00
Andrei	60d8498f21	feat: Add tools/functions variables to Jinja2ChatFormatter, add function response formatting for all simple chat formats (#1273) * Add tools/functions variables to Jinja2ChatFormatter Also fixed missing tools/tool_choices parameters in chat_formatter_to_chat_completion_handler(). * Set grammar when doing explicit function calling * Add function / tool response for all chat formats --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2024-03-19 04:55:57 -04:00
Andrei Betlen	18d7ce918f	feat: Update llama.cpp	2024-03-19 04:40:24 -04:00
Andrei Betlen	7d4a5ec59f	Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main	2024-03-18 11:37:33 -04:00
Andrei Betlen	bf64752535	chore: Bump version	2024-03-18 11:37:30 -04:00
Jeffrey Fong	8a60c7bc8c	fix: Fix and optimize functionary chat handler (#1282) * fix functionary chat logic * further fixes --------- Co-authored-by: Andrei <abetlen@gmail.com>	2024-03-18 10:40:57 -04:00
Andrei Betlen	8d298b4750	feat: Update llama.cpp	2024-03-18 10:26:36 -04:00
Andrei Betlen	6eb25231e4	feat: Update llama.cpp	2024-03-15 12:58:45 -04:00
Andrei Betlen	20e6815252	fix: json mode	2024-03-15 12:58:34 -04:00
Andrei Betlen	1a9b8af2dd	feat: Update llama.cpp	2024-03-14 11:46:48 -04:00
Andrei Betlen	4084aabe86	fix: set default pooling type to unspecified	2024-03-14 10:04:57 -04:00
Andrei Betlen	d318cc8b83	fix: Set default pooling_type to mean, check for null pointer.	2024-03-14 09:17:41 -04:00
Andrei Betlen	dd0ee56217	feat: Update llama.cpp	2024-03-13 15:57:35 -04:00
Andrei Betlen	08e910f7a7	feat: Update llama.cpp	2024-03-10 23:45:05 -04:00
Andrei Betlen	a7281994d8	chore: Bump version	2024-03-08 21:14:44 -05:00
Andrei Betlen	919fca9f2b	Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main	2024-03-08 21:10:56 -05:00
Andrei Betlen	d02a9cf16f	Fixed json strings grammar by blacklisting character control set. Closes #1259	2024-03-08 21:10:53 -05:00
Felipe Lorenz	c139f8b5d5	feat: Add endpoints for tokenize, detokenize and count tokens (#1136) * Add endpoint to count tokens * Add tokenize and detokenize endpoints * Change response key to tokens for tokenize endpoint * Fix dependency bug * Cleanup * Remove example added by mistake * Move tokenize, detokenize, and count to Extras namespace. Tag existing endpoints --------- Co-authored-by: Andrei Betlen <abetlen@gmail.com>	2024-03-08 21:09:00 -05:00
Kevin Cao	1f3156d4f2	fix: Check for existence of clip model path (#1264)	2024-03-08 21:00:10 -05:00
Douglas Hanley	2811014bae	feat: Switch embed to llama_get_embeddings_seq (#1263) * switch to llama_get_embeddings_seq * Remove duplicate definition of llama_get_embeddings_seq Co-authored-by: Andrei <abetlen@gmail.com> --------- Co-authored-by: Andrei <abetlen@gmail.com>	2024-03-08 20:59:35 -05:00
Andrei Betlen	40c6b54f68	feat: Update llama.cpp	2024-03-08 20:58:50 -05:00
Andrei Betlen	93dc56ace8	Update llama.cpp	2024-03-06 01:32:00 -05:00
Andrei Betlen	87a6e5797e	feat: Update llama.cpp	2024-03-03 11:27:04 -05:00
Andrei Betlen	13177aae0f	chore: Bump version	2024-03-02 22:46:40 -05:00
Kenneth Hoste	663659f730	docs: fix small typo in README: 'model know how' -> 'model knows how' (#1244) Co-authored-by: Andrei <abetlen@gmail.com>	2024-03-02 22:20:41 -05:00

1713 commits