Daniel Thuerck
2138561fab
fix(server): Propagate flash_attn to model load. ( #1424 )
2024-05-03 12:17:07 -04:00
Andrei Betlen
2117122396
chore: Bump version
2024-05-02 12:07:09 -04:00
Andrei Betlen
31b1d95a6c
feat: Add llama-3-vision-alpha chat format
2024-05-02 11:32:18 -04:00
Andrei Betlen
4f01c452b6
fix: Change default value of verbose in image chat format handlers to True to match Llama
2024-04-30 15:50:30 -04:00
Andrei Betlen
9286b5caac
Merge branch 'main' of github.com:abetlen/llama_cpp_python into main
2024-04-30 15:45:36 -04:00
Andrei Betlen
f116175a5a
fix: Suppress all logs when verbose=False, use hardcoded filenos to work in colab notebooks. Closes #796 Closes #729
2024-04-30 15:45:34 -04:00
Jonathan Soma
3226b3c5ef
fix: UTF-8 handling with grammars ( #1415 )
...
Use Python's built-in UTF-8 handling to get code points
2024-04-30 14:33:23 -04:00
Andrei Betlen
b14dd98922
chore: Bump version
2024-04-30 09:39:56 -04:00
Andrei Betlen
29b6e9a5c8
fix: wrong parameter for flash attention in pickle __getstate__
2024-04-30 09:32:47 -04:00
Andrei Betlen
22d77eefd2
feat: Add option to enable flash_attn to Llama params and ModelSettings
2024-04-30 09:29:16 -04:00
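The flash_attn option introduced above is passed straight to the Llama constructor. A minimal sketch of how a caller might enable it (the model path is a placeholder, not a file referenced by these commits):

```python
# Sketch: enabling flash attention via the new flash_attn parameter.
# The kwargs below are forwarded to Llama(), which propagates flash_attn
# to llama.cpp's context parameters.
def build_llama_kwargs(model_path, flash_attn=True, n_ctx=2048):
    return {"model_path": model_path, "flash_attn": flash_attn, "n_ctx": n_ctx}

# Usage (requires llama-cpp-python and a local GGUF model file):
# from llama_cpp import Llama
# llm = Llama(**build_llama_kwargs("models/model.Q4_K_M.gguf"))
```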
Andrei Betlen
8c2b24d5aa
feat: Update llama.cpp
2024-04-30 09:27:55 -04:00
Andrei Betlen
f417cce28a
chore: Bump version
2024-04-30 03:11:02 -04:00
Andrei Betlen
3489ef09d3
fix: Ensure image renders before text in chat formats regardless of message content order.
2024-04-30 03:08:46 -04:00
Andrei Betlen
26c7876ba0
chore: Bump version
2024-04-30 01:48:40 -04:00
Andrei
fe2da09538
feat: Generic Chat Formats, Tool Calling, and Huggingface Pull Support for Multimodal Models (Obsidian, LLaVA1.6, Moondream) ( #1147 )
...
* Test dummy image tags in chat templates
* Format and improve types for llava_cpp.py
* Add from_pretrained support to llava chat format.
* Refactor llava chat format to use a jinja2
* Revert chat format test
* Add moondream support (wip)
* Update moondream chat format
* Update moondream chat format
* Update moondream prompt
* Add function calling support
* Cache last image embed
* Add Llava1.6 support
* Add nanollava support
* Add obsidian support
* Remove unnecessary import
* Re-order multimodal chat formats
* Logits all no longer required for multi-modal models
* Update README.md
* Update docs
* Update README
* Fix typo
* Update README
* Fix typo
2024-04-30 01:35:38 -04:00
Andrei Betlen
97fb860eba
feat: Update llama.cpp
2024-04-29 23:34:55 -04:00
Andrei Betlen
a411612b38
feat: Add support for str type kv_overrides
2024-04-27 23:42:19 -04:00
Andrei Betlen
c9b85bf098
feat: Update llama.cpp
2024-04-27 23:41:54 -04:00
Jeffrey Fong
f178636e1b
fix: Functionary bug fixes ( #1385 )
...
* fix completion tokens tracking, prompt forming
* fix 'function_call' and 'tool_calls' depending on 'functions' and 'tools', incompatibility with python 3.8
* Updated README
* fix for openai server compatibility
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2024-04-27 20:49:52 -04:00
Andrei Betlen
65edc90671
chore: Bump version
2024-04-26 10:11:31 -04:00
Andrei Betlen
173ebc7878
fix: Remove duplicate pooling_type definition and add missing n_vocab definition in bindings
2024-04-25 21:36:09 -04:00
Douglas Hanley
f6ed21f9a2
feat: Allow for possibly non-pooled embeddings ( #1380 )
...
* allow for possibly non-pooled embeddings
* add more to embeddings section in README.md
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2024-04-25 21:32:44 -04:00
Andrei Betlen
fcfea66857
fix: pydantic deprecation warning
2024-04-25 21:21:48 -04:00
Andrei Betlen
7f52335c50
feat: Update llama.cpp
2024-04-25 21:21:29 -04:00
Andrei Betlen
2a9979fce1
feat: Update llama.cpp
2024-04-25 02:48:26 -04:00
Andrei Betlen
c50d3300d2
chore: Bump version
2024-04-23 02:53:20 -04:00
Sean Bailey
53ebcc8bb5
feat(server): Provide ability to dynamically allocate all threads if desired using -1 ( #1364 )
2024-04-23 02:35:38 -04:00
abk16
8559e8ce88
feat: Add Llama-3 chat format ( #1371 )
...
* feat: Add Llama-3 chat format
* feat: Auto-detect Llama-3 chat format from gguf template
* feat: Update llama.cpp to b2715
Includes proper Llama-3 <|eot_id|> token handling.
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-04-23 02:33:29 -04:00
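The Llama-3 chat format added above can be requested by name, or auto-detected from the gguf template when no format is given. A sketch (the "llama-3" format name and the model path are assumptions for illustration):

```python
# Sketch: selecting the Llama-3 chat format explicitly. Leaving
# chat_format as None lets it be auto-detected from the gguf template.
def chat_settings(model_path, chat_format="llama-3"):
    return {"model_path": model_path, "chat_format": chat_format}

# Usage (requires a local Llama-3 GGUF model):
# from llama_cpp import Llama
# llm = Llama(**chat_settings("models/Meta-Llama-3-8B-Instruct.Q4_K_M.gguf"))
# out = llm.create_chat_completion(messages=[{"role": "user", "content": "Hi"}])
```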
Andrei Betlen
d40a250ef3
feat: Use new llama_token_is_eog in create_completions
2024-04-22 00:35:47 -04:00
Andrei Betlen
b21ba0e2ac
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-04-21 20:46:42 -04:00
Andrei Betlen
159cc4e5d9
feat: Update llama.cpp
2024-04-21 20:46:40 -04:00
Andrei Betlen
0281214863
chore: Bump version
2024-04-20 00:09:37 -04:00
Andrei Betlen
cc81afebf0
feat: Add stopping_criteria to ChatFormatter, allow stopping on arbitrary token ids, fixes llama3 instruct
2024-04-20 00:00:53 -04:00
Andrei Betlen
893a27a736
chore: Bump version
2024-04-18 01:43:39 -04:00
Lucca Zenóbio
4f42664955
feat: update grammar schema converter to match llama.cpp ( #1353 )
...
* feat: improve function calling
* feat: grammar
* fix
* fix
* fix
2024-04-18 01:36:25 -04:00
Andrei Betlen
fa4bb0cf81
Revert "feat: Update json to grammar ( #1350 )"
...
This reverts commit 610a592f70
.
2024-04-17 16:18:16 -04:00
Lucca Zenóbio
610a592f70
feat: Update json to grammar ( #1350 )
...
* feat: improve function calling
* feat: grammar
2024-04-17 10:10:21 -04:00
khimaros
b73c73c0c6
feat: add disable_ping_events flag ( #1257 )
...
For backward compatibility, this is false by default. It can be set
to true to disable EventSource pings, which are not supported by
some OpenAI clients.
fixes https://github.com/abetlen/llama-cpp-python/issues/1256
2024-04-17 10:08:19 -04:00
tc-wolf
4924455dec
feat: Make saved state more compact on-disk ( #1296 )
...
* State load/save changes
- Only store up to `n_tokens` logits instead of full `(n_ctx, n_vocab)`
sized array.
- Difference between ~350MB and ~1500MB for example prompt with ~300
tokens (makes sense lol)
- Auto-formatting changes
* Back out formatting changes
2024-04-17 10:06:50 -04:00
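The state made more compact by this change comes from Llama's save_state/load_state pair. A sketch of a persistence round-trip (using Python's pickle for on-disk storage is an illustrative assumption, not part of the commit):

```python
import pickle

# Sketch: persisting a saved Llama state to disk and restoring it.
# The state now stores only n_tokens logits instead of the full
# (n_ctx, n_vocab) array, so the file is much smaller.
def dump_state(state, path):
    with open(path, "wb") as f:
        pickle.dump(state, f)

def restore_state(path):
    with open(path, "rb") as f:
        return pickle.load(f)

# Usage (requires a loaded model):
# state = llm.save_state()
# dump_state(state, "session.bin")
# llm.load_state(restore_state("session.bin"))
```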
ddh0
c96b2daebf
feat: Use all available CPUs for batch processing ( #1345 )
2024-04-17 10:05:54 -04:00
Andrei Betlen
ef29235d45
chore: Bump version
2024-04-10 03:44:46 -04:00
Andrei Betlen
bb65b4d764
fix: pass correct type to chat handlers for chat completion logprobs
2024-04-10 03:41:55 -04:00
Andrei Betlen
060bfa64d5
feat: Add support for yaml based configs
2024-04-10 02:47:01 -04:00
Andrei Betlen
1347e1d050
feat: Add typechecking for ctypes structure attributes
2024-04-10 02:40:41 -04:00
Andrei Betlen
889d0e8981
feat: Update llama.cpp
2024-04-10 02:25:58 -04:00
Andrei Betlen
56071c956a
feat: Update llama.cpp
2024-04-09 09:53:49 -04:00
Andrei Betlen
08b16afe11
chore: Bump version
2024-04-06 01:53:38 -04:00
Andrei Betlen
1ae3abbcc3
fix: missing logprobs in response, incorrect response type for functionary, minor type issues. Closes #1328 Closes #1314
2024-04-05 10:51:44 -04:00
Andrei Betlen
34081ddc5b
chore: Bump version
2024-04-03 15:38:27 -04:00
Andrei Betlen
8649d7671b
fix: segfault when logits_all=False. Closes #1319
2024-04-03 15:30:31 -04:00
Yuri Mikhailov
62aad610e1
fix: last tokens passing to sample_repetition_penalties function ( #1295 )
...
Co-authored-by: ymikhaylov <ymikhaylov@x5.ru>
Co-authored-by: Andrei <abetlen@gmail.com>
2024-04-01 15:25:43 -04:00
Andrei Betlen
45bf5ae582
chore: Bump version
2024-04-01 10:28:22 -04:00
Limour
f165048a69
feat: add support for KV cache quantization options ( #1307 )
...
* add KV cache quantization options
https://github.com/abetlen/llama-cpp-python/discussions/1220
https://github.com/abetlen/llama-cpp-python/issues/1305
* Add ggml_type
* Use ggml_type instead of string for quantization
* Add server support
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-04-01 10:19:28 -04:00
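Per the PR notes above, the quantization options take ggml type constants rather than strings. A sketch of passing them (the type_k/type_v parameter names and the GGML_TYPE_Q8_0 constant are assumptions drawn from the PR description):

```python
# Sketch: quantizing the KV cache. type_k/type_v take ggml type
# constants (integers), per the "Use ggml_type instead of string" change.
def kv_quant_kwargs(model_path, ggml_type):
    return {"model_path": model_path, "type_k": ggml_type, "type_v": ggml_type}

# Usage (requires a local GGUF model):
# from llama_cpp import Llama, GGML_TYPE_Q8_0
# llm = Llama(**kv_quant_kwargs("models/model.gguf", GGML_TYPE_Q8_0))
```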
windspirit95
aa9f1ae011
feat: Add logprobs support to chat completions ( #1311 )
...
* Add logprobs return in ChatCompletionResponse
* Fix duplicate field
* Set default to false
* Simplify check
* Add server example
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-03-31 13:30:13 -04:00
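A sketch of requesting logprobs in a chat completion as added above (the logprobs/top_logprobs parameter names mirror the OpenAI API shape and are assumptions here; logprobs defaults to false per the PR):

```python
# Sketch: asking for per-token log probabilities in a chat completion.
def chat_request(prompt, logprobs=True, top_logprobs=3):
    return {
        "messages": [{"role": "user", "content": prompt}],
        "logprobs": logprobs,        # default is False, per the PR
        "top_logprobs": top_logprobs,
    }

# Usage (requires a loaded model):
# out = llm.create_chat_completion(**chat_request("Hello"))
# print(out["choices"][0]["logprobs"])
```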
Andrei Betlen
125b2358c9
feat: Update llama.cpp
2024-03-28 12:06:46 -04:00
Andrei Betlen
901fe02461
feat: Update llama.cpp
2024-03-26 22:58:53 -04:00
Andrei Betlen
d11ccc3036
fix(server): minor type fixes
2024-03-23 17:14:15 -04:00
Andrei Betlen
c1325dcdfb
fix: tool_call missing first token.
2024-03-22 23:44:04 -04:00
Andrei Betlen
e325a831f0
feat: Update llama.cpp
2024-03-22 23:43:29 -04:00
Andrei Betlen
f7decc9562
docs: Add chat examples to openapi ui
2024-03-19 10:52:53 -04:00
Andrei
60d8498f21
feat: Add tools/functions variables to Jinja2ChatFormatter, add function response formatting for all simple chat formats ( #1273 )
...
* Add tools/functions variables to Jinja2ChatFormatter
Also fixed missing tools/tool_choices parameters in chat_formatter_to_chat_completion_handler().
* Set grammar when doing explicit function calling
* Add function / tool response for all chat formats
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2024-03-19 04:55:57 -04:00
Andrei Betlen
7d4a5ec59f
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-03-18 11:37:33 -04:00
Andrei Betlen
bf64752535
chore: Bump version
2024-03-18 11:37:30 -04:00
Jeffrey Fong
8a60c7bc8c
fix: Fix and optimize functionary chat handler ( #1282 )
...
* fix functionary chat logic
* further fixes
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2024-03-18 10:40:57 -04:00
Andrei Betlen
8d298b4750
feat: Update llama.cpp
2024-03-18 10:26:36 -04:00
Andrei Betlen
6eb25231e4
feat: Update llama.cpp
2024-03-15 12:58:45 -04:00
Andrei Betlen
20e6815252
fix: json mode
2024-03-15 12:58:34 -04:00
Andrei Betlen
4084aabe86
fix: set default pooling type to unspecified
2024-03-14 10:04:57 -04:00
Andrei Betlen
d318cc8b83
fix: Set default pooling_type to mean, check for null pointer.
2024-03-14 09:17:41 -04:00
Andrei Betlen
dd0ee56217
feat: Update llama.cpp
2024-03-13 15:57:35 -04:00
Andrei Betlen
08e910f7a7
feat: Update llama.cpp
2024-03-10 23:45:05 -04:00
Andrei Betlen
a7281994d8
chore: Bump version
2024-03-08 21:14:44 -05:00
Andrei Betlen
919fca9f2b
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-03-08 21:10:56 -05:00
Andrei Betlen
d02a9cf16f
Fixed json strings grammar by blacklisting the control character set. Closes #1259
2024-03-08 21:10:53 -05:00
Felipe Lorenz
c139f8b5d5
feat: Add endpoints for tokenize, detokenize and count tokens ( #1136 )
...
* Add endpoint to count tokens
* Add tokenize and detokenize endpoints
* Change response key to tokens for tokenize endpoint
* Fix dependency bug
* Cleanup
* Remove example added by mistake
* Move tokenize, detokenize, and count to Extras namespace. Tag existing endpoints
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-03-08 21:09:00 -05:00
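The commit above moves the new endpoints into an Extras namespace. A sketch of calling one from a client with the standard library (the /extras/tokenize route path is an assumption based on the commit notes, not a documented URL):

```python
import json
import urllib.request

# Sketch: building a POST request to the tokenize endpoint of a
# running llama-cpp-python server. Route path is an assumption.
def build_extras_request(base_url, route, payload):
    return urllib.request.Request(
        base_url + route,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

# Usage (server must be running locally):
# req = build_extras_request("http://localhost:8000", "/extras/tokenize",
#                            {"input": "hello world"})
# with urllib.request.urlopen(req) as resp:
#     tokens = json.load(resp)["tokens"]
```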
Kevin Cao
1f3156d4f2
fix: Check for existence of clip model path ( #1264 )
2024-03-08 21:00:10 -05:00
Douglas Hanley
2811014bae
feat: Switch embed to llama_get_embeddings_seq ( #1263 )
...
* switch to llama_get_embeddings_seq
* Remove duplicate definition of llama_get_embeddings_seq
Co-authored-by: Andrei <abetlen@gmail.com>
---------
Co-authored-by: Andrei <abetlen@gmail.com>
2024-03-08 20:59:35 -05:00
Andrei Betlen
40c6b54f68
feat: Update llama.cpp
2024-03-08 20:58:50 -05:00
Andrei Betlen
93dc56ace8
Update llama.cpp
2024-03-06 01:32:00 -05:00
Andrei Betlen
87a6e5797e
feat: Update llama.cpp
2024-03-03 11:27:04 -05:00
Andrei Betlen
13177aae0f
chore: Bump version
2024-03-02 22:46:40 -05:00
Andrei Betlen
0e70984fb6
feat: Update llama.cpp
2024-03-02 22:20:04 -05:00
Andrei Betlen
d5df431278
chore: Bump version
2024-03-01 13:15:16 -05:00
Andrei Betlen
97aa3a153d
docs: Add information re: auto chat formats. Closes #1236
2024-03-01 13:10:25 -05:00
Andrei Betlen
f062a7f51d
feat: Update llama.cpp
2024-03-01 12:57:16 -05:00
Andrei Betlen
8c71725d53
fix: Remove deprecated cfg sampling functions
2024-02-28 14:37:07 -05:00
Andrei Betlen
727d60c28a
misc: Format
2024-02-28 14:27:40 -05:00
Andrei Betlen
0d37ce52b1
feat: Update llama.cpp
2024-02-28 14:27:16 -05:00
Andrei Betlen
ffcd4b2636
chore: Bump version
2024-02-28 01:38:32 -05:00
Sigbjørn Skjæret
c36ab15e68
fix: eos/bos_token set correctly for Jinja2ChatFormatter and automatic chat formatter ( #1230 )
...
The token strings were not correctly retrieved (empty).
2024-02-28 01:30:31 -05:00
Andrei Betlen
fea33c9b94
feat: Update llama.cpp
2024-02-27 12:22:17 -05:00
Andrei
4d574bd765
feat(server): Add support for pulling models from Huggingface Hub ( #1222 )
...
* Basic support for hf pull on server
* Add hf_model_repo_id setting
* Update README
2024-02-26 14:35:08 -05:00
Andrei Betlen
afe1e445c9
chore: Bump version
2024-02-26 11:43:24 -05:00
Andrei Betlen
9558ce7878
feat: Update llama.cpp
2024-02-26 11:40:58 -05:00
Andrei Betlen
dbaba3059d
fix: positional arguments only for low-level api
2024-02-26 11:31:11 -05:00
Andrei Betlen
78e536dcfe
fix: typo
2024-02-26 11:14:26 -05:00
Andrei Betlen
44558cbd7a
misc: llava_cpp use ctypes function decorator for binding
2024-02-26 11:07:33 -05:00
Andrei Betlen
8383a9e562
fix: llava this function takes at least 4 arguments (0 given)
2024-02-26 11:03:20 -05:00
Andrei Betlen
8e03fd9957
chore: Bump version
2024-02-25 21:15:42 -05:00
Andrei Betlen
dcf38f6141
fix: remove prematurely committed change
2024-02-25 21:00:37 -05:00