Andrei Betlen
159cc4e5d9
feat: Update llama.cpp
2024-04-21 20:46:40 -04:00
Andrei Betlen
0281214863
chore: Bump version
2024-04-20 00:09:37 -04:00
Andrei Betlen
cc81afebf0
feat: Add stopping_criteria to ChatFormatter, allow stopping on arbitrary token ids; fixes llama3 instruct
2024-04-20 00:00:53 -04:00
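
This change lets the chat formatter register stop conditions on arbitrary token ids (Llama 3 Instruct's `<|eot_id|>` was the motivating case). A minimal user-level sketch of the same mechanism through the public `stopping_criteria` parameter; the model path and token id are illustrative:

```python
import numpy as np
from llama_cpp import Llama, StoppingCriteriaList

EOT_ID = 128009  # <|eot_id|> in the Llama 3 vocabulary

llm = Llama(model_path="./Meta-Llama-3-8B-Instruct.Q4_K_M.gguf")

def stop_on_eot(input_ids: np.ndarray, logits: np.ndarray) -> bool:
    # Called after each sampled token; returning True halts generation.
    return len(input_ids) > 0 and input_ids[-1] == EOT_ID

out = llm(
    "Hello!",
    stopping_criteria=StoppingCriteriaList([stop_on_eot]),
    max_tokens=64,
)
print(out["choices"][0]["text"])
```
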
Andrei Betlen
d17c1887a3
feat: Update llama.cpp
2024-04-19 23:58:16 -04:00
Andrei Betlen
893a27a736
chore: Bump version
2024-04-18 01:43:39 -04:00
Andrei Betlen
a128c80500
feat: Update llama.cpp
2024-04-18 01:39:45 -04:00
Lucca Zenóbio
4f42664955
feat: update grammar schema converter to match llama.cpp (#1353)
...
* feat: improve function calling
* feat: grammar
* fix
* fix
* fix
2024-04-18 01:36:25 -04:00
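
The converter keeps the library's JSON-schema-to-grammar translation in sync with upstream llama.cpp. A short sketch of what the converter is used for, assuming the `LlamaGrammar.from_json_schema` entry point; the schema is illustrative:

```python
import json
from llama_cpp.llama_grammar import LlamaGrammar

schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}

# Compile the schema into a GBNF grammar that constrains token sampling;
# pass it as grammar=... to Llama.__call__ or create_chat_completion.
grammar = LlamaGrammar.from_json_schema(json.dumps(schema))
```
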
Andrei Betlen
fa4bb0cf81
Revert "feat: Update json to grammar ( #1350 )"
...
This reverts commit 610a592f70.
2024-04-17 16:18:16 -04:00
Lucca Zenóbio
610a592f70
feat: Update json to grammar (#1350)
...
* feat: improve function calling
* feat: grammar
2024-04-17 10:10:21 -04:00
khimaros
b73c73c0c6
feat: add disable_ping_events flag (#1257)
...
For backward compatibility, this is false by default. It can be set to true to disable EventSource ping events, which are not supported by some OpenAI clients.
fixes https://github.com/abetlen/llama-cpp-python/issues/1256
2024-04-17 10:08:19 -04:00
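
Ping events are SSE comment lines (lines starting with `:`), which is why some clients trip over them; the new flag simply stops the server from emitting them. For clients stuck talking to servers without the flag, a tolerant reader can skip such lines; a sketch assuming the third-party `requests` package:

```python
import requests

def sse_data_lines(url: str, payload: dict):
    """Yield SSE data payloads, ignoring blank keep-alives and ping comments."""
    with requests.post(url, json=payload, stream=True) as resp:
        for line in resp.iter_lines(decode_unicode=True):
            if not line or line.startswith(":"):  # blanks and ping comments
                continue
            if line.startswith("data: "):
                yield line[len("data: "):]
```
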
tc-wolf
4924455dec
feat: Make saved state more compact on-disk (#1296)
...
* State load/save changes
- Only store up to `n_tokens` logits instead of the full `(n_ctx, n_vocab)`-sized array.
- Difference between ~350MB and ~1500MB for an example prompt with ~300 tokens.
- Auto-formatting changes
* Back out formatting changes
2024-04-17 10:06:50 -04:00
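
A back-of-envelope check of the saving (the ~350MB vs ~1500MB totals above also include the KV cache and other state; the arithmetic below covers the logits buffer alone, with illustrative dimensions):

```python
import numpy as np

n_ctx, n_vocab, n_tokens = 4096, 32000, 300
item = np.dtype(np.float32).itemsize

full = n_ctx * n_vocab * item        # before: full (n_ctx, n_vocab) buffer saved
trimmed = n_tokens * n_vocab * item  # after: only rows for evaluated tokens

print(f"before: {full / 2**20:.0f} MiB, after: {trimmed / 2**20:.1f} MiB")
# before: 500 MiB, after: 36.6 MiB
```
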
Andrei Betlen
9842cbf99d
feat: Update llama.cpp
2024-04-17 10:06:15 -04:00
ddh0
c96b2daebf
feat: Use all available CPUs for batch processing (#1345)
2024-04-17 10:05:54 -04:00
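
For readers who prefer to pin the thread count explicitly rather than rely on the new default, the equivalent setting (parameter name from the existing `Llama` constructor; model path illustrative):

```python
import multiprocessing
from llama_cpp import Llama

llm = Llama(
    model_path="./model.gguf",
    n_threads_batch=multiprocessing.cpu_count(),  # threads for batch/prompt processing
)
```
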
Andrei Betlen
a420f9608b
feat: Update llama.cpp
2024-04-14 19:14:09 -04:00
Andrei Betlen
90dceaba8a
feat: Update llama.cpp
2024-04-14 11:35:57 -04:00
Andrei Betlen
2e9ffd28fd
feat: Update llama.cpp
2024-04-12 21:09:12 -04:00
Andrei Betlen
ef29235d45
chore: Bump version
2024-04-10 03:44:46 -04:00
Andrei Betlen
bb65b4d764
fix: pass correct type to chat handlers for chat completion logprobs
2024-04-10 03:41:55 -04:00
Andrei Betlen
060bfa64d5
feat: Add support for yaml-based configs
2024-04-10 02:47:01 -04:00
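
A sketch of the idea with an illustrative schema (the server's actual config fields may differ); the file is parsed with PyYAML:

```python
import yaml

CONFIG = """\
host: 0.0.0.0
port: 8000
models:
  - model: ./models/llama-3-8b-instruct.Q4_K_M.gguf
    n_ctx: 8192
"""

settings = yaml.safe_load(CONFIG)
print(settings["models"][0]["model"])
```
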
Andrei Betlen
1347e1d050
feat: Add typechecking for ctypes structure attributes
2024-04-10 02:40:41 -04:00
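
One common pattern for this (a sketch; the structure and field types here are illustrative): ctypes reads only `_fields_` at class creation, so class-level annotations can be added for static checkers without changing runtime behavior:

```python
import ctypes
from typing import TYPE_CHECKING

class llama_token_data(ctypes.Structure):
    if TYPE_CHECKING:  # seen by mypy/pyright, skipped at runtime
        id: int
        logit: float
        p: float

    _fields_ = [
        ("id", ctypes.c_int32),
        ("logit", ctypes.c_float),
        ("p", ctypes.c_float),
    ]

d = llama_token_data(id=1, logit=0.25, p=0.5)
print(d.id, d.logit, d.p)
```
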
Andrei Betlen
889d0e8981
feat: Update llama.cpp
2024-04-10 02:25:58 -04:00
Andrei Betlen
56071c956a
feat: Update llama.cpp
2024-04-09 09:53:49 -04:00
Andrei Betlen
08b16afe11
chore: Bump version
2024-04-06 01:53:38 -04:00
Andrei Betlen
7ca364c8bd
feat: Update llama.cpp
2024-04-06 01:37:43 -04:00
Andrei Betlen
b3bfea6dbf
fix: Always embed metal library. Closes #1332
2024-04-06 01:36:53 -04:00
Andrei Betlen
f4092e6b46
feat: Update llama.cpp
2024-04-05 10:59:31 -04:00
Andrei Betlen
2760ef6156
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-04-05 10:51:54 -04:00
Andrei Betlen
1ae3abbcc3
fix: missing logprobs in response, incorrect response type for functionary, minor type issues. Closes #1328 Closes #1314
2024-04-05 10:51:44 -04:00
Andrei Betlen
49bc66bfa2
fix: missing logprobs in response, incorrect response type for functionary, minor type issues. Closes #1328 #1314
2024-04-05 10:50:49 -04:00
Andrei Betlen
9111b6e03a
feat: Update llama.cpp
2024-04-05 09:21:02 -04:00
Sigbjørn Skjæret
7265a5dc0e
fix(docs): incorrect tool_choice example (#1330)
2024-04-05 09:14:03 -04:00
Andrei Betlen
909ef66951
docs: Rename cuBLAS section to CUDA
2024-04-04 03:08:47 -04:00
Andrei Betlen
1db3b58fdc
docs: Add docs explaining how to install pre-built wheels.
2024-04-04 02:57:06 -04:00
Andrei Betlen
c50309e52a
docs: LLAMA_CUBLAS -> LLAMA_CUDA
2024-04-04 02:49:19 -04:00
Andrei Betlen
612e78d322
fix(ci): use correct script name
2024-04-03 16:15:29 -04:00
Andrei Betlen
34081ddc5b
chore: Bump version
2024-04-03 15:38:27 -04:00
Andrei Betlen
368061c04a
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-04-03 15:35:30 -04:00
Andrei Betlen
5a5193636b
feat: Update llama.cpp
2024-04-03 15:35:28 -04:00
Andrei
5a930ee9a1
feat: Binary wheels for CPU, CUDA (12.1 - 12.3), Metal (#1247)
...
* Generate binary wheel index on release
* Add total release downloads badge
* Update download label
* Use official cibuildwheel action
* Add workflows to build CUDA and Metal wheels
* Update generate index workflow
* Update workflow name
2024-04-03 15:32:13 -04:00
Andrei Betlen
8649d7671b
fix: segfault when logits_all=False. Closes #1319
2024-04-03 15:30:31 -04:00
Andrei Betlen
f96de6d920
Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main
2024-04-03 00:55:21 -04:00
Andrei Betlen
e465157804
feat: Update llama.cpp
2024-04-03 00:55:19 -04:00
Yuri Mikhailov
62aad610e1
fix: pass the last tokens correctly to the sample_repetition_penalties function (#1295)
...
Co-authored-by: ymikhaylov <ymikhaylov@x5.ru>
Co-authored-by: Andrei <abetlen@gmail.com>
2024-04-01 15:25:43 -04:00
Andrei Betlen
45bf5ae582
chore: Bump version
2024-04-01 10:28:22 -04:00
lawfordp2017
a0f373e310
fix: Change local API doc references to hosted (#1317)
2024-04-01 10:21:00 -04:00
Limour
f165048a69
feat: add support for KV cache quantization options (#1307)
...
* add KV cache quantization options
https://github.com/abetlen/llama-cpp-python/discussions/1220
https://github.com/abetlen/llama-cpp-python/issues/1305
* Add ggml_type
* Use ggml_type instead of string for quantization
* Add server support
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-04-01 10:19:28 -04:00
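
The resulting knobs, per the PR's own description (constant and parameter names as added there; model path illustrative, and note that quantized caches trade memory for some accuracy):

```python
import llama_cpp
from llama_cpp import Llama

llm = Llama(
    model_path="./model.gguf",
    type_k=llama_cpp.GGML_TYPE_Q8_0,  # quantize the K cache to q8_0
    type_v=llama_cpp.GGML_TYPE_Q8_0,  # quantize the V cache to q8_0
)
```
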
windspirit95
aa9f1ae011
feat: Add logprobs support to chat completions (#1311)
...
* Add logprobs return in ChatCompletionResponse
* Fix duplicate field
* Set default to false
* Simplify check
* Add server example
---------
Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-03-31 13:30:13 -04:00
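
A usage sketch of the new fields (OpenAI-compatible names per the PR; model path illustrative):

```python
from llama_cpp import Llama

llm = Llama(model_path="./model.gguf")
resp = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hi."}],
    logprobs=True,   # off by default, per the PR
    top_logprobs=3,  # alternatives reported per sampled token
)
print(resp["choices"][0]["logprobs"])
```
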
Andrei Betlen
1e60dba082
feat: Update llama.cpp
2024-03-29 13:34:23 -04:00
Andrei Betlen
dcbe57fcf8
feat: Update llama.cpp
2024-03-29 12:45:27 -04:00
Andrei Betlen
125b2358c9
feat: Update llama.cpp
2024-03-28 12:06:46 -04:00