baalajimaestro/llama.cpp

Author	SHA1	Message	Date
Andrei Betlen	d957422bf4	Implement sampling as in llama.cpp main example	2023-05-08 21:21:25 -04:00
Andrei Betlen	93a9019bb1	Merge branch 'main' of github.com:abetlen/llama_cpp_python into Maximilian-Winter/main	2023-05-08 19:57:09 -04:00
Andrei Betlen	65d9cc050c	Add openai frequency and presence penalty parameters. Closes #169	2023-05-08 01:30:18 -04:00
Andrei Betlen	e72f58614b	Change pointer to lower overhead byref	2023-05-07 20:01:34 -04:00
Andrei Betlen	0e94a70de1	Add in-memory longest prefix cache. Closes #158	2023-05-07 19:31:26 -04:00
Andrei Betlen	2753b85321	Format	2023-05-07 13:19:56 -04:00
Andrei Betlen	7c3743fe5f	Update llama.cpp	2023-05-07 00:12:47 -04:00
Andrei Betlen	bc853e3742	Fix type for eval_logits in LlamaState object	2023-05-06 21:32:50 -04:00
Maximilian Winter	515d9bde7e	Fixed somethings and activated cublas	2023-05-06 23:40:19 +02:00
Maximilian Winter	aa203a0d65	Added mirostat sampling to the high level API.	2023-05-06 22:47:47 +02:00
Andrei Betlen	98bbd1c6a8	Fix eval logits type	2023-05-05 14:23:14 -04:00
Andrei Betlen	66e28eb548	Fix temperature bug	2023-05-05 14:00:41 -04:00
Andrei Betlen	b6a9a0b6ba	Add types for all low-level api functions	2023-05-05 12:22:27 -04:00
Andrei Betlen	5be0efa5f8	Cache should raise KeyError when key is missing	2023-05-05 12:21:49 -04:00
Andrei Betlen	853dc711cc	Format	2023-05-04 21:58:36 -04:00
Andrei Betlen	97c6372350	Rewind model to longest prefix.	2023-05-04 21:58:27 -04:00
Andrei Betlen	329297fafb	Bugfix: Missing logits_to_logprobs	2023-05-04 12:18:40 -04:00
Andrei Betlen	9e5b6d675a	Improve logging messages	2023-05-03 10:28:10 -04:00
Andrei Betlen	43f2907e3a	Support smaller state sizes	2023-05-03 09:33:50 -04:00
Andrei Betlen	dd9ad1c759	Formatting	2023-05-01 21:51:16 -04:00
Andrei Betlen	b6747f722e	Fix logprob calculation. Fixes #134	2023-05-01 17:45:08 -04:00
Andrei Betlen	350a1769e1	Update sampling api	2023-05-01 14:47:55 -04:00
Mug	18a0c10032	Remove excessive errors="ignore" and add utf8 test	2023-04-29 12:19:22 +02:00
Mug	b7d14efc8b	Python weirdness	2023-04-28 13:20:31 +02:00
Mug	eed61289b6	Dont detect off tokens, detect off detokenized utf8	2023-04-28 13:16:18 +02:00
Mug	3a98747026	One day, i'll fix off by 1 errors permanently too	2023-04-28 12:54:28 +02:00
Mug	c39547a986	Detect multi-byte responses and wait	2023-04-28 12:50:30 +02:00
Mug	5f81400fcb	Also ignore errors on input prompts	2023-04-26 14:45:51 +02:00
Mug	be2c961bc9	Merge branch 'main' of https://github.com/abetlen/llama-cpp-python	2023-04-26 14:38:09 +02:00
Mug	c4a8491d42	Fix decode errors permanently	2023-04-26 14:37:06 +02:00
Andrei Betlen	cc706fb944	Add ctx check and re-order __init__. Closes #112	2023-04-25 09:00:53 -04:00
Andrei Betlen	d484c5634e	Bugfix: Check cache keys as prefix to prompt tokens	2023-04-24 22:18:54 -04:00
Andrei Betlen	cbe95bbb75	Add cache implementation using llama state	2023-04-24 19:54:41 -04:00
Andrei Betlen	2c359a28ff	Merge branch 'main' of github.com:abetlen/llama_cpp_python into main	2023-04-24 17:51:27 -04:00
Andrei Betlen	197cf80601	Add save/load state api for Llama class	2023-04-24 17:51:25 -04:00
Andrei Betlen	86f8e5ad91	Refactor internal state for Llama class	2023-04-24 15:47:54 -04:00
Andrei	f37456133a	Merge pull request #108 from eiery/main Update n_batch default to 512 to match upstream llama.cpp	2023-04-24 13:48:09 -04:00
eiery	aa12d8a81f	Update llama.py update n_batch default to 512 to match upstream llama.cpp	2023-04-23 20:56:40 -04:00
Andrei Betlen	7230599593	Disable mmap when applying lora weights. Closes #107	2023-04-23 14:53:17 -04:00
Andrei Betlen	0df4d69c20	If lora base is not set avoid re-loading the model by passing NULL	2023-04-18 23:45:25 -04:00
Andrei Betlen	453e517fd5	Add seperate lora_base path for applying LoRA to quantized models using original unquantized model weights.	2023-04-18 10:20:46 -04:00
Andrei Betlen	eb7f278cc6	Add lora_path parameter to Llama model	2023-04-18 01:43:44 -04:00
Andrei Betlen	89856ef00d	Bugfix: only eval new tokens	2023-04-15 17:32:53 -04:00
Andrei Betlen	92c077136d	Add experimental cache	2023-04-15 12:03:09 -04:00
Andrei Betlen	a6372a7ae5	Update stop sequences for chat	2023-04-15 12:02:48 -04:00
Andrei Betlen	83b2be6dc4	Update chat parameters	2023-04-15 11:58:43 -04:00
Andrei Betlen	62087514c6	Update chat prompt	2023-04-15 11:58:19 -04:00
Andrei Betlen	02f9fb82fb	Bugfix	2023-04-15 11:39:52 -04:00
Andrei Betlen	3cd67c7bd7	Add type annotations	2023-04-15 11:39:21 -04:00
Andrei Betlen	d7de0e8014	Bugfix	2023-04-15 00:08:04 -04:00

97 commits