baalajimaestro/llama.cpp

Author	SHA1	Message	Date
Andrei Betlen	8dfde63255	Fix return type	2023-05-07 19:30:14 -04:00
Andrei Betlen	2753b85321	Format	2023-05-07 13:19:56 -04:00
Andrei Betlen	627811ea83	Add verbose flag to server	2023-05-07 05:09:10 -04:00
Andrei Betlen	3fbda71790	Fix mlock_supported and mmap_supported return type	2023-05-07 03:04:22 -04:00
Andrei Betlen	5a3413eee3	Update cpu_count	2023-05-07 03:03:57 -04:00
Andrei Betlen	1a00e452ea	Update settings fields and defaults	2023-05-07 02:52:20 -04:00
Andrei Betlen	86753976c4	Revert "llama_cpp server: delete some ignored / unused parameters" This reverts commit `b47b9549d5`.	2023-05-07 02:02:34 -04:00
Andrei Betlen	c382d8f86a	Revert "llama_cpp server: mark model as required" This reverts commit `e40fcb0575`.	2023-05-07 02:00:22 -04:00
Andrei Betlen	d8fddcce73	Merge branch 'main' of github.com:abetlen/llama_cpp_python into better-server-params-and-fields	2023-05-07 01:54:00 -04:00
Andrei Betlen	7c3743fe5f	Update llama.cpp	2023-05-07 00:12:47 -04:00
Andrei Betlen	bc853e3742	Fix type for eval_logits in LlamaState object	2023-05-06 21:32:50 -04:00
Maximilian Winter	515d9bde7e	Fixed some things and activated cuBLAS	2023-05-06 23:40:19 +02:00
Maximilian Winter	aa203a0d65	Added mirostat sampling to the high level API.	2023-05-06 22:47:47 +02:00
Andrei Betlen	98bbd1c6a8	Fix eval logits type	2023-05-05 14:23:14 -04:00
Andrei Betlen	b5f3e74627	Add return type annotations for embeddings and logits	2023-05-05 14:22:55 -04:00
Andrei Betlen	3e28e0e50c	Fix: runtime type errors	2023-05-05 14:12:26 -04:00
Andrei Betlen	e24c3d7447	Prefer explicit imports	2023-05-05 14:05:31 -04:00
Andrei Betlen	40501435c1	Fix: types	2023-05-05 14:04:12 -04:00
Andrei Betlen	66e28eb548	Fix temperature bug	2023-05-05 14:00:41 -04:00
Andrei Betlen	6702d2abfd	Fix candidates type	2023-05-05 14:00:30 -04:00
Andrei Betlen	5e7ddfc3d6	Fix llama_cpp types	2023-05-05 13:54:22 -04:00
Andrei Betlen	b6a9a0b6ba	Add types for all low-level api functions	2023-05-05 12:22:27 -04:00
Andrei Betlen	5be0efa5f8	Cache should raise KeyError when key is missing	2023-05-05 12:21:49 -04:00
Andrei Betlen	24fc38754b	Add cli options to server. Closes #37	2023-05-05 12:08:28 -04:00
Andrei Betlen	853dc711cc	Format	2023-05-04 21:58:36 -04:00
Andrei Betlen	97c6372350	Rewind model to longest prefix.	2023-05-04 21:58:27 -04:00
Andrei Betlen	329297fafb	Bugfix: Missing logits_to_logprobs	2023-05-04 12:18:40 -04:00
Lucas Doyle	3008a954c1	Merge branch 'main' of github.com:abetlen/llama-cpp-python into better-server-params-and-fields	2023-05-03 13:10:03 -07:00
Andrei Betlen	9e5b6d675a	Improve logging messages	2023-05-03 10:28:10 -04:00
Andrei Betlen	43f2907e3a	Support smaller state sizes	2023-05-03 09:33:50 -04:00
Andrei Betlen	1d47cce222	Update llama.cpp	2023-05-03 09:33:30 -04:00
Lucas Doyle	b9098b0ef7	llama_cpp server: prompt is a string. Not sure why this union type was here, but taking a look at llama.py, prompt is only ever processed as a string for completion. This was breaking types when generating an OpenAPI client	2023-05-02 14:47:07 -07:00
Matt Hoffner	f97ff3c5bb	Update llama_cpp.py	2023-05-01 20:40:06 -07:00
Andrei	7ab08b8d10	Merge branch 'main' into better-server-params-and-fields	2023-05-01 22:45:57 -04:00
Andrei Betlen	9eafc4c49a	Refactor server to use factory	2023-05-01 22:38:46 -04:00
Andrei Betlen	dd9ad1c759	Formatting	2023-05-01 21:51:16 -04:00
Lucas Doyle	dbbfc4ba2f	llama_cpp server: fix to ChatCompletionRequestMessage. When I generate a client, it breaks because it fails to process the schema of ChatCompletionRequestMessage. These changes fix that: - I think `Union[Literal["user"], Literal["channel"], ...]` is the same as `Literal["user", "channel", ...]` - Turns out the default value `Literal["user"]` isn't JSON serializable, so replace it with "user"	2023-05-01 15:38:19 -07:00
Lucas Doyle	fa2a61e065	llama_cpp server: fields for the embedding endpoint	2023-05-01 15:38:19 -07:00
Lucas Doyle	8dcbf65a45	llama_cpp server: define fields for chat completions Slight refactor for common fields shared between completion and chat completion	2023-05-01 15:38:19 -07:00
Lucas Doyle	978b6daf93	llama_cpp server: add some more information to fields for completions	2023-05-01 15:38:19 -07:00
Lucas Doyle	a5aa6c1478	llama_cpp server: add missing top_k param to CreateChatCompletionRequest. `llama.create_chat_completion` definitely has a `top_k` argument, but it's missing from `CreateChatCompletionRequest`. decision: add it	2023-05-01 15:38:19 -07:00
Lucas Doyle	1e42913599	llama_cpp server: move logprobs to supported. I think this is actually supported (it's in the arguments of `LLama.__call__`, which is how the completion is invoked). decision: mark as supported	2023-05-01 15:38:19 -07:00
Lucas Doyle	b47b9549d5	llama_cpp server: delete some ignored / unused parameters `n`, `presence_penalty`, `frequency_penalty`, `best_of`, `logit_bias`, `user`: not supported, excluded from the calls into llama. decision: delete it	2023-05-01 15:38:19 -07:00
Lucas Doyle	e40fcb0575	llama_cpp server: mark model as required. `model` is ignored, but currently marked "optional". On the one hand, it could be marked "required" to make it explicit in case the server supports multiple llamas at the same time; on the other hand, it could be deleted since it's ignored. decision: mark it required for the sake of OpenAI API compatibility. I think out of all parameters, `model` is probably the most important one for people to keep using even if it's ignored for now.	2023-05-01 15:38:19 -07:00
Andrei Betlen	b6747f722e	Fix logprob calculation. Fixes #134	2023-05-01 17:45:08 -04:00
Andrei Betlen	9ff9cdd7fc	Fix import error	2023-05-01 15:11:15 -04:00
Andrei Betlen	350a1769e1	Update sampling api	2023-05-01 14:47:55 -04:00
Andrei Betlen	7837c3fdc7	Fix return types and import comments	2023-05-01 14:02:06 -04:00
Andrei Betlen	ccf1ed54ae	Merge branch 'main' of github.com:abetlen/llama_cpp_python into main	2023-05-01 11:35:14 -04:00
Andrei Betlen	80184a286c	Update llama.cpp	2023-05-01 10:44:28 -04:00
Lucas Doyle	efe8e6f879	llama_cpp server: slight refactor to init_llama function. Define an init_llama function that starts llama with supplied settings instead of just doing it in the global context of app.py. This makes the test less brittle, since it no longer needs to modify os.environ before importing the app	2023-04-29 11:42:23 -07:00
Lucas Doyle	6d8db9d017	tests: simple test for server module	2023-04-29 11:42:20 -07:00
Lucas Doyle	468377b0e2	llama_cpp server: app is now importable, still runnable as a module	2023-04-29 11:41:25 -07:00
Andrei	755f9fa455	Merge pull request #118 from SagsMug/main Fix UnicodeDecodeError permanently	2023-04-29 07:19:01 -04:00
Mug	18a0c10032	Remove excessive errors="ignore" and add utf8 test	2023-04-29 12:19:22 +02:00
Andrei Betlen	ea0faabae1	Update llama.cpp	2023-04-28 15:32:43 -04:00
Mug	b7d14efc8b	Python weirdness	2023-04-28 13:20:31 +02:00
Mug	eed61289b6	Don't detect off tokens; detect off detokenized UTF-8	2023-04-28 13:16:18 +02:00
Mug	3a98747026	One day, I'll fix off-by-one errors permanently too	2023-04-28 12:54:28 +02:00
Mug	c39547a986	Detect multi-byte responses and wait	2023-04-28 12:50:30 +02:00
Andrei Betlen	9339929f56	Update llama.cpp	2023-04-26 20:00:54 -04:00
Mug	5f81400fcb	Also ignore errors on input prompts	2023-04-26 14:45:51 +02:00
Mug	be2c961bc9	Merge branch 'main' of https://github.com/abetlen/llama-cpp-python	2023-04-26 14:38:09 +02:00
Mug	c4a8491d42	Fix decode errors permanently	2023-04-26 14:37:06 +02:00
Andrei Betlen	cbd26fdcc1	Update llama.cpp	2023-04-25 19:03:41 -04:00
Andrei Betlen	3cab3ef4cb	Update n_batch for server	2023-04-25 09:11:32 -04:00
Andrei Betlen	cc706fb944	Add ctx check and re-order __init__. Closes #112	2023-04-25 09:00:53 -04:00
Andrei Betlen	d484c5634e	Bugfix: Check cache keys as prefix to prompt tokens	2023-04-24 22:18:54 -04:00
Andrei Betlen	cbe95bbb75	Add cache implementation using llama state	2023-04-24 19:54:41 -04:00
Andrei Betlen	2c359a28ff	Merge branch 'main' of github.com:abetlen/llama_cpp_python into main	2023-04-24 17:51:27 -04:00
Andrei Betlen	197cf80601	Add save/load state api for Llama class	2023-04-24 17:51:25 -04:00
Andrei Betlen	86f8e5ad91	Refactor internal state for Llama class	2023-04-24 15:47:54 -04:00
Andrei	f37456133a	Merge pull request #108 from eiery/main Update n_batch default to 512 to match upstream llama.cpp	2023-04-24 13:48:09 -04:00
Andrei Betlen	02cf881317	Update llama.cpp	2023-04-24 09:30:10 -04:00
eiery	aa12d8a81f	Update llama.py update n_batch default to 512 to match upstream llama.cpp	2023-04-23 20:56:40 -04:00
Andrei Betlen	7230599593	Disable mmap when applying lora weights. Closes #107	2023-04-23 14:53:17 -04:00
Andrei Betlen	e99caedbbd	Update llama.cpp	2023-04-22 19:50:28 -04:00
Andrei Betlen	1eb130a6b2	Update llama.cpp	2023-04-21 17:40:27 -04:00
Andrei Betlen	e4647c75ec	Add use_mmap flag to server	2023-04-19 15:57:46 -04:00
Andrei Betlen	0df4d69c20	If lora base is not set avoid re-loading the model by passing NULL	2023-04-18 23:45:25 -04:00
Andrei Betlen	95c0dc134e	Update type signature to allow for null pointer to be passed.	2023-04-18 23:44:46 -04:00
Andrei Betlen	453e517fd5	Add separate lora_base path for applying LoRA to quantized models using original unquantized model weights.	2023-04-18 10:20:46 -04:00
Andrei Betlen	eb7f278cc6	Add lora_path parameter to Llama model	2023-04-18 01:43:44 -04:00
Andrei Betlen	35abf89552	Add bindings for LoRA adapters. Closes #88	2023-04-18 01:30:04 -04:00
Andrei Betlen	89856ef00d	Bugfix: only eval new tokens	2023-04-15 17:32:53 -04:00
Andrei Betlen	92c077136d	Add experimental cache	2023-04-15 12:03:09 -04:00
Andrei Betlen	a6372a7ae5	Update stop sequences for chat	2023-04-15 12:02:48 -04:00
Andrei Betlen	83b2be6dc4	Update chat parameters	2023-04-15 11:58:43 -04:00
Andrei Betlen	62087514c6	Update chat prompt	2023-04-15 11:58:19 -04:00
Andrei Betlen	02f9fb82fb	Bugfix	2023-04-15 11:39:52 -04:00
Andrei Betlen	3cd67c7bd7	Add type annotations	2023-04-15 11:39:21 -04:00
Andrei Betlen	d7de0e8014	Bugfix	2023-04-15 00:08:04 -04:00
Andrei Betlen	e90e122f2a	Use clear	2023-04-14 23:33:18 -04:00
Andrei Betlen	ac7068a469	Track generated tokens internally	2023-04-14 23:33:00 -04:00
Andrei Betlen	6e298d8fca	Set kv cache size to f16 by default	2023-04-14 22:21:19 -04:00
Andrei Betlen	6c7cec0c65	Fix completion request	2023-04-14 10:01:15 -04:00
Andrei Betlen	6153baab2d	Clean up logprobs implementation	2023-04-14 09:59:33 -04:00
Andrei Betlen	26cc4ee029	Fix signature for stop parameter	2023-04-14 09:59:08 -04:00
Andrei Betlen	6595ad84bf	Add field to disable resetting between generations	2023-04-13 00:28:00 -04:00
Andrei Betlen	22fa5a621f	Revert "Deprecate generate method" This reverts commit `6cf5876538`.	2023-04-13 00:19:55 -04:00


228 commits