baalajimaestro/llama.cpp

Author	SHA1	Message	Date
MillionthOdin16	c283edd7f2	Set n_batch to default values and reduce thread count: Change batch size to the llama.cpp default of 8. I've seen issues in llama.cpp where batch size affects the quality of generations (it shouldn't), so in case that is still a problem I changed it to the default. Set the auto-determined number of threads to 1/2 of the system count. ggml will sometimes lock cores at 100% while doing nothing. This is being addressed, but it can cause a bad experience for the user if cores are pegged at 100%.	2023-04-05 18:17:29 -04:00
Andrei Betlen	e1b5b9bb04	Update fastapi server example	2023-04-05 14:44:26 -04:00
Andrei Betlen	b1babcf56c	Add quantize example	2023-04-05 04:17:26 -04:00
Andrei Betlen	c8e13a78d0	Re-organize examples folder	2023-04-05 04:10:13 -04:00
Andrei Betlen	c16bda5fb9	Add performance tuning notebook	2023-04-05 04:09:19 -04:00
Andrei Betlen	ffe34cf64d	Allow user to set llama config from env vars	2023-04-04 00:52:44 -04:00
Andrei Betlen	05eb2087d8	Small fixes for examples	2023-04-03 20:33:07 -04:00
Andrei Betlen	7fedf16531	Add support for chat completion	2023-04-03 20:12:44 -04:00
Andrei Betlen	f7ab8d55b2	Update context size defaults Close #11	2023-04-03 20:11:13 -04:00
Andrei Betlen	caff127836	Remove commented out code	2023-04-01 15:13:01 -04:00
Andrei Betlen	f28bf3f13d	Bugfix: enable embeddings for fastapi server	2023-04-01 15:12:25 -04:00
Andrei Betlen	ed6f2a049e	Add streaming and embedding endpoints to fastapi example	2023-04-01 13:05:20 -04:00
Andrei Betlen	9fac0334b2	Update embedding example to new api	2023-04-01 13:02:51 -04:00
Andrei Betlen	5e011145c5	Update low level api example	2023-04-01 13:02:10 -04:00
Andrei Betlen	5f2e822b59	Rename inference example	2023-04-01 13:01:45 -04:00
Andrei Betlen	70b8a1ef75	Add support to get embeddings from high-level api. Closes #4	2023-03-28 04:59:54 -04:00
Andrei Betlen	3dbb3fd3f6	Add support for stream parameter. Closes #1	2023-03-28 04:03:57 -04:00
Andrei Betlen	dfe8608096	Update examples	2023-03-24 19:10:31 -04:00
Andrei Betlen	a61fd3b509	Add example based on stripped down version of main.cpp from llama.cpp	2023-03-24 18:57:25 -04:00
Andrei Betlen	2cc499512c	Black formatting	2023-03-24 14:35:41 -04:00
Andrei Betlen	d29b05bb67	Update example to match alpaca training prompt	2023-03-24 14:34:15 -04:00
Andrei Betlen	15e3dc7897	Add fastapi example	2023-03-24 01:41:24 -04:00
Andrei Betlen	9af16b63fd	Added low-level api inference example	2023-03-23 23:45:59 -04:00
Andrei Betlen	8680332203	Update examples	2023-03-23 23:12:42 -04:00
Andrei Betlen	90c78723de	Add basic langchain demo	2023-03-23 16:25:24 -04:00
Andrei Betlen	3d6eb32c76	Update basic example	2023-03-23 14:57:31 -04:00
Andrei Betlen	79b304c9d4	Initial commit	2023-03-23 05:33:06 -04:00

27 commits
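
The most recent commit in this log (c283edd7f2) describes two defaults: a batch size of 8 and an auto-determined thread count of half the system count. As a rough illustration only, that heuristic might look like the sketch below. The names `DEFAULT_N_BATCH` and `default_n_threads` are hypothetical and are not taken from the repository, and the floor of one thread is an added assumption so that single-core machines still get a usable value.

```python
import os

# Assumed value: the commit message describes 8 as the llama.cpp default batch size.
DEFAULT_N_BATCH = 8


def default_n_threads(cpu_count=None):
    """Return half of the logical CPU count, but never less than 1.

    Hypothetical helper illustrating the "1/2 system count" rule from the
    commit message; the real implementation may differ.
    """
    if cpu_count is None:
        # os.cpu_count() can return None when the count is undetermined.
        cpu_count = os.cpu_count() or 1
    return max(cpu_count // 2, 1)


if __name__ == "__main__":
    print(f"n_batch={DEFAULT_N_BATCH}, n_threads={default_n_threads()}")
```

Halving the count leaves cores free for the rest of the system, which matches the commit's stated concern about ggml pinning cores at 100% while idle.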