Commit graph

72 commits

Author SHA1 Message Date
Andrei Betlen
85ead98a3e Update Functions notebook example 2023-11-10 12:49:14 -05:00
Andrei Betlen
1b376c62b7 Update functionary for new OpenAI API 2023-11-10 02:51:58 -05:00
Andrei Betlen
598780fde8 Update Multimodal notebook 2023-11-08 00:48:25 -05:00
Damian Stewart
aab74f0b2b
Multimodal Support (Llava 1.5) (#821)
* llava v1.5 integration

* Point llama.cpp to fork

* Add llava shared library target

* Fix type

* Update llama.cpp

* Add llava api

* Revert changes to llama and llama_cpp

* Update llava example

* Add types for new gpt-4-vision-preview api

* Fix typo

* Update llama.cpp

* Update llama_types to match OpenAI v1 API

* Update ChatCompletionFunction type

* Reorder request parameters

* More API type fixes

* Even More Type Updates

* Add parameter for custom chat_handler to Llama class

* Fix circular import

* Convert to absolute imports

* Fix

* Fix pydantic Jsontype bug

* Accept list of prompt tokens in create_completion

* Add llava1.5 chat handler

* Add Multimodal notebook

* Clean up examples

* Add server docs

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2023-11-07 22:48:51 -05:00
Andrei
3af7b21ff1
Add functionary support (#784)
* Add common grammars and json-schema-to-grammar utility function from llama.cpp

* Pass functions to format function

* Add basic functionary formatting

* Add LlamaChatHandler for more complex chat use cases

* Add function calling example notebook

* Add support for regular chat completions alongside function calling
2023-11-03 02:12:14 -04:00
Andrei
ab028cb878
Migrate inference to llama_batch and llama_decode api (#795)
* Add low-level batching notebook

* fix: tokenization of special characters: (#850)

It should behave like llama.cpp, where most out-of-the-box usages
treat special characters accordingly

* Update CHANGELOG

* Cleanup

* Fix runner label

* Update notebook

* Use llama_decode and batch api

* Support logits_all parameter

---------

Co-authored-by: Antoine Lizee <antoine.lizee@gmail.com>
2023-11-02 20:13:57 -04:00
Andrei Betlen
f4090a0bb2 Add NUMA support; low-level API users must now explicitly call llama_backend_init at the start of their programs. 2023-09-13 23:00:43 -04:00
Juarez Bochi
20ac434d0f
Fix low level api examples 2023-09-07 17:50:47 -04:00
Andrei
2adf6f3f9a
Merge pull request #265 from dmahurin/fix-from-bytes-byteorder
fix "from_bytes() missing required argument 'byteorder'"
2023-05-26 12:53:06 -04:00
Andrei
34ad71f448
Merge pull request #274 from dmahurin/fix-missing-antiprompt
low_level_api_chat_cpp.py: Fix missing antiprompt output in chat.
2023-05-26 12:52:34 -04:00
Don Mahurin
0fa2ec4903 low_level_api_chat_cpp.py: Fix missing antiprompt output in chat. 2023-05-26 06:54:28 -07:00
Don Mahurin
d6a7adb17a fix "missing 1 required positional argument: 'min_keep'" 2023-05-23 06:42:22 -07:00
Don Mahurin
327eedbfe1 fix "from_bytes() missing required argument 'byteorder'" 2023-05-23 00:20:34 -07:00
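For context, the byteorder fix above reflects a Python stdlib requirement: `int.from_bytes()` had no default for `byteorder` before Python 3.11, so calls that omitted it raised a `TypeError` on the versions current at the time. A minimal illustration:

```python
# Before Python 3.11, byteorder was a required argument of
# int.from_bytes(); omitting it raised a TypeError.
value_le = int.from_bytes(b"\x01\x00", byteorder="little")  # 1
value_be = int.from_bytes(b"\x01\x00", byteorder="big")     # 256
print(value_le, value_be)
```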
Andrei Betlen
c7788c85ab Add Guidance example 2023-05-19 03:16:58 -04:00
Andrei
7499fc1cbb
Merge pull request #126 from Stonelinks/deprecate-example-server
Deprecate example server
2023-05-08 19:29:04 -04:00
Mug
eaf9f19aa9 Fix lora 2023-05-08 15:27:42 +02:00
Mug
2c0d9b182c Fix session loading and saving in low level example chat 2023-05-08 15:27:03 +02:00
Mug
fd80ddf703 Fix a bug with wrong type 2023-05-06 22:22:28 +02:00
Mug
996f63e9e1 Add utf8 to chat example 2023-05-06 15:16:58 +02:00
Mug
3ceb47b597 Fix mirostat requiring c_float 2023-05-06 13:35:50 +02:00
Mug
9797394c81 Fix wrong logit_bias parsed type 2023-05-06 13:27:52 +02:00
Mug
1895c11033 Rename postfix to suffix to match upstream 2023-05-06 13:18:25 +02:00
Mug
0e9f227afd Update low level examples 2023-05-04 18:33:08 +02:00
Lucas Doyle
0fcc25cdac examples fastapi_server: deprecate
This commit "deprecates" the example FastAPI server: it remains runnable but points folks at the module if they want to learn more.

Rationale:

Currently there exist two server implementations in this repo:

- `llama_cpp/server/__main__.py`, the module that's runnable by consumers of the library with `python3 -m llama_cpp.server`
- `examples/high_level_api/fastapi_server.py`, which is probably a copy-pasted example by folks hacking around

IMO this is confusing. As a new user of the library, I see both have been updated relatively recently, but looking at them side by side there's a diff.

The one in the module seems better:
- supports logits_all
- supports use_mmap
- has experimental cache support (with some mutex thing going on)
- some stuff with streaming support was moved around more recently than fastapi_server.py
2023-05-01 22:34:23 -07:00
Mug
c39547a986 Detect multi-byte responses and wait 2023-04-28 12:50:30 +02:00
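The "detect multi-byte responses and wait" fix above concerns streaming UTF-8 output token by token: a token boundary can land in the middle of a multi-byte character, so the chat example must hold those bytes back until the character is complete. A self-contained sketch of the idea (the helper name is illustrative, not the example's actual code):

```python
def incomplete_utf8_tail(buf: bytes) -> int:
    """Return how many trailing bytes of `buf` form an incomplete
    UTF-8 sequence (0 if the buffer ends on a character boundary)."""
    # Scan back at most 3 bytes looking for the lead byte of a sequence.
    for i in range(1, min(4, len(buf) + 1)):
        b = buf[-i]
        if b & 0b1100_0000 == 0b1000_0000:
            continue  # continuation byte; keep looking for the lead
        # Lead byte found: how many bytes should this sequence have?
        if b & 0b1000_0000 == 0:
            expected = 1  # ASCII
        elif b & 0b1110_0000 == 0b1100_0000:
            expected = 2
        elif b & 0b1111_0000 == 0b1110_0000:
            expected = 3
        else:
            expected = 4
        # Incomplete only if fewer bytes have arrived than expected.
        return i if i < expected else 0
    return 0

# "é" is two bytes in UTF-8; printing only the first byte mid-stream
# would emit a replacement character, so the consumer should wait.
print(incomplete_utf8_tail("é".encode("utf-8")[:1]))  # 1 byte still pending
print(incomplete_utf8_tail("é".encode("utf-8")))      # complete, nothing pending
```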
Mug
5f81400fcb Also ignore errors on input prompts 2023-04-26 14:45:51 +02:00
Mug
3c130f00ca Remove try catch from chat 2023-04-26 14:38:53 +02:00
Mug
c4a8491d42 Fix decode errors permanently 2023-04-26 14:37:06 +02:00
Mug
53d17ad003 Fixed end of text wrong type, and fix n_predict behaviour 2023-04-17 14:45:28 +02:00
Mug
3bb45f1658 More reasonable defaults 2023-04-10 16:38:45 +02:00
Mug
0cccb41a8f Added iterative search to prevent instructions from being echoed, added ignore-eos and no-mmap options, and fixed a bug where one character too many was echoed 2023-04-10 16:35:38 +02:00
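The iterative search mentioned in the commit above boils down to holding back any trailing output that could still turn out to be the start of the instruction/antiprompt marker, only releasing it once more tokens disambiguate. A minimal sketch of that check (function name and antiprompt string are illustrative assumptions, not the example's actual code):

```python
def matching_tail_len(output: str, antiprompt: str) -> int:
    """Length of the longest suffix of `output` that is a prefix of
    `antiprompt` -- i.e., how many characters must be held back because
    they might be the start of an echoed instruction marker."""
    for n in range(min(len(output), len(antiprompt)), 0, -1):
        if antiprompt.startswith(output[-n:]):
            return n
    return 0

# With antiprompt "### Instruction:", the trailing "### Inst" is held
# back until further tokens decide whether it is the marker or real text.
held = matching_tail_len("Sure, here you go.\n### Inst", "### Instruction:")
print(held)  # 8 characters withheld from display
```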
Andrei Betlen
196650ccb2 Update model paths to be more clear they should point to file 2023-04-09 22:45:55 -04:00
Andrei Betlen
6d1bda443e Add clients example. Closes #46 2023-04-08 09:35:32 -04:00
Andrei
41365b0456
Merge pull request #15 from SagsMug/main
llama.cpp chat example implementation
2023-04-07 20:43:33 -04:00
Mug
16fc5b5d23 More interoperability to the original llama.cpp, and arguments now work 2023-04-07 13:32:19 +02:00
Mug
10c7571117 Fixed too many newlines, now onto args.
Still needs shipping work so you could do "python -m llama_cpp.examples." etc.
2023-04-06 15:33:22 +02:00
Mug
085cc92b1f Better llama.cpp interoperability
Still has some issues with too many newlines, so WIP
2023-04-06 15:30:57 +02:00
MillionthOdin16
c283edd7f2 Set n_batch to the default value and reduce thread count:
Change batch size to the llama.cpp default of 8. I've seen issues in llama.cpp where batch size affects the quality of generations (it shouldn't), but in case that's still an issue I changed it to the default.

Set the auto-determined number of threads to half the system count. ggml will sometimes lock cores at 100% while doing nothing; this is being addressed, but it can make for a bad user experience if cores are pegged at 100%.
2023-04-05 18:17:29 -04:00
Andrei Betlen
e1b5b9bb04 Update fastapi server example 2023-04-05 14:44:26 -04:00
Mug
283e59c5e9 Fix bug where init_break was not set when exiting via antiprompt, among other fixes. 2023-04-05 14:47:24 +02:00
Mug
99ceecfccd Move to new examples directory 2023-04-05 14:28:02 +02:00
Mug
e4c6f34d95 Merge branch 'main' of https://github.com/abetlen/llama-cpp-python 2023-04-05 14:18:27 +02:00
Andrei Betlen
b1babcf56c Add quantize example 2023-04-05 04:17:26 -04:00
Andrei Betlen
c8e13a78d0 Re-organize examples folder 2023-04-05 04:10:13 -04:00
Andrei Betlen
c16bda5fb9 Add performance tuning notebook 2023-04-05 04:09:19 -04:00
Mug
c862e8bac5 Fix repeating instructions and an antiprompt bug 2023-04-04 17:54:47 +02:00
Mug
9cde7973cc Fix stripping instruction prompt 2023-04-04 16:20:27 +02:00
Mug
da5a6a7089 Added instruction mode, fixed infinite generation, and various other fixes 2023-04-04 16:18:26 +02:00
Mug
0b32bb3d43 Add instruction mode 2023-04-04 11:48:48 +02:00
Andrei Betlen
ffe34cf64d Allow user to set llama config from env vars 2023-04-04 00:52:44 -04:00
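The env-var configuration in this last commit can be sketched with the stdlib alone. Variable names and defaults below are illustrative assumptions, not the project's actual settings:

```python
import os

def load_llama_config() -> dict:
    """Build a config dict, letting environment variables override
    defaults. Names and defaults here are illustrative only."""
    return {
        "model": os.environ.get("MODEL", "./models/ggml-model.bin"),
        "n_ctx": int(os.environ.get("N_CTX", "512")),
        "n_threads": int(os.environ.get("N_THREADS", str(os.cpu_count() or 1))),
        "use_mlock": os.environ.get("USE_MLOCK", "0") == "1",
    }

# Overriding a setting from the environment, as a launcher script might:
os.environ["N_CTX"] = "2048"
config = load_llama_config()
print(config["n_ctx"])  # 2048
```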