ollama

Author	SHA1	Message	Date
Blake Mizerany	cb42e607c5	llm: speed up gguf decoding by a lot (#5246 ) Previously, some costly things were causing the loading of GGUF files and their metadata and tensor information to be VERY slow: * Too many allocations when decoding strings * Hitting disk for each read of each key and value, resulting in a not-okay amount of syscalls/disk I/O. The show API is now down to 33ms from 800ms+ for llama3 on a macbook pro m3. This commit also prevents collecting large arrays of values when decoding GGUFs (if desired). When such keys are encountered, their values are null, and are encoded as such in JSON. Also, this fixes a broken test that was not encoding valid GGUF.	2024-06-24 21:47:52 -07:00
Michael Yang	171eb040fc	simplify safetensors reading	2024-05-21 11:28:22 -07:00
Michael Yang	8b2c10061c	refactor tensor query	2024-04-10 11:37:20 -07:00
Michael Yang	d338d70492	refactor model parsing	2024-04-01 13:16:15 -07:00
Patrick Devine	5a5efee46b	Add gemma safetensors conversion (#3250 ) Co-authored-by: Michael Yang <mxyng@pm.me>	2024-03-28 18:54:01 -07:00
Michael Yang	0085297928	refactor readseeker	2024-03-12 12:54:18 -07:00
Michael Yang	76bdebbadf	decode ggla	2024-03-08 15:46:25 -08:00