3506 commits

Author SHA1 Message Date
Michael Yang
bd6e38fb1a refactor memory check 2023-10-13 14:47:29 -07:00
Michael Yang
92189a5855 fix memory check 2023-10-13 14:47:29 -07:00
Michael Yang
d790bf9916
Merge pull request #783 from jmorganca/mxyng/fix-gpu-offloading
fix: offloading on low end GPUs
2023-10-13 14:36:44 -07:00
Michael Yang
35afac099a do not use gpu binary when num_gpu == 0 2023-10-13 14:32:12 -07:00
Michael Yang
811c3d1900 no gpu if vram < 2GB 2023-10-13 14:32:12 -07:00
Bruce MacDonald
3553d10769
check for newer updates (#784)
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2023-10-13 17:29:46 -04:00
Bruce MacDonald
6fe178134d
improve api error handling (#781)
- remove new lines from llama.cpp error messages relayed to client
- check api option types and return error on wrong type
- change num layers from 95% VRAM to 92% VRAM
2023-10-13 16:57:10 -04:00
Jeffrey Morgan
d890890f66 use lower glibc versions in Dockerfile.build 2023-10-13 01:06:19 -04:00
Jeffrey Morgan
89ba19feca use Go 1.21.3 in Dockerfile 2023-10-12 23:23:12 -04:00
Jeffrey Morgan
6f58c77671 update Dockerfile.build for linux binary builds 2023-10-12 22:14:20 -04:00
Matt Williams
3c975f898f update doc to refer to docker image
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-12 15:57:50 -07:00
Matt Williams
9245c8a1df add how to quantize doc
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-12 15:34:57 -07:00
Michael Yang
7a537cdca9
Merge pull request #770 from jmorganca/mxyng/fix-download
fix download
2023-10-12 12:56:43 -07:00
Michael Yang
257ffeb997 fix download 2023-10-12 12:52:43 -07:00
Matt Williams
9b513bb6b1
Merge pull request #753 from jmorganca/mattw/examplereorg
rename the examples to be more descriptive
2023-10-12 11:24:12 -07:00
Matt Williams
042100f797 final rename
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-12 11:23:41 -07:00
Bruce MacDonald
7804b8fab9
validate api options fields from map (#711) 2023-10-12 11:18:11 -04:00
Bruce MacDonald
56497663c8
relay model runner error message to client (#720)
* give direction to user when runner fails
* also relay errors from timeout
* increase timeout to 3 minutes
2023-10-12 11:16:37 -04:00
Matt Williams
e1afcb8af2 simple gen to simple
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-11 21:29:07 -07:00
Matt Williams
385eeea357 remove with
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-11 21:26:11 -07:00
Matt Williams
8a41b244e8 add golang gen
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-11 21:20:50 -07:00
Jeffrey Morgan
92578798bb fix relative links in README.md 2023-10-11 19:24:06 -04:00
Michael Yang
788637918a
Merge pull request #760 from jmorganca/mxyng/more-downloads
Mxyng/more downloads
2023-10-11 14:33:10 -07:00
Michael Yang
c413a55093 download: handle inner errors 2023-10-11 14:15:30 -07:00
Michael Yang
630bb75d2a dynamically size download parts based on file size 2023-10-11 14:10:25 -07:00
Michael Yang
a2055a1e93 update download 2023-10-11 14:10:25 -07:00
Michael Yang
b599946b74 add format bytes 2023-10-11 14:08:23 -07:00
Michael Yang
aca2d65b82
Merge pull request #757 from jmorganca/mxyng/format-time
cleanup format time
2023-10-11 11:12:29 -07:00
Michael Yang
b5e08e3373 cleanup format time 2023-10-11 11:09:27 -07:00
Bruce MacDonald
274d5a5fdf
optional parameter to not stream response (#639)
* update streaming request accept header
* add optional stream param to request bodies
2023-10-11 12:54:27 -04:00
Matt Williams
fc6b49be32 add ts alternate to python langchain simplegen
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-11 09:50:15 -07:00
Bruce MacDonald
77295f716e
prevent waiting on exited command (#752)
* prevent waiting on exited command
* close llama runner once
2023-10-11 12:32:13 -04:00
Matt Williams
615f7d1dea cleanup readme.
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-11 06:13:29 -07:00
Matt Williams
cdf5e106ae rename dirs
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-11 06:10:24 -07:00
Matt Williams
a85329f59a rename the models to be more descriptive
Signed-off-by: Matt Williams <m@technovangelist.com>
2023-10-10 17:40:02 -07:00
Bruce MacDonald
f2ba1311aa
improve vram safety with 5% vram memory buffer (#724)
* check free memory not total
* wait for subprocess to exit
2023-10-10 16:16:09 -04:00
Jeffrey Morgan
65dcd0ce35
always cleanup blob download (#747) 2023-10-10 13:12:29 -04:00
Michael Yang
0040f543a2
Merge pull request #743 from jmorganca/mxyng/http-proxy
handle upstream proxies
2023-10-10 09:59:06 -07:00
Matt Williams
767f9bdbbb
Merge pull request #585 from jmorganca/matt/examplementors
add the example for ask the mentors
2023-10-09 13:58:14 -07:00
Costa Alexoglou
f7f5169c94
Update api.md (#741)
Avoid triple ticks in visual editor and also copied in clipboard.
2023-10-09 16:01:46 -04:00
Michael Yang
2cfffea02e handle client proxy 2023-10-09 12:33:47 -07:00
Michael Yang
f6e98334e4 handle upstream proxies 2023-10-09 11:42:36 -07:00
Jeffrey Morgan
ab0668293c llm: fix build on amd64 2023-10-06 14:39:54 -07:00
Bruce MacDonald
af4cf55884
not found error before pulling model (#718) 2023-10-06 16:06:20 -04:00
Bruce MacDonald
d6786f2945
add feedback for reading model metadata (#722) 2023-10-06 16:05:32 -04:00
Michael Yang
38dc2f79bc
Merge pull request #626 from jmorganca/mxyng/concurrent-downloads
parallel chunked downloads
2023-10-06 13:01:29 -07:00
Michael Yang
cb961c87ca
Merge pull request #679 from jamesbraza/modelfile-docs
`Modelfile` syntax highlighting
2023-10-06 12:59:45 -07:00
Michael Yang
0560b28a8d names 2023-10-06 12:56:56 -07:00
Michael Yang
10199c5987 replace done channel with file check 2023-10-06 12:56:56 -07:00
Michael Yang
288814d3e4 fix ref counts 2023-10-06 12:56:43 -07:00