Bruce MacDonald
fe6f3b48f7
do not reload the running llm when runtime params change ( #840 )
...
- only reload the running llm if the model has changed, or the options for loading the running model have changed
- rename loaded llm to runner to differentiate from loaded model image
- remove logic which keeps the first system prompt in the generation context
2023-10-19 10:39:58 -04:00
Bruce MacDonald
6fe178134d
improve api error handling ( #781 )
...
- remove new lines from llama.cpp error messages relayed to client
- check api option types and return error on wrong type
- change num layers from 95% VRAM to 92% VRAM
2023-10-13 16:57:10 -04:00
Bruce MacDonald
7804b8fab9
validate api options fields from map ( #711 )
2023-10-12 11:18:11 -04:00
Bruce MacDonald
274d5a5fdf
optional parameter to not stream response ( #639 )
...
* update streaming request accept header
* add optional stream param to request bodies
2023-10-11 12:54:27 -04:00
Bruce MacDonald
2130c0708b
output type parsed from modelfile ( #678 )
2023-10-05 14:58:04 -04:00
Bruce MacDonald
1fbf3585d6
Relay default values to llama runner ( #672 )
...
* include seed in params for llama.cpp server and remove empty filter for temp
* relay default predict options to llama.cpp
- reorganize options to match predict request for readability
* omit empty stop
---------
Co-authored-by: hallh <hallh@users.noreply.github.com>
2023-10-02 14:53:16 -04:00
Bruce MacDonald
a1b2d95f96
remove unused push/pull params ( #650 )
2023-09-29 17:27:19 -04:00
Michael Yang
f40b3de758
use int64 consistently
2023-09-28 11:07:24 -07:00
Bruce MacDonald
f221637053
first pass at linux gpu support ( #454 )
...
* linux gpu support
* handle multiple gpus
* add cuda docker image ( #488 )
---------
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-12 11:04:35 -04:00
Patrick Devine
790d24eb7b
add show command ( #474 )
2023-09-06 11:04:17 -07:00
Michael Yang
0f541a0367
s/ListResponseModel/ModelResponse/
2023-08-31 09:47:10 -04:00
Bruce MacDonald
42998d797d
subprocess llama.cpp server ( #401 )
...
* remove c code
* pack llama.cpp
* use request context for llama_cpp
* let llama_cpp decide the number of threads to use
* stop llama runner when app stops
* remove sample count and duration metrics
* use go generate to get libraries
* tmp dir for running llm
2023-08-30 16:35:03 -04:00
Patrick Devine
8bbff2df98
add model IDs ( #439 )
2023-08-28 20:50:24 -07:00
Michael Yang
f723bf0879
ignore nil map values
2023-08-17 15:50:46 -07:00
Michael Yang
f27bc261cf
s/parmeter/parameter/
2023-08-10 16:26:06 -07:00
Michael Yang
81d8d7b73f
fix could not convert int
2023-08-10 16:24:17 -07:00
Patrick Devine
be989d89d1
Token auth ( #314 )
2023-08-10 11:34:25 -07:00
Bruce MacDonald
4b3507f036
embeddings endpoint
...
Co-Authored-By: Jeffrey Morgan <jmorganca@gmail.com>
2023-08-10 11:45:57 -04:00
Bruce MacDonald
7a5f3616fd
embed text document in modelfile
2023-08-09 10:26:19 -04:00
Bruce MacDonald
21ddcaa1f1
pr comments
...
- default to embeddings enabled
- move embedding logic for loaded model to request
- allow embedding full directory
- close llm on reload
2023-08-08 13:49:37 -04:00
Michael Yang
f2074ed4c0
Merge pull request #306 from jmorganca/default-keep-system
...
automatically set num_keep if num_keep < 0
2023-08-08 09:25:34 -07:00
Jeffrey Morgan
8713ac23a8
allow overriding template and system in /api/generate
...
Fixes #297
Fixes #296
2023-08-08 00:55:34 -04:00
Michael Yang
4dc5b117dd
automatically set num_keep if num_keep < 0
...
num_keep defines how many tokens to keep in the context when truncating
inputs. if left to its default value of -1, the server will calculate
num_keep to be the length of the system instructions
2023-08-07 16:19:12 -07:00
Michael Yang
b9f4d67554
configurable rope frequency parameters
2023-08-03 22:11:58 -07:00
Bruce MacDonald
1c5a8770ee
read runner parameter options from map
...
- read runner options from map to see what was specified explicitly and overwrite zero values
2023-08-01 13:38:19 -04:00
Jeffrey Morgan
528bafa585
cache loaded model
2023-08-01 11:24:18 -04:00
Bruce MacDonald
184ad8f057
allow specifying stop conditions in modelfile
2023-07-28 11:02:04 -04:00
Jeffrey Morgan
822a0e36eb
lower batch size to 512
2023-07-28 10:56:21 -04:00
Michael Yang
fadf75f99d
add stop conditions
2023-07-27 17:00:47 -07:00
Michael Yang
ad3a7d0e2c
add NumGQA
2023-07-27 14:05:11 -07:00
Jeffrey Morgan
688661ab9b
increase default batch size to 1024
2023-07-27 16:51:01 -04:00
Michael Yang
cca61181cb
sample metrics
2023-07-27 09:31:44 -07:00
Michael Yang
c490416189
lock on llm.lock(); decrease batch size
2023-07-27 09:31:44 -07:00
Michael Yang
f62a882760
add session expiration
2023-07-27 09:31:44 -07:00
Michael Yang
3003fc03fc
update predict code
2023-07-27 09:31:44 -07:00
Michael Yang
32aec66e6a
add load duration
2023-07-27 09:31:44 -07:00
Michael Yang
35af37a2cb
session id
2023-07-27 09:31:44 -07:00
Bruce MacDonald
4c1caa3733
download models when creating from modelfile
2023-07-25 14:25:13 -04:00
Patrick Devine
4cb42ca55e
add copy command ( #191 )
2023-07-24 11:27:28 -04:00
Patrick Devine
9f6e97865c
allow pushing/pulling to insecure registries ( #157 )
2023-07-21 15:42:19 -07:00
Bruce MacDonald
7ba1308595
Merge pull request #147 from jmorganca/brucemacd/cli-err-display
...
Improve CLI error display
2023-07-21 16:10:19 +02:00
Patrick Devine
e7a393de54
add rm command for models ( #151 )
2023-07-20 16:09:23 -07:00
Bruce MacDonald
ebaa33ac28
display gin api errors in cli
2023-07-20 20:45:12 +02:00
Michael Yang
68df36ae50
fix pull 0 bytes on completed layer
2023-07-18 19:38:11 -07:00
Patrick Devine
5bea29f610
add new list command ( #97 )
2023-07-18 09:09:45 -07:00
Patrick Devine
2fb52261ad
basic distribution w/ push/pull ( #78 )
...
* basic distribution w/ push/pull
* add the parser
* add create, pull, and push
* changes to the parser, FROM line, and fix commands
* mkdirp new manifest directories
* make `blobs` directory if it does not exist
* fix go warnings
* add progressbar for model pulls
* move model struct
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2023-07-16 17:02:22 -07:00
Michael Yang
965f9ad033
Merge pull request #77 from jmorganca/mem
...
continue conversation
2023-07-14 14:57:42 -07:00
Michael Yang
5fefaa5d4d
fix typo
2023-07-14 10:47:18 -07:00
Michael Yang
1775647f76
continue conversation
...
feed responses back into the llm
2023-07-13 17:13:00 -07:00
Michael Yang
05e08d2310
return more info in generate response
2023-07-13 09:37:32 -07:00