Bruce MacDonald
09d8bf6730
fix build errors
2023-08-09 10:45:57 -04:00
Bruce MacDonald
7a5f3616fd
embed text document in modelfile
2023-08-09 10:26:19 -04:00
Michael Yang
f2074ed4c0
Merge pull request #306 from jmorganca/default-keep-system
...
automatically set num_keep if num_keep < 0
2023-08-08 09:25:34 -07:00
Bruce MacDonald
a6f6d18f83
embed text document in modelfile
2023-08-08 11:27:17 -04:00
Jeffrey Morgan
5eb712f962
trim whitespace before checking stop conditions
...
Fixes #295
2023-08-08 00:29:19 -04:00
Michael Yang
4dc5b117dd
automatically set num_keep if num_keep < 0
...
num_keep defines how many tokens to keep in the context when truncating
inputs. if left at its default value of -1, the server will calculate
num_keep to be the length of the system instructions
2023-08-07 16:19:12 -07:00
Michael Yang
b9f4d67554
configurable rope frequency parameters
2023-08-03 22:11:58 -07:00
Michael Yang
c5bcf32823
update llama.cpp
2023-08-03 11:50:24 -07:00
Michael Yang
74a5f7e698
no gpu for 70B model
2023-08-01 17:12:50 -07:00
Michael Yang
319f078dd9
remove -Werror
...
there are compile warnings on Linux which -Werror elevates to errors,
preventing compilation
2023-07-31 21:45:56 -07:00
Jeffrey Morgan
7da249fcc1
only build metal for darwin,arm target
2023-07-31 21:35:23 -04:00
Bruce MacDonald
184ad8f057
allow specifying stop conditions in modelfile
2023-07-28 11:02:04 -04:00
Michael Yang
3549676678
embed ggml-metal.metal
2023-07-27 17:23:29 -07:00
Michael Yang
fadf75f99d
add stop conditions
2023-07-27 17:00:47 -07:00
Michael Yang
ad3a7d0e2c
add NumGQA
2023-07-27 14:05:11 -07:00
Michael Yang
cca61181cb
sample metrics
2023-07-27 09:31:44 -07:00
Michael Yang
c490416189
lock on llm.lock(); decrease batch size
2023-07-27 09:31:44 -07:00
Michael Yang
f62a882760
add session expiration
2023-07-27 09:31:44 -07:00
Michael Yang
3003fc03fc
update predict code
2023-07-27 09:31:44 -07:00
Michael Yang
35af37a2cb
session id
2023-07-27 09:31:44 -07:00
Michael Yang
726bc647b2
enable k quants
2023-07-25 08:39:58 -07:00
Michael Yang
cb55fa9270
enable accelerate
2023-07-24 17:14:45 -07:00
Michael Yang
b71c67b6ba
allocate a large enough tokens slice
2023-07-21 23:05:15 -07:00
Michael Yang
40c9dc0a31
fix multibyte responses
2023-07-14 20:11:44 -07:00
Michael Yang
0142660bd4
size_t
2023-07-14 17:29:16 -07:00
Michael Yang
1775647f76
continue conversation
...
feed responses back into the llm
2023-07-13 17:13:00 -07:00
Michael Yang
05e08d2310
return more info in generate response
2023-07-13 09:37:32 -07:00
Michael Yang
e1f0a0dc74
fix eof error in generate
2023-07-12 09:36:16 -07:00
Jeffrey Morgan
c63f811909
return error if model fails to load
2023-07-11 20:32:26 -07:00
Michael Yang
442dec1c6f
vendor llama.cpp
2023-07-11 11:59:18 -07:00
Michael Yang
fd4792ec56
call llama.cpp directly from go
2023-07-11 11:59:18 -07:00
Jeffrey Morgan
5fb96255dc
llama: remove unused helper functions
2023-07-09 10:25:07 -04:00
Patrick Devine
3f1b7177f2
pass model and predict options
2023-07-07 09:34:05 -07:00
Michael Yang
5dc9c8ff23
more free
2023-07-06 17:08:03 -07:00
Bruce MacDonald
da74384a3e
remove prompt cache
2023-07-06 17:49:05 -04:00
Michael Yang
2c80eddd71
more free
2023-07-06 16:34:44 -04:00
Jeffrey Morgan
9fe018675f
use Makefile
for dependency building instead of go generate
2023-07-06 16:34:44 -04:00
Jeffrey Morgan
0998d4f0a4
remove debug print statements
2023-07-06 16:34:44 -04:00
Jeffrey Morgan
79a999e95d
fix crash in bindings
2023-07-06 16:34:44 -04:00
Jeffrey Morgan
fd962a36e5
client updates
2023-07-06 16:34:44 -04:00
Jeffrey Morgan
0240165388
fix llama.cpp build
2023-07-06 16:34:44 -04:00
Jeffrey Morgan
6093a88c1a
add llama.cpp go bindings
2023-07-06 16:34:44 -04:00