Commit graph

107 commits

Author SHA1 Message Date
Michael Yang
fccf8d179f partial decode ggml bin for more info 2023-08-10 09:23:10 -07:00
Bruce MacDonald
984c9c628c fix embeddings invalid values 2023-08-09 16:50:53 -04:00
Bruce MacDonald
09d8bf6730 fix build errors 2023-08-09 10:45:57 -04:00
Bruce MacDonald
7a5f3616fd
embed text document in modelfile 2023-08-09 10:26:19 -04:00
Michael Yang
f2074ed4c0
Merge pull request #306 from jmorganca/default-keep-system
automatically set num_keep if num_keep < 0
2023-08-08 09:25:34 -07:00
Bruce MacDonald
a6f6d18f83 embed text document in modelfile 2023-08-08 11:27:17 -04:00
Jeffrey Morgan
5eb712f962 trim whitespace before checking stop conditions
Fixes #295
2023-08-08 00:29:19 -04:00
Michael Yang
4dc5b117dd automatically set num_keep if num_keep < 0
num_keep defines how many tokens to keep in the context when truncating
inputs. if left to its default value of -1, the server will calculate
num_keep to be the left of the system instructions
2023-08-07 16:19:12 -07:00
Michael Yang
b9f4d67554 configurable rope frequency parameters 2023-08-03 22:11:58 -07:00
Michael Yang
c5bcf32823 update llama.cpp 2023-08-03 11:50:24 -07:00
Michael Yang
0e79e52ddd override ggml-metal if the file is different 2023-08-02 12:50:30 -07:00
Michael Yang
74a5f7e698 no gpu for 70B model 2023-08-01 17:12:50 -07:00
Michael Yang
7a1c3e62dc update llama.cpp 2023-08-01 16:54:01 -07:00
Michael Yang
319f078dd9 remove -Werror
there are compile warnings on Linux which -Werror elevates to errors,
preventing compile
2023-07-31 21:45:56 -07:00
Jeffrey Morgan
7da249fcc1 only build metal for darwin,arm target 2023-07-31 21:35:23 -04:00
Bruce MacDonald
184ad8f057 allow specifying stop conditions in modelfile 2023-07-28 11:02:04 -04:00
Jeffrey Morgan
dffc8b6e09 update llama.cpp to d91f3f0 2023-07-28 08:07:48 -04:00
Michael Yang
3549676678 embed ggml-metal.metal 2023-07-27 17:23:29 -07:00
Michael Yang
fadf75f99d add stop conditions 2023-07-27 17:00:47 -07:00
Michael Yang
ad3a7d0e2c add NumGQA 2023-07-27 14:05:11 -07:00
Michael Yang
18ffeeec45 update llama.cpp 2023-07-27 14:05:11 -07:00
Michael Yang
cca61181cb sample metrics 2023-07-27 09:31:44 -07:00
Michael Yang
c490416189 lock on llm.lock(); decrease batch size 2023-07-27 09:31:44 -07:00
Michael Yang
f62a882760 add session expiration 2023-07-27 09:31:44 -07:00
Michael Yang
3003fc03fc update predict code 2023-07-27 09:31:44 -07:00
Michael Yang
35af37a2cb session id 2023-07-27 09:31:44 -07:00
Michael Yang
726bc647b2 enable k quants 2023-07-25 08:39:58 -07:00
Michael Yang
cb55fa9270 enable accelerate 2023-07-24 17:14:45 -07:00
Michael Yang
b71c67b6ba allocate a large enough tokens slice 2023-07-21 23:05:15 -07:00
Michael Yang
8526e1f5f1 add llama.cpp mpi, opencl files 2023-07-20 14:19:55 -07:00
Michael Yang
a83eaa7a9f update llama.cpp to e782c9e735f93ab4767ffc37462c523b73a17ddc 2023-07-20 11:55:56 -07:00
Michael Yang
5156e48c2a add script to update llama.cpp 2023-07-20 11:54:59 -07:00
Michael Yang
40c9dc0a31 fix multibyte responses 2023-07-14 20:11:44 -07:00
Michael Yang
0142660bd4 size_t 2023-07-14 17:29:16 -07:00
Michael Yang
1775647f76 continue conversation
feed responses back into the llm
2023-07-13 17:13:00 -07:00
Michael Yang
05e08d2310 return more info in generate response 2023-07-13 09:37:32 -07:00
Michael Yang
e1f0a0dc74 fix eof error in generate 2023-07-12 09:36:16 -07:00
Jeffrey Morgan
c63f811909 return error if model fails to load 2023-07-11 20:32:26 -07:00
Jeffrey Morgan
7c71c10d4f fix compilation issue in Dockerfile, remove from README.md until ready 2023-07-11 19:51:08 -07:00
Jeffrey Morgan
e64ef69e34 look for ggml-metal in the same directory as the binary 2023-07-11 15:58:56 -07:00
Michael Yang
442dec1c6f vendor llama.cpp 2023-07-11 11:59:18 -07:00
Michael Yang
fd4792ec56 call llama.cpp directly from go 2023-07-11 11:59:18 -07:00
Jeffrey Morgan
268e362fa7 fix binding build 2023-07-10 11:33:43 -07:00
Jeffrey Morgan
a18e6b3a40 llama: remove unnecessary std::vector 2023-07-09 10:51:45 -04:00
Jeffrey Morgan
5fb96255dc llama: remove unused helper functions 2023-07-09 10:25:07 -04:00
Patrick Devine
3f1b7177f2 pass model and predict options 2023-07-07 09:34:05 -07:00
Michael Yang
5dc9c8ff23 more free 2023-07-06 17:08:03 -07:00
Bruce MacDonald
da74384a3e remove prompt cache 2023-07-06 17:49:05 -04:00
Michael Yang
2c80eddd71 more free 2023-07-06 16:34:44 -04:00
Jeffrey Morgan
9fe018675f use Makefile for dependency building instead of go generate 2023-07-06 16:34:44 -04:00
Michael Yang
1b7183c5a1 enable metal gpu acceleration
ggml-metal.metal must be in the same directory as the ollama binary
otherwise llama.cpp will not be able to find it and load it.

1. go generate llama/llama_metal.go
2. go build .
3. ./ollama serve
2023-07-06 16:34:44 -04:00
Jeffrey Morgan
0998d4f0a4 remove debug print statements 2023-07-06 16:34:44 -04:00
Jeffrey Morgan
79a999e95d fix crash in bindings 2023-07-06 16:34:44 -04:00
Jeffrey Morgan
fd962a36e5 client updates 2023-07-06 16:34:44 -04:00
Jeffrey Morgan
0240165388 fix llama.cpp build 2023-07-06 16:34:44 -04:00
Jeffrey Morgan
9164981d72 move prompt templates out of python bindings 2023-07-06 16:34:44 -04:00
Jeffrey Morgan
6093a88c1a add llama.cpp go bindings 2023-07-06 16:34:44 -04:00