Bruce MacDonald
fe6f3b48f7
do not reload the running llm when runtime params change (#840)
...
- only reload the running llm if the model has changed or the options used to load it have changed
- rename loaded llm to runner to differentiate from loaded model image
- remove logic which keeps the first system prompt in the generation context
2023-10-19 10:39:58 -04:00
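The reload rule above can be sketched in Go as a simple comparison of the requested model and its load-time options against whatever the current runner was started with. This is a minimal sketch; the `runner` struct and option fields are illustrative, not the actual server internals:
```
package main

import (
	"fmt"
	"reflect"
)

// Options stands in for the subset of request options that affect
// how a model is loaded (hypothetical fields).
type Options struct {
	NumCtx  int
	NumGPU  int
	LowVRAM bool
}

type runner struct {
	model string
	opts  Options
}

var active *runner

// needsReload reports whether the running llm must be restarted: only
// when no runner exists yet, the model changed, or the load-time
// options changed.
func needsReload(model string, opts Options) bool {
	if active == nil {
		return true
	}
	return active.model != model || !reflect.DeepEqual(active.opts, opts)
}

func main() {
	fmt.Println(needsReload("llama2", Options{NumCtx: 2048})) // true: nothing loaded yet
}
```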
Yiorgis Gozadinos
8c6c2cbc8c
When the .ollama folder is broken or there are no models, return an empty list on /api/tags
2023-10-18 08:23:20 +02:00
Michael Yang
1af493c5a0
server: print version on start
2023-10-16 09:59:14 -07:00
Bruce MacDonald
a0c3e989de
deprecate modelfile embed command (#759)
2023-10-16 11:07:37 -04:00
Bruce MacDonald
7804b8fab9
validate api options fields from map (#711)
2023-10-12 11:18:11 -04:00
Bruce MacDonald
274d5a5fdf
optional parameter to not stream response (#639)
...
* update streaming request accept header
* add optional stream param to request bodies
2023-10-11 12:54:27 -04:00
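With the optional parameter, a client that sets `"stream": false` receives one final JSON object instead of a chunk stream. A hedged Go example of such a request; the model name and local endpoint are assumptions:
```
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	body := []byte(`{"model": "llama2", "prompt": "hello", "stream": false}`)
	resp, err := http.Post("http://localhost:11434/api/generate",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out)) // a single JSON object rather than a stream
}
```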
Bruce MacDonald
af4cf55884
not found error before pulling model (#718)
2023-10-06 16:06:20 -04:00
Jay Nakrani
1d0ebe67e8
Document response stream chunk delimiter. (#632)
2023-09-29 21:45:52 -07:00
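The delimiter in question is a newline: with the application/x-ndjson content type used for streaming responses (see the 2023-08-08 entry below), each chunk is a JSON object on its own line. A Go sketch of consuming such a stream; the endpoint and field names are assumptions based on the public API:
```
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

func main() {
	resp, err := http.Post("http://localhost:11434/api/generate",
		"application/json",
		strings.NewReader(`{"model": "llama2", "prompt": "hi"}`))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() { // one JSON chunk per newline-delimited line
		var chunk map[string]any
		if err := json.Unmarshal(scanner.Bytes(), &chunk); err != nil {
			panic(err)
		}
		fmt.Print(chunk["response"])
	}
}
```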
Bruce MacDonald
a1b2d95f96
remove unused push/pull params (#650)
2023-09-29 17:27:19 -04:00
Michael Yang
8608eb4760
prune empty directories
2023-09-27 10:58:09 -07:00
Jeffrey Morgan
9b12a511ca
check other request fields before load short circuit in /api/generate
2023-09-22 23:50:55 -04:00
Bruce MacDonald
5d71bda478
close llm on interrupt (#577)
2023-09-22 19:41:52 +01:00
Michael Yang
82f5b66c01
register HEAD /api/tags
2023-09-21 16:38:03 -07:00
Michael Yang
c986694367
fix HEAD / request
...
HEAD requests should respond like their GET counterparts, except without a
response body.
2023-09-21 16:35:58 -07:00
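A sketch of sharing handlers between GET and HEAD (gin shown, since the server already uses gin middleware per the gin-contrib/cors entry below; the route payload is illustrative). Go's net/http suppresses the response body for HEAD requests, so the same handler can serve both:
```
package main

import (
	"net/http"

	"github.com/gin-gonic/gin"
)

func main() {
	r := gin.Default()

	tags := func(c *gin.Context) {
		c.JSON(http.StatusOK, gin.H{"models": []string{}})
	}
	r.GET("/api/tags", tags)
	r.HEAD("/api/tags", tags) // same handler; the body is dropped for HEAD

	r.Run(":11434")
}
```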
Bruce MacDonald
4cba75efc5
remove tmp directories created by previous servers (#559)
...
* remove tmp directories created by previous servers
* clean up on server stop
* Update routes.go
* Update server/routes.go
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* create top-level temp ollama dir
* check file exists before creating
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-21 20:38:49 +01:00
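A minimal sketch of the cleanup idea; the directory naming is an assumption, not the exact implementation. On startup, sweep temp directories left behind by previous server runs, then create a fresh one that is removed on stop:
```
package main

import (
	"log"
	"os"
	"path/filepath"
)

func cleanupStaleTmp() {
	stale, _ := filepath.Glob(filepath.Join(os.TempDir(), "ollama*"))
	for _, dir := range stale {
		if err := os.RemoveAll(dir); err != nil {
			log.Printf("could not remove stale dir %s: %v", dir, err)
		}
	}
}

func main() {
	cleanupStaleTmp() // remove tmp directories created by previous servers

	workdir, err := os.MkdirTemp("", "ollama")
	if err != nil {
		log.Fatal(err)
	}
	defer os.RemoveAll(workdir) // clean up on server stop
	log.Println("working directory:", workdir)
}
```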
Michael Yang
1fabba474b
refactor default allow origins
...
this should be less error-prone
2023-09-21 09:42:25 -07:00
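One way such a refactor becomes less error-prone is to generate the default origins from host and scheme lists instead of maintaining a hand-written table. A sketch under that assumption; the hosts shown are illustrative:
```
package main

import "fmt"

func defaultAllowOrigins() []string {
	var origins []string
	for _, host := range []string{"localhost", "127.0.0.1", "0.0.0.0"} {
		for _, scheme := range []string{"http", "https"} {
			origins = append(origins, fmt.Sprintf("%s://%s", scheme, host))
		}
	}
	return origins
}

func main() {
	fmt.Println(defaultAllowOrigins())
}
```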
Bruce MacDonald
1255bc9b45
only package 11.8 runner
2023-09-20 20:00:41 +01:00
Patrick Devine
80dd44e80a
Cmd changes (#541)
2023-09-18 12:26:56 -07:00
Bruce MacDonald
f221637053
first pass at linux gpu support (#454)
...
* linux gpu support
* handle multiple gpus
* add cuda docker image (#488)
---------
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-12 11:04:35 -04:00
Patrick Devine
e7e91cd71c
add autoprune to remove unused layers (#491)
2023-09-11 11:46:35 -07:00
Patrick Devine
790d24eb7b
add show command (#474)
2023-09-06 11:04:17 -07:00
Michael Yang
681f3c4c42
fix num_keep
2023-09-03 17:47:49 -04:00
Michael Yang
eeb40a672c
fix list models for windows
2023-08-31 09:47:10 -04:00
Michael Yang
0f541a0367
s/ListResponseModel/ModelResponse/
2023-08-31 09:47:10 -04:00
Bruce MacDonald
42998d797d
subprocess llama.cpp server (#401)
...
* remove c code
* pack llama.cpp
* use request context for llama_cpp
* let llama_cpp decide the number of threads to use
* stop llama runner when app stops
* remove sample count and duration metrics
* use go generate to get libraries
* tmp dir for running llm
2023-08-30 16:35:03 -04:00
Patrick Devine
8bbff2df98
add model IDs (#439)
2023-08-28 20:50:24 -07:00
Michael Yang
95187d7e1e
build release mode
2023-08-22 09:52:43 -07:00
Jeffrey Morgan
a9f6c56652
fix FROM instruction erroring when referring to a file
2023-08-22 09:39:42 -07:00
Ryan Baker
0a892419ad
Strip protocol from model path (#377)
2023-08-21 21:56:56 -07:00
Bruce MacDonald
326de48930
use loaded llm for embeddings
2023-08-15 10:50:54 -03:00
Patrick Devine
d9cf18e28d
add maximum retries when pushing (#334)
2023-08-11 15:41:55 -07:00
Michael Yang
6517bcc53c
Merge pull request #290 from jmorganca/add-adapter-layers
...
implement loading ggml lora adapters through the modelfile
2023-08-10 17:23:01 -07:00
Michael Yang
6a6828bddf
Merge pull request #167 from jmorganca/decode-ggml
...
partial decode ggml bin for more info
2023-08-10 17:22:40 -07:00
Jeffrey Morgan
040a5b9750
clean up cli flags
2023-08-10 09:27:03 -07:00
Michael Yang
6de5d032e1
implement loading ggml lora adapters through the modelfile
2023-08-10 09:23:39 -07:00
Michael Yang
fccf8d179f
partial decode ggml bin for more info
2023-08-10 09:23:10 -07:00
Bruce MacDonald
4b3507f036
embeddings endpoint
...
Co-Authored-By: Jeffrey Morgan <jmorganca@gmail.com>
2023-08-10 11:45:57 -04:00
Bruce MacDonald
868e3b31c7
allow for concurrent pulls of the same files
2023-08-09 11:31:54 -04:00
Bruce MacDonald
09d8bf6730
fix build errors
2023-08-09 10:45:57 -04:00
Bruce MacDonald
7a5f3616fd
embed text document in modelfile
2023-08-09 10:26:19 -04:00
Jeffrey Morgan
cff002b824
use content type application/x-ndjson for streaming responses
2023-08-08 21:38:10 -07:00
Jeffrey Morgan
a027a7dd65
add 0.0.0.0 as an allowed origin by default
...
Fixes #282
2023-08-08 13:39:50 -07:00
Bruce MacDonald
21ddcaa1f1
pr comments
...
- default to embeddings enabled
- move embedding logic for loaded model to request
- allow embedding full directory
- close llm on reload
2023-08-08 13:49:37 -04:00
Michael Yang
f2074ed4c0
Merge pull request #306 from jmorganca/default-keep-system
...
automatically set num_keep if num_keep < 0
2023-08-08 09:25:34 -07:00
Bruce MacDonald
a6f6d18f83
embed text document in modelfile
2023-08-08 11:27:17 -04:00
Michael Yang
4dc5b117dd
automatically set num_keep if num_keep < 0
...
num_keep defines how many tokens to keep in the context when truncating
inputs. if left at its default value of -1, the server will calculate
num_keep to be the length of the system instructions
2023-08-07 16:19:12 -07:00
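In other words, a negative num_keep is replaced by the token length of the system instructions, so truncation never drops them. A sketch of that default; the tokenizer here is a stand-in, not the model's real one:
```
package main

import (
	"fmt"
	"strings"
)

// tokenize is a stand-in for the model tokenizer.
func tokenize(s string) []string { return strings.Fields(s) }

// resolveNumKeep applies the default described above: when num_keep is
// left at -1, keep exactly the tokens of the system instructions.
func resolveNumKeep(numKeep int, system string) int {
	if numKeep < 0 {
		return len(tokenize(system))
	}
	return numKeep
}

func main() {
	fmt.Println(resolveNumKeep(-1, "You are a helpful assistant."))
}
```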
cmiller01
fb593b7bfc
pass flags to serve to allow setting allowed-origins + host and port
...
* resolves: https://github.com/jmorganca/ollama/issues/300 and
https://github.com/jmorganca/ollama/issues/282
* example usage:
```
ollama serve --port 9999 --allowed-origins "http://foo.example.com,http://192.0.0.1"
```
2023-08-07 03:34:37 +00:00
Jeffrey Morgan
e3fb1fd3f1
server: compare options correctly
2023-08-03 15:55:40 -04:00
Bruce MacDonald
8b1e791820
allow specifying zero values in modelfile
2023-08-02 17:07:53 -04:00
Jeffrey Morgan
03cff3a225
server: reset digest at end of generate
2023-08-02 16:15:44 -04:00
Bruce MacDonald
8f8b6288ac
check server is running before running command
2023-08-02 10:51:23 -04:00
Bruce MacDonald
765994362c
use head to check heartbeat
2023-08-01 14:50:38 -04:00
Bruce MacDonald
1c5a8770ee
read runner parameter options from map
...
- read runner options from map to see what was specified explicitly and overwrite zero values
2023-08-01 13:38:19 -04:00
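Decoding the request body into a map alongside the typed struct reveals which fields were set explicitly, so an explicit zero (e.g. `"temperature": 0`) can override a nonzero default. A sketch of that technique with illustrative fields and defaults:
```
package main

import (
	"encoding/json"
	"fmt"
)

type Options struct {
	Temperature float64 `json:"temperature"`
	NumPredict  int     `json:"num_predict"`
}

func loadOptions(body []byte) (Options, error) {
	opts := Options{Temperature: 0.8, NumPredict: 128} // defaults
	var raw map[string]json.RawMessage
	if err := json.Unmarshal(body, &raw); err != nil {
		return opts, err
	}
	// a key present in the map was specified explicitly, even if zero
	if v, ok := raw["temperature"]; ok {
		json.Unmarshal(v, &opts.Temperature)
	}
	if v, ok := raw["num_predict"]; ok {
		json.Unmarshal(v, &opts.NumPredict)
	}
	return opts, nil
}

func main() {
	opts, _ := loadOptions([]byte(`{"temperature": 0}`))
	fmt.Printf("%+v\n", opts) // {Temperature:0 NumPredict:128}
}
```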
Bruce MacDonald
daa0d1de7a
allow specifying zero values in modelfile
2023-08-01 13:37:50 -04:00
Jeffrey Morgan
528bafa585
cache loaded model
2023-08-01 11:24:18 -04:00
Bruce MacDonald
671eec6da9
log prediction failures
2023-07-31 16:46:37 -04:00
Michael Yang
f62a882760
add session expiration
2023-07-27 09:31:44 -07:00
Michael Yang
32aec66e6a
add load duration
2023-07-27 09:31:44 -07:00
Michael Yang
35af37a2cb
session id
2023-07-27 09:31:44 -07:00
Bruce MacDonald
4c1caa3733
download models when creating from modelfile
2023-07-25 14:25:13 -04:00
Patrick Devine
4cb42ca55e
add copy command (#191)
2023-07-24 11:27:28 -04:00
Michael Yang
8609db77ea
use gin-contrib/cors middleware
2023-07-22 09:39:08 -07:00
Patrick Devine
6d6b0d3321
change error handler behavior and fix error when a model isn't found (#173)
2023-07-21 23:02:12 -07:00
Patrick Devine
9f6e97865c
allow pushing/pulling to insecure registries (#157)
2023-07-21 15:42:19 -07:00
Bruce MacDonald
7ba1308595
Merge pull request #147 from jmorganca/brucemacd/cli-err-display
...
Improve CLI error display
2023-07-21 16:10:19 +02:00
Patrick Devine
e7a393de54
add rm command for models (#151)
2023-07-20 16:09:23 -07:00
Michael Yang
1f27d7f1b8
fix stream errors
2023-07-20 12:12:08 -07:00
Bruce MacDonald
09dc6273e3
suppress error when running list before pulling image
2023-07-20 20:53:09 +02:00
Bruce MacDonald
3ec4ebc562
remove unused code
2023-07-20 20:18:00 +02:00
Michael Yang
df146c41e2
separate prompt into template and system
2023-07-19 23:24:31 -07:00
Jeffrey Morgan
2d305fa99a
allow relative paths in FROM instruction
2023-07-19 21:55:15 -07:00
Michael Yang
68df36ae50
fix pull 0 bytes on completed layer
2023-07-18 19:38:11 -07:00
Patrick Devine
9e15635c2d
attempt two for skipping files in the file walk (#105)
2023-07-18 15:37:01 -07:00
Patrick Devine
9658a5043b
skip files in the list if we can't get the correct model path (#100)
2023-07-18 12:39:08 -07:00
Patrick Devine
5bea29f610
add new list command (#97)
2023-07-18 09:09:45 -07:00
Michael Yang
c7dd52271c
remove debugging messages
2023-07-17 14:17:34 -07:00
Michael Yang
28a136e9a3
modelfile params
2023-07-17 12:35:03 -07:00
Patrick Devine
2fb52261ad
basic distribution w/ push/pull (#78)
...
* basic distribution w/ push/pull
* add the parser
* add create, pull, and push
* changes to the parser, FROM line, and fix commands
* mkdirp new manifest directories
* make `blobs` directory if it does not exist
* fix go warnings
* add progressbar for model pulls
* move model struct
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2023-07-16 17:02:22 -07:00
Michael Yang
743e957d88
use filepath for os compat
2023-07-14 17:27:14 -07:00
Michael Yang
5ade3db040
fix race
...
block on write, which only returns when the channel is closed. this is
contrary to the previous arrangement, where the handler could return
before the stream finished writing. that could lead to the client
receiving unexpected responses (since the request had already been
handled) or, in the worst case, a nil-pointer dereference as the stream
tried to flush a nil writer
2023-07-14 15:10:46 -07:00
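The shape of the fix, sketched with illustrative types rather than the actual server code: the handler drains the channel and returns only after the producer closes it, so nothing can write to a stream whose handler has already exited.
```
package main

import (
	"encoding/json"
	"net/http"
	"time"
)

func generate(w http.ResponseWriter, r *http.Request) {
	ch := make(chan string)
	go func() {
		defer close(ch) // signals the handler that the stream is done
		for _, tok := range []string{"hello", " ", "world"} {
			ch <- tok
			time.Sleep(10 * time.Millisecond)
		}
	}()

	flusher, _ := w.(http.Flusher)
	for tok := range ch { // drains until the producer closes the channel
		json.NewEncoder(w).Encode(map[string]string{"response": tok})
		if flusher != nil {
			flusher.Flush()
		}
	}
	// only now may the handler return: the stream has finished writing
}

func main() {
	http.HandleFunc("/api/generate", generate)
	http.ListenAndServe(":11434", nil)
}
```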
Michael Yang
1775647f76
continue conversation
...
feed responses back into the llm
2023-07-13 17:13:00 -07:00
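One way responses are fed back, sketched with request/response shapes assumed from the public /api/generate API (the field values are illustrative): the final chunk carries an encoded conversation context, and the client echoes it on its next request to continue the exchange.
```
package main

import "fmt"

type GenerateRequest struct {
	Prompt  string `json:"prompt"`
	Context []int  `json:"context,omitempty"`
}

type GenerateResponse struct {
	Response string `json:"response"`
	Done     bool   `json:"done"`
	Context  []int  `json:"context,omitempty"` // sent with the final chunk
}

func main() {
	prev := GenerateResponse{Response: "hi!", Done: true, Context: []int{1, 2, 3}}
	// continue the conversation by passing the returned context back in
	next := GenerateRequest{Prompt: "and then?", Context: prev.Context}
	fmt.Printf("%+v\n", next)
}
```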
Michael Yang
05e08d2310
return more info in generate response
2023-07-13 09:37:32 -07:00
Michael Yang
31590284a7
fix route
2023-07-12 19:21:49 -07:00
Michael Yang
2666d3c206
fix pull race
2023-07-12 19:07:23 -07:00
Michael Yang
0944b01e7d
pull fixes
2023-07-12 09:55:07 -07:00
Michael Yang
a806b03f62
no errgroup
2023-07-11 14:58:10 -07:00
Michael Yang
e243329e2e
check api status
2023-07-11 13:42:05 -07:00
Michael Yang
2a66a1164a
common stream producer
2023-07-11 13:42:05 -07:00
Michael Yang
fd4792ec56
call llama.cpp directly from go
2023-07-11 11:59:18 -07:00
Jeffrey Morgan
a3ec1ec2a0
consistent error handling for pull and generate
2023-07-10 21:34:15 -07:00
Michael Yang
edba935d67
return error in generate response
2023-07-10 13:30:10 -07:00
Bruce MacDonald
f5e2e150b8
allow overriding default generate options
2023-07-10 20:58:02 +02:00
Jeffrey Morgan
74e92d1258
add basic / route for server
2023-07-07 23:46:15 -04:00
Bruce MacDonald
f533f85d44
pr feedback
...
- move error check to api client pull
- simplify error check in generate
- return nil on any pull error
2023-07-07 17:12:02 -04:00
Bruce MacDonald
61dd87bd90
if directory cannot be resolved, do not fail
2023-07-07 15:27:43 -04:00
Patrick Devine
3f1b7177f2
pass model and predict options
2023-07-07 09:34:05 -07:00
Michael Yang
b0618a466e
generate progress
2023-07-06 17:07:40 -07:00
Michael Yang
15c114decb
fix prompt templates
2023-07-06 17:03:18 -07:00
Michael Yang
0637632258
simple pull response
2023-07-06 16:34:44 -04:00
Michael Yang
dd960d1d5e
update generate response
2023-07-06 16:34:44 -04:00
Michael Yang
9b8a456c7d
embed templates
2023-07-06 16:34:44 -04:00
Bruce MacDonald
7cf5905063
display pull progress
2023-07-06 16:34:44 -04:00
Michael Yang
580fe8951c
free llama model
2023-07-06 16:34:44 -04:00
Michael Yang
68e6b4550c
use prompt templates
2023-07-06 16:34:44 -04:00
Bruce MacDonald
a6494f8211
pull models
2023-07-06 16:34:44 -04:00
Michael Yang
1b7183c5a1
enable metal gpu acceleration
...
ggml-metal.metal must be in the same directory as the ollama binary,
otherwise llama.cpp will not be able to find and load it.
1. go generate llama/llama_metal.go
2. go build .
3. ./ollama serve
2023-07-06 16:34:44 -04:00
Jeffrey Morgan
0998d4f0a4
remove debug print statements
2023-07-06 16:34:44 -04:00
Bruce MacDonald
8ea5e5e147
separate routes
2023-07-06 16:34:44 -04:00
Jeffrey Morgan
fd962a36e5
client updates
2023-07-06 16:34:44 -04:00
Jeffrey Morgan
9164981d72
move prompt templates out of python bindings
2023-07-06 16:34:44 -04:00
Jeffrey Morgan
6093a88c1a
add llama.cpp go bindings
2023-07-06 16:34:44 -04:00
Jeffrey Morgan
76cb60d496
wip go engine
...
Co-authored-by: Patrick Devine <pdevine@sonic.net>
2023-07-06 16:34:44 -04:00