Michael Yang
01114b4526
fix: rope
2024-04-09 16:15:24 -07:00
Michael Yang
9502e5661f
cgo quantize
2024-04-08 15:31:08 -07:00
Michael Yang
e1c9a2a00f
no blob create if already exists
2024-04-08 15:09:48 -07:00
Michael Yang
be517e491c
no rope parameters
2024-04-05 18:05:27 -07:00
Patrick Devine
1b272d5bcd
change github.com/jmorganca/ollama to github.com/ollama/ollama ( #3347 )
2024-03-26 13:04:17 -07:00
Patrick Devine
47cfe58af5
Default Keep Alive environment variable ( #3094 )
...
---------
Co-authored-by: Chris-AS1 <8493773+Chris-AS1@users.noreply.github.com>
2024-03-13 13:29:40 -07:00
Jeffrey Morgan
3b4bab3dc5
Fix embeddings load model behavior ( #2848 )
2024-02-29 17:40:56 -08:00
Ikko Eltociear Ashimine
e95b896790
Update types.go ( #2744 )
...
specfied -> specified
2024-02-25 13:41:25 -05:00
Michael Yang
897b213468
use http.DefaultClient ( #2530 )
...
default client already handles proxy
2024-02-20 18:34:47 -05:00
bnorick
caf2b13c10
Fix infinite keep_alive ( #2480 )
2024-02-13 15:40:32 -08:00
Patrick Devine
b5cf31b460
add keep_alive to generate/chat/embedding api endpoints ( #2146 )
2024-01-26 14:28:02 -08:00
Patrick Devine
7c40a67841
Save and load sessions ( #2063 )
2024-01-25 12:12:36 -08:00
Michael Yang
745b5934fa
add model to ModelResponse
2024-01-18 14:32:55 -08:00
Michael Yang
a38d88d828
api: add model for all requests
...
prefer using req.Model and fallback to req.Name
2024-01-18 14:31:37 -08:00
Michael Yang
5ffbbea1d7
remove client.py
2024-01-11 15:53:10 -08:00
Patrick Devine
22e93efa41
add show info command and fix the modelfile
2024-01-05 12:20:05 -08:00
Brian Murray
0d6e3565ae
Add embeddings to API ( #1773 )
2024-01-04 15:00:52 -05:00
Jeffrey Morgan
55978c1dc9
clean up cache api option
2023-12-27 14:27:45 -05:00
Jeffrey Morgan
d4ebdadbe7
enable cache_prompt by default
2023-12-27 14:23:42 -05:00
K0IN
10da41d677
Add Cache flag to api ( #1642 )
2023-12-22 17:16:20 -05:00
Bruce MacDonald
d99fa6ce0a
send empty messages on last chat response ( #1530 )
2023-12-18 14:23:38 -05:00
Patrick Devine
d9e60f634b
add image support to the chat api ( #1490 )
2023-12-12 13:28:58 -08:00
Patrick Devine
910e9401d0
Multimodal support ( #1216 )
...
---------
Co-authored-by: Matt Apperson <mattapperson@Matts-MacBook-Pro.local>
2023-12-11 13:56:22 -08:00
Jeffrey Morgan
9e1406e4ed
Don't expose model information in /api/generate
2023-12-09 02:05:43 -08:00
Michael Yang
c3ff36088b
Merge pull request #774 from jmorganca/mxyng/server-version
...
add version api and show server version in cli
2023-12-06 13:22:55 -08:00
Michael Yang
5d75505ebd
return model configuration in generate
2023-12-05 14:39:02 -08:00
Bruce MacDonald
195e3d9dbd
chat api endpoint ( #1392 )
2023-12-05 14:57:33 -05:00
Michael Yang
0db4706ec2
api: add version api handler
2023-12-05 09:36:01 -08:00
Jeffrey Morgan
00d06619a1
Revert "chat api ( #991 )" while context variable is fixed
...
This reverts commit 7a0899d62d.
2023-12-04 21:16:27 -08:00
Bruce MacDonald
7a0899d62d
chat api ( #991 )
...
- update chat docs
- add messages chat endpoint
- remove deprecated context and template generate parameters from docs
- context and template are still supported for the time being and will continue to work as expected
- add partial response to chat history
2023-12-04 18:01:06 -05:00
Patrick Devine
cde31cb220
Allow setting parameters in the REPL ( #1294 )
2023-11-29 09:56:42 -08:00
Bruce MacDonald
928950fcc6
update python client create example ( #1227 )
...
* add remote create to python example client
2023-11-27 15:36:19 -05:00
Michael Yang
bc22d5a38b
no blob response
2023-11-15 15:16:23 -08:00
Michael Yang
1901044b07
use checksum reference
2023-11-15 15:16:23 -08:00
Michael Yang
1552cee59f
client create modelfile
2023-11-15 15:16:23 -08:00
Michael Yang
3ca56b5ada
add create modelfile field
2023-11-15 15:16:23 -08:00
Jeffrey Morgan
cdddd3df65
add format to example python client
2023-11-10 10:22:21 -08:00
Jeffrey Morgan
5cba29b9d6
JSON mode: add `"format"` as an api parameter ( #1051 )
...
* add `"format": "json"` as an API parameter
---------
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2023-11-09 16:44:02 -08:00
Bruce MacDonald
a49d6acc1e
add a complete /generate options example ( #1035 )
2023-11-08 16:44:36 -08:00
Bruce MacDonald
ec2a31e9b3
support raw generation requests ( #952 )
...
- add the optional `raw` generate request parameter to bypass prompt formatting and response context
- add raw request to docs
2023-11-08 14:05:02 -08:00
Jeffrey Morgan
17678b7225
Restore system prompt on requests and default num_keep to 0
2023-11-03 13:25:25 -07:00
Jeffrey Morgan
06589a3b30
Set NumKeep to 4 by default ( #982 )
2023-11-02 17:26:11 -07:00
Michael Yang
1fd511e661
Merge pull request #975 from jmorganca/mxyng/downloads
...
update downloads to use retry wrapper
2023-11-02 16:12:48 -07:00
Michael Yang
6db3691b8f
update default NumKeep
2023-11-02 15:47:35 -07:00
Michael Yang
60bb3c03a1
use http.Method
2023-11-02 13:12:45 -07:00
Bruce MacDonald
5c3491f425
allow for a configurable ollama model storage directory ( #897 )
...
* allow for a configurable ollama models directory
- set OLLAMA_MODELS in the environment that ollama is running in to change where model files are stored
- update docs
Co-Authored-By: Jeffrey Morgan <jmorganca@gmail.com>
Co-Authored-By: Jay Nakrani <dhananjaynakrani@gmail.com>
Co-Authored-By: Akhil Acharya <akhilcacharya@gmail.com>
Co-Authored-By: Sasha Devol <sasha.devol@protonmail.com>
2023-10-27 10:19:59 -04:00
Michael Yang
28c3f288e2
client: fix trailing slash
2023-10-26 11:09:38 -07:00
Michael Yang
459f4a7889
fix: ollama host for hostname
2023-10-20 11:32:41 -07:00
Bruce MacDonald
fe6f3b48f7
do not reload the running llm when runtime params change ( #840 )
...
- only reload the running llm if the model has changed, or the options for loading the running model have changed
- rename loaded llm to runner to differentiate from loaded model image
- remove logic which keeps the first system prompt in the generation context
2023-10-19 10:39:58 -04:00
Michael Yang
92189a5855
fix memory check
2023-10-13 14:47:29 -07:00
Bruce MacDonald
6fe178134d
improve api error handling ( #781 )
...
- remove new lines from llama.cpp error messages relayed to client
- check api option types and return error on wrong type
- change num layers from 95% VRAM to 92% VRAM
2023-10-13 16:57:10 -04:00
Bruce MacDonald
7804b8fab9
validate api options fields from map ( #711 )
2023-10-12 11:18:11 -04:00
Michael Yang
b599946b74
add format bytes
2023-10-11 14:08:23 -07:00
Bruce MacDonald
274d5a5fdf
optional parameter to not stream response ( #639 )
...
* update streaming request accept header
* add optional stream param to request bodies
2023-10-11 12:54:27 -04:00
Michael Yang
2cfffea02e
handle client proxy
2023-10-09 12:33:47 -07:00
Bruce MacDonald
2130c0708b
output type parsed from modelfile ( #678 )
2023-10-05 14:58:04 -04:00
Bruce MacDonald
9e2de1bd2c
increase streaming buffer size ( #692 )
2023-10-04 14:09:00 -04:00
Bruce MacDonald
1fbf3585d6
Relay default values to llama runner ( #672 )
...
* include seed in params for llama.cpp server and remove empty filter for temp
* relay default predict options to llama.cpp
- reorganize options to match predict request for readability
* omit empty stop
---------
Co-authored-by: hallh <hallh@users.noreply.github.com>
2023-10-02 14:53:16 -04:00
Bruce MacDonald
a1b2d95f96
remove unused push/pull params ( #650 )
2023-09-29 17:27:19 -04:00
Michael Yang
f40b3de758
use int64 consistently
2023-09-28 11:07:24 -07:00
Patrick Devine
8efbc5df55
DRAFT: add a simple python client to access ollama ( #522 )
2023-09-14 16:37:38 -07:00
Bruce MacDonald
f221637053
first pass at linux gpu support ( #454 )
...
* linux gpu support
* handle multiple gpus
* add cuda docker image ( #488 )
---------
Co-authored-by: Michael Yang <mxyng@pm.me>
2023-09-12 11:04:35 -04:00
Patrick Devine
790d24eb7b
add show command ( #474 )
2023-09-06 11:04:17 -07:00
Michael Yang
0f541a0367
s/ListResponseModel/ModelResponse/
2023-08-31 09:47:10 -04:00
Bruce MacDonald
42998d797d
subprocess llama.cpp server ( #401 )
...
* remove c code
* pack llama.cpp
* use request context for llama_cpp
* let llama_cpp decide the number of threads to use
* stop llama runner when app stops
* remove sample count and duration metrics
* use go generate to get libraries
* tmp dir for running llm
2023-08-30 16:35:03 -04:00
Michael Yang
982c535428
Merge pull request #428 from jmorganca/mxyng/upload-chunks
...
update upload chunks
2023-08-30 07:47:17 -07:00
Patrick Devine
8bbff2df98
add model IDs ( #439 )
2023-08-28 20:50:24 -07:00
Michael Yang
246dc65417
loosen http status code checks
2023-08-28 18:34:53 -04:00
Jeffrey Morgan
22ab7f5f88
default host to 127.0.0.1, fixes #424
2023-08-26 11:59:28 -07:00
Michael Yang
2c7f956b38
add version
2023-08-22 09:40:58 -07:00
Michael Yang
f723bf0879
ignore nil map values
2023-08-17 15:50:46 -07:00
Jeffrey Morgan
54bb49a502
parse protocol for OLLAMA_HOST
2023-08-17 18:20:44 -04:00
Jeffrey Morgan
5ee6116420
set default OLLAMA_HOST to http://localhost:11434
2023-08-16 12:22:59 -04:00
Blake Mizerany
67e593e355
cmd: support OLLAMA_CLIENT_HOST environment variable ( #262 )
...
* cmd: support OLLAMA_HOST environment variable
This commit adds support for the OLLAMA_HOST environment
variable. This variable can be used to specify the host to which
the client should connect. This is useful when the client is
running somewhere other than the host where the server is running.
The new api.FromEnv function is used to configure clients from the
environment. Clients that want to use the environment variable
consistently with the Ollama CLI can use this new function.
* Update api/client.go
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update api/client.go
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2023-08-16 11:03:48 -04:00
Michael Yang
f27bc261cf
s/parmeter/parameter/
2023-08-10 16:26:06 -07:00
Michael Yang
81d8d7b73f
fix could not convert int
2023-08-10 16:24:17 -07:00
Patrick Devine
be989d89d1
Token auth ( #314 )
2023-08-10 11:34:25 -07:00
Bruce MacDonald
4b3507f036
embeddings endpoint
...
Co-Authored-By: Jeffrey Morgan <jmorganca@gmail.com>
2023-08-10 11:45:57 -04:00
Bruce MacDonald
7a5f3616fd
embed text document in modelfile
2023-08-09 10:26:19 -04:00
Bruce MacDonald
21ddcaa1f1
pr comments
...
- default to embeddings enabled
- move embedding logic for loaded model to request
- allow embedding full directory
- close llm on reload
2023-08-08 13:49:37 -04:00
Michael Yang
f2074ed4c0
Merge pull request #306 from jmorganca/default-keep-system
...
automatically set num_keep if num_keep < 0
2023-08-08 09:25:34 -07:00
Jeffrey Morgan
8713ac23a8
allow overriding template and system in /api/generate
...
Fixes #297
Fixes #296
2023-08-08 00:55:34 -04:00
Michael Yang
4dc5b117dd
automatically set num_keep if num_keep < 0
...
num_keep defines how many tokens to keep in the context when truncating
inputs. if left to its default value of -1, the server will calculate
num_keep to be the length of the system instructions
2023-08-07 16:19:12 -07:00
Michael Yang
b9f4d67554
configurable rope frequency parameters
2023-08-03 22:11:58 -07:00
Bruce MacDonald
8b1e791820
allow specifying zero values in modelfile
2023-08-02 17:07:53 -04:00
Bruce MacDonald
8f8b6288ac
check server is running before running command
2023-08-02 10:51:23 -04:00
Bruce MacDonald
765994362c
use head to check heartbeat
2023-08-01 14:50:38 -04:00
Bruce MacDonald
1c5a8770ee
read runner parameter options from map
...
- read runner options from map to see what was specified explicitly and overwrite zero values
2023-08-01 13:38:19 -04:00
Jeffrey Morgan
528bafa585
cache loaded model
2023-08-01 11:24:18 -04:00
Bruce MacDonald
e72fe7945f
check server is running before running command
2023-07-31 16:25:57 -04:00
Bruce MacDonald
184ad8f057
allow specifying stop conditions in modelfile
2023-07-28 11:02:04 -04:00
Jeffrey Morgan
822a0e36eb
lower batch size to 512
2023-07-28 10:56:21 -04:00
Michael Yang
fadf75f99d
add stop conditions
2023-07-27 17:00:47 -07:00
Michael Yang
ad3a7d0e2c
add NumGQA
2023-07-27 14:05:11 -07:00
Jeffrey Morgan
688661ab9b
increase default batch size to 1024
2023-07-27 16:51:01 -04:00
Michael Yang
cca61181cb
sample metrics
2023-07-27 09:31:44 -07:00
Michael Yang
c490416189
lock on llm.lock(); decrease batch size
2023-07-27 09:31:44 -07:00
Michael Yang
f62a882760
add session expiration
2023-07-27 09:31:44 -07:00
Michael Yang
3003fc03fc
update predict code
2023-07-27 09:31:44 -07:00
Michael Yang
32aec66e6a
add load duration
2023-07-27 09:31:44 -07:00