Patrick Devine
4cc3be3035
Move envconfig and consolidate env vars ( #4608 )
2024-05-24 14:57:15 -07:00
Michael Yang
f36f1d6be9
tidy intermediate blobs
2024-05-20 15:15:06 -07:00
Michael Yang
3520c0e4d5
cache and reuse intermediate blobs
...
particularly useful for zipfiles and f16s
2024-05-20 13:25:10 -07:00
Patrick Devine
ccdf0b2a44
Move the parser back + handle utf16 files ( #4533 )
2024-05-20 11:26:45 -07:00
Daniel Hiltgen
02b31c9dc8
Don't return error on signal exit
2024-05-16 16:25:38 -07:00
Patrick Devine
d1692fd3e0
fix the cpu estimatedTotal memory + get the expiry time for loading models ( #4461 )
2024-05-15 15:43:16 -07:00
Patrick Devine
f2cf97d6f1
fix typo in modelfile generation ( #4439 )
2024-05-14 15:34:29 -07:00
Ryo Machida
798b107f19
Fixed the API endpoint /api/tags when the model list is empty. ( #4424 )
...
* Fixed the API endpoint /api/tags to return {models: []} instead of {models: null} when the model list is empty.
* Update server/routes.go
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-05-14 11:18:10 -07:00
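The `{models: null}` versus `{models: []}` difference described in the entry above follows from how Go's `encoding/json` encodes a nil slice. A minimal sketch of that behaviour, assuming a hypothetical `listResponse` type and `encodeModels` helper that are not taken from the Ollama source:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// listResponse is a hypothetical stand-in for the /api/tags response body.
type listResponse struct {
	Models []string `json:"models"`
}

// encodeModels marshals the list, substituting an empty slice for a nil one
// so that the JSON field is [] rather than null.
func encodeModels(models []string) string {
	if models == nil {
		models = []string{}
	}
	b, err := json.Marshal(listResponse{Models: models})
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	var none []string // nil slice, as when no models are found

	raw, _ := json.Marshal(listResponse{Models: none})
	fmt.Println(string(raw))        // {"models":null}
	fmt.Println(encodeModels(none)) // {"models":[]}
}
```

Clients that iterate over `models` without a null check are the likely beneficiaries of the empty-array form.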
Patrick Devine
7ca71a6b0f
don't abort when an invalid model name is used in /save ( #4416 )
2024-05-13 18:48:28 -07:00
Patrick Devine
6845988807
Ollama ps command for showing currently loaded models ( #4327 )
2024-05-13 17:17:36 -07:00
Jeffrey Morgan
6602e793c0
Use --quantize flag and quantize api parameter ( #4321 )
...
* rename `--quantization` to `--quantize`
* backwards
* Update api/types.go
Co-authored-by: Michael Yang <mxyng@pm.me>
---------
Co-authored-by: Michael Yang <mxyng@pm.me>
2024-05-10 13:06:13 -07:00
Michael Yang
e03637176d
fix(routes): skip bad manifests
2024-05-10 08:46:11 -07:00
Daniel Hiltgen
3ae2f441e0
Fix race in shutdown logic
...
Ensure the runners are terminated
2024-05-09 15:54:02 -07:00
Daniel Hiltgen
8727a9c140
Record more GPU information
...
This cleans up the logging for GPU discovery a bit, and can
serve as a foundation to report GPU information in a future UX.
2024-05-09 14:18:14 -07:00
Bruce MacDonald
cfa84b8470
add done_reason to the api ( #4235 )
2024-05-09 13:30:14 -07:00
Michael Yang
a7ee84fc31
routes: skip invalid filepaths
2024-05-09 11:23:22 -07:00
Jeffrey Morgan
d5eec16d23
use model defaults for num_gqa, rope_frequency_base and rope_frequency_scale ( #1983 )
2024-05-09 09:06:13 -07:00
Bruce MacDonald
cef45feaa4
Add preflight OPTIONS handling and update CORS config ( #4086 )
...
* Add preflight OPTIONS handling and update CORS config
- Implement early return with HTTP 204 (No Content) for OPTIONS requests in allowedHostsMiddleware to optimize preflight handling.
- Extend CORS configuration to explicitly allow 'Authorization' headers and 'OPTIONS' method when OLLAMA_ORIGINS environment variable is set.
* allow auth, content-type, and user-agent headers
* Update routes.go
2024-05-08 13:14:00 -07:00
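The early return for preflight requests described in the entry above can be sketched with the standard library alone. This is an illustration under assumptions, not the code from `routes.go`: the `preflight` wrapper and `statusFor` helper are hypothetical names, and plain `net/http` stands in for whatever router the server actually uses.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// preflight is a hypothetical middleware: it answers OPTIONS requests with
// 204 No Content and passes every other method on to next.
func preflight(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method == http.MethodOptions {
			w.WriteHeader(http.StatusNoContent)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor sends one request through the middleware and returns the status code.
func statusFor(method string) int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	rec := httptest.NewRecorder()
	preflight(ok).ServeHTTP(rec, httptest.NewRequest(method, "/api/tags", nil))
	return rec.Code
}

func main() {
	fmt.Println(statusFor(http.MethodOptions)) // 204
	fmt.Println(statusFor(http.MethodGet))     // 200
}
```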
Bruce MacDonald
8cbd3e7510
skip hidden files in list models handler ( #4247 )
2024-05-07 19:01:45 -07:00
Bruce MacDonald
dc9b1111e0
fix invalid destination error message
2024-05-07 17:35:52 -07:00
Michael Yang
ffbd3d173f
Merge pull request #3715 from ollama/mxyng/modelname-2
...
update list handler to use model.Name
2024-05-07 15:21:39 -07:00
Michael Yang
1e0a669f75
Merge pull request #3682 from ollama/mxyng/quantize-all-the-things
...
quantize any fp16/fp32 model
2024-05-07 15:20:49 -07:00
Michael Yang
548a7df014
update list handler to use model.Name
2024-05-07 09:38:45 -07:00
Jeffrey Morgan
39d9d22ca3
close server on receiving signal ( #4213 )
2024-05-06 16:01:37 -07:00
Michael Yang
9685c34509
quantize any fp16/fp32 model
...
- FROM /path/to/{safetensors,pytorch}
- FROM /path/to/fp{16,32}.bin
- FROM model:fp{16,32}
2024-05-06 15:24:01 -07:00
Daniel Hiltgen
f56aa20014
Centralize server config handling
...
This moves all the env var reading into one central module
and logs the loaded config once at startup which should
help in troubleshooting user server logs
2024-05-05 16:49:50 -07:00
Daniel Hiltgen
20f6c06569
Make maximum pending request configurable
...
This also bumps up the default to be 50 queued requests
instead of 10.
2024-05-04 21:00:52 -07:00
Michael Yang
b7a87a22b6
Merge pull request #4059 from ollama/mxyng/parser-2
...
rename parser to model/file
2024-05-03 13:01:22 -07:00
Michael Yang
e9ae607ece
Merge pull request #3892 from ollama/mxyng/parser
...
refactor modelfile parser
2024-05-02 17:04:47 -07:00
Michael Yang
45b6a12e45
server: target invalid
2024-05-01 12:40:45 -07:00
Michael Yang
119589fcb3
rename parser to model/file
2024-05-01 09:53:50 -07:00
Michael Yang
9cf0f2e973
use parser.Format instead of templating modelfile
2024-05-01 09:52:54 -07:00
Jeffrey Morgan
bb31def011
return code 499 when user cancels request while a model is loading ( #3955 )
2024-04-26 17:38:29 -04:00
Michael Yang
592dae31c8
update copy to use model.Name
2024-04-24 15:54:54 -07:00
Daniel Hiltgen
34b9db5afc
Request and model concurrency
...
This change adds support for multiple concurrent requests, as well as
loading multiple models by spawning multiple runners. The default
settings are currently set at 1 concurrent request per model and only 1
loaded model at a time, but these can be adjusted by setting
OLLAMA_NUM_PARALLEL and OLLAMA_MAX_LOADED_MODELS.
2024-04-22 19:29:12 -07:00
Jeffrey Morgan
a0b8a32eb4
Terminate subprocess if receiving SIGINT or SIGTERM signals while model is loading ( #3653 )
...
* terminate subprocess if receiving `SIGINT` or `SIGTERM` signals while model is loading
* use `unload` in signal handler
2024-04-15 12:09:32 -04:00
Michael Yang
9502e5661f
cgo quantize
2024-04-08 15:31:08 -07:00
Michael Yang
e1c9a2a00f
no blob create if already exists
2024-04-08 15:09:48 -07:00
Daniel Hiltgen
6589eb8a8c
Revert options as a ref in the server
2024-04-02 16:44:10 -07:00
Daniel Hiltgen
58d95cc9bd
Switch back to subprocessing for llama.cpp
...
This should resolve a number of memory leak and stability defects by allowing
us to isolate llama.cpp in a separate process and shutdown when idle, and
gracefully restart if it has problems. This also serves as a first step to be
able to run multiple copies to support multiple models concurrently.
2024-04-01 16:48:18 -07:00
Michael Yang
91b3e4d282
update memory calculations
...
count each layer independently when deciding gpu offloading
2024-04-01 13:16:32 -07:00
Michael Yang
af8a8a6b59
fix: trim quotes on OLLAMA_ORIGINS
2024-03-27 15:24:29 -07:00
Patrick Devine
1b272d5bcd
change github.com/jmorganca/ollama to github.com/ollama/ollama ( #3347 )
2024-03-26 13:04:17 -07:00
Blake Mizerany
703684a82a
server: replace blob prefix separator from ':' to '-' ( #3146 )
...
This fixes an issue where blob file names containing ':' characters were rejected by file systems that do not support them.
2024-03-14 20:18:06 -07:00
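The separator change in the entry above amounts to mapping a digest such as `sha256:<hex>` to a file name of the form `sha256-<hex>`. A minimal sketch, where `blobFileName` is a hypothetical helper and the digest value is made up for illustration:

```go
package main

import (
	"fmt"
	"strings"
)

// blobFileName is a hypothetical helper that replaces the first ':' in a
// digest with '-' so the result is a valid file name on file systems that
// do not allow ':' in names.
func blobFileName(digest string) string {
	return strings.Replace(digest, ":", "-", 1)
}

func main() {
	fmt.Println(blobFileName("sha256:0123abcd")) // sha256-0123abcd
}
```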
Patrick Devine
47cfe58af5
Default Keep Alive environment variable ( #3094 )
...
---------
Co-authored-by: Chris-AS1 <8493773+Chris-AS1@users.noreply.github.com>
2024-03-13 13:29:40 -07:00
Daniel Hiltgen
4a5c9b8035
Finish unwinding idempotent payload logic
...
The recent ROCm change partially removed idempotent
payloads, but the ggml-metal.metal file for mac was still
idempotent. This finishes switching to always extract
the payloads, and now that idempotency is gone, the
version directory is no longer useful.
2024-03-09 08:34:39 -08:00
Jeffrey Morgan
5b3fad9636
separate out isLocalIP
2024-03-09 00:22:08 -08:00
Jeffrey Morgan
bfec2c6e10
simplify host checks
2024-03-08 23:29:53 -08:00
Jeffrey Morgan
5c143af726
add additional allowed hosts
2024-03-08 23:23:59 -08:00
Jeffrey Morgan
fc8c044584
add allowed host middleware and remove workDir middleware ( #3018 )
2024-03-08 22:23:47 -08:00