Michael Yang
96bc232b43
Merge pull request #4413 from ollama/mxyng/name-check
...
check if name exists before create/pull/copy
2024-05-29 12:06:58 -07:00
Michael Yang
bca7b12284
Merge pull request #3718 from ollama/mxyng/modelname-3
...
update delete handler to use model.Name
2024-05-29 12:02:07 -07:00
Michael Yang
6adca97f37
Merge pull request #4619 from noxer/patch-1
...
Fix download retry issue
2024-05-24 17:21:57 -07:00
Patrick Devine
4cc3be3035
Move envconfig and consolidate env vars ( #4608 )
2024-05-24 14:57:15 -07:00
Tim Scheuermann
db2ffa79f1
Fix download retry issue
2024-05-24 20:30:42 +02:00
Jeffrey Morgan
38255d2af1
Use flash attention flag for now ( #4580 )
...
* put flash attention behind flag for now
* add test
* remove print
* up timeout for scheduler tests
2024-05-22 21:52:09 -07:00
Sang Park
4434d7f447
Correct typo in error message ( #4535 )
...
The spelling of the term "request" has been corrected, which was previously mistakenly written as "requeset" in the error log message.
2024-05-21 13:39:01 -07:00
Michael Yang
807d092761
fix quantize file types
2024-05-20 15:22:11 -07:00
Michael Yang
f36f1d6be9
tidy intermediate blobs
2024-05-20 15:15:06 -07:00
Michael Yang
3520c0e4d5
cache and reuse intermediate blobs
...
particularly useful for zipfiles and f16s
2024-05-20 13:25:10 -07:00
Patrick Devine
ccdf0b2a44
Move the parser back + handle utf16 files ( #4533 )
2024-05-20 11:26:45 -07:00
Daniel Hiltgen
02b31c9dc8
Don't return error on signal exit
2024-05-16 16:25:38 -07:00
Michael Yang
84ed77cbd8
Merge pull request #4436 from ollama/mxyng/done-part
...
return on part done
2024-05-15 17:16:24 -07:00
Patrick Devine
d1692fd3e0
fix the cpu estimatedTotal memory + get the expiry time for loading models ( #4461 )
2024-05-15 15:43:16 -07:00
Patrick Devine
f2cf97d6f1
fix typo in modelfile generation ( #4439 )
2024-05-14 15:34:29 -07:00
Michael Yang
85a57006d1
check if name exists before create/pull/copy
2024-05-14 14:58:58 -07:00
Michael Yang
c5e892cb3e
update tests
2024-05-14 14:56:31 -07:00
Michael Yang
81fb06f530
more resilient Manifests
2024-05-14 14:08:24 -07:00
Michael Yang
a385382ff5
filepath.Join
2024-05-14 14:08:24 -07:00
Michael Yang
b8772a353f
remove DeleteModel
2024-05-14 14:08:24 -07:00
Michael Yang
c2714fcbfd
routes: use Manifests for ListHandler
2024-05-14 14:08:24 -07:00
Michael Yang
a2fc933fed
update delete handler to use model.Name
2024-05-14 14:08:24 -07:00
Michael Yang
ac145f75ca
return on part done
2024-05-14 13:04:30 -07:00
Ryo Machida
798b107f19
Fixed the API endpoint /api/tags when the model list is empty. ( #4424 )
...
* Fixed the API endpoint /api/tags to return {models: []} instead of {models: null} when the model list is empty.
* Update server/routes.go
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-05-14 11:18:10 -07:00
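Note on the `/api/tags` fix above: in Go, `encoding/json` encodes a nil slice as `null` and an empty, non-nil slice as `[]`. The following is a minimal sketch of that behavior, not the actual change in server/routes.go; `listResponse` and `encodeModels` are hypothetical names used only for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// listResponse is a hypothetical stand-in for the /api/tags response body.
type listResponse struct {
	Models []string `json:"models"`
}

// encodeModels marshals the model list, substituting an empty slice for a
// nil one so the JSON field is [] rather than null.
func encodeModels(models []string) string {
	if models == nil {
		models = []string{}
	}
	b, err := json.Marshal(listResponse{Models: models})
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	var none []string

	// A nil slice is encoded as null.
	raw, _ := json.Marshal(listResponse{Models: none})
	fmt.Println(string(raw))

	// After the nil check, the same input is encoded as [].
	fmt.Println(encodeModels(none))
}
```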
Daniel Hiltgen
ec231a7923
Remove VRAM convergence check for windows
...
The APIs we query are optimistic on free space, and windows pages
VRAM, so we don't have to wait to see reported usage recover on unload
2024-05-14 09:53:46 -07:00
Patrick Devine
7ca71a6b0f
don't abort when an invalid model name is used in /save ( #4416 )
2024-05-13 18:48:28 -07:00
Patrick Devine
6845988807
Ollama `ps` command for showing currently loaded models ( #4327 )
2024-05-13 17:17:36 -07:00
jmorganca
4ec7445a6f
Revert "use post token"
...
This reverts commit 0fec3525ad.
2024-05-11 22:19:14 -07:00
Michael Yang
0fec3525ad
use post token
2024-05-11 19:13:16 -07:00
Daniel Hiltgen
824ee5446f
Fix envconfig unit test
2024-05-10 16:49:48 -07:00
Daniel Hiltgen
4142c3ef7c
Always use the sorted list of GPUs
...
Make sure the first GPU has the most free space
2024-05-10 13:53:21 -07:00
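Note on the GPU ordering commit above: one way to guarantee that the first GPU has the most free space is to sort the list in descending order of free memory before using it. This is a sketch under assumptions; the `gpu` type, its fields, and `sortByFree` are hypothetical and do not mirror the project's actual GPU structures.

```go
package main

import (
	"fmt"
	"sort"
)

// gpu is a hypothetical, simplified description of a device.
type gpu struct {
	ID         string
	FreeMemory uint64 // bytes
}

// sortByFree returns a copy of gpus ordered from most to least free memory,
// leaving the input slice untouched.
func sortByFree(gpus []gpu) []gpu {
	sorted := append([]gpu(nil), gpus...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].FreeMemory > sorted[j].FreeMemory
	})
	return sorted
}

func main() {
	gpus := []gpu{
		{ID: "0", FreeMemory: 2 << 30},
		{ID: "1", FreeMemory: 8 << 30},
		{ID: "2", FreeMemory: 4 << 30},
	}
	// The device with the most free memory comes first.
	fmt.Println(sortByFree(gpus)[0].ID)
}
```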
Jeffrey Morgan
6602e793c0
Use `--quantize` flag and `quantize` api parameter ( #4321 )
...
* rename `--quantization` to `--quantize`
* backwards
* Update api/types.go
Co-authored-by: Michael Yang <mxyng@pm.me>
---------
Co-authored-by: Michael Yang <mxyng@pm.me>
2024-05-10 13:06:13 -07:00
Jeffrey Morgan
bb6fd02298
Don't clamp ctx size in `PredictServerFit` ( #4317 )
...
* dont clamp ctx size in `PredictServerFit`
* minimum 4 context
* remove context warning
2024-05-10 10:17:12 -07:00
Michael Yang
e03637176d
fix(routes): skip bad manifests
2024-05-10 08:46:11 -07:00
Jeffrey Morgan
302d7fdbf3
prune partial downloads ( #4272 )
2024-05-09 16:35:20 -07:00
Daniel Hiltgen
3ae2f441e0
Fix race in shutdown logic
...
Ensure the runners are terminated
2024-05-09 15:54:02 -07:00
Daniel Hiltgen
354ad9254e
Wait for GPU free memory reporting to converge
...
The GPU drivers take a while to update their free memory reporting, so we need
to wait until the values converge with what we're expecting before proceeding
to start another runner in order to get an accurate picture.
2024-05-09 14:56:01 -07:00
Daniel Hiltgen
8727a9c140
Record more GPU information
...
This cleans up the logging for GPU discovery a bit, and can
serve as a foundation to report GPU information in a future UX.
2024-05-09 14:18:14 -07:00
Bruce MacDonald
cfa84b8470
add done_reason to the api ( #4235 )
2024-05-09 13:30:14 -07:00
Michael Yang
a7ee84fc31
routes: skip invalid filepaths
2024-05-09 11:23:22 -07:00
Jeffrey Morgan
d5eec16d23
use model defaults for `num_gqa`, `rope_frequency_base` and `rope_frequency_scale` ( #1983 )
2024-05-09 09:06:13 -07:00
Bruce MacDonald
cef45feaa4
Add preflight OPTIONS handling and update CORS config ( #4086 )
...
* Add preflight OPTIONS handling and update CORS config
- Implement early return with HTTP 204 (No Content) for OPTIONS requests in allowedHostsMiddleware to optimize preflight handling.
- Extend CORS configuration to explicitly allow 'Authorization' headers and 'OPTIONS' method when OLLAMA_ORIGINS environment variable is set.
* allow auth, content-type, and user-agent headers
* Update routes.go
2024-05-08 13:14:00 -07:00
Michael Yang
b25976aeb8
routes: fix show llava models
2024-05-08 12:43:36 -07:00
Michael Yang
88cf154483
Merge pull request #4244 from ollama/mxyng/skip-if-same
...
skip if same quantization
2024-05-07 19:03:37 -07:00
Bruce MacDonald
8cbd3e7510
skip hidden files in list models handler ( #4247 )
2024-05-07 19:01:45 -07:00
Michael Yang
eeb695261f
skip if same quantization
2024-05-07 17:44:19 -07:00
Bruce MacDonald
dc9b1111e0
fix invalid destination error message
2024-05-07 17:35:52 -07:00
Michael Yang
ffbd3d173f
Merge pull request #3715 from ollama/mxyng/modelname-2
...
update list handler to use model.Name
2024-05-07 15:21:39 -07:00
Michael Yang
1e0a669f75
Merge pull request #3682 from ollama/mxyng/quantize-all-the-things
...
quantize any fp16/fp32 model
2024-05-07 15:20:49 -07:00
Michael Yang
548a7df014
update list handler to use model.Name
2024-05-07 09:38:45 -07:00