Daniel Hiltgen
5e8ff556cb
Support forced spreading for multi GPU
...
Our default behavior today is to try to fit into a single GPU if possible.
Some users would prefer the old behavior of always spreading across
multiple GPUs even if the model can fit into one. This exposes that
tunable behavior.
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
6fd04ca922
Improve multi-gpu handling at the limit
...
Still not complete, needs some refinement to our prediction to understand the
discrete GPUs available space so we can see how many layers fit in each one
since we can't split one layer across multiple GPUs we can't treat free space
as one logical block
2024-06-14 14:51:40 -07:00
Patrick Devine
94618b2365
add OLLAMA_MODELS to envconfig ( #5029 )
2024-06-13 12:52:03 -07:00
Jeffrey Morgan
1fd236d177
server: remove jwt decoding error ( #5027 )
2024-06-13 11:21:15 -07:00
Michael Yang
c16f8af911
fix: multiple templates when creating from model
...
multiple templates may appear in a model if a model is created from
another model that 1) has an autodetected template and 2) defines a
custom template
2024-06-12 13:35:49 -07:00
Michael Yang
515f497e6d
fix: skip removing layers that no longer exist
2024-06-10 11:32:19 -07:00
Michael Yang
b27268aaef
add test
2024-06-10 11:32:15 -07:00
Michael Yang
030e765e76
fix create model when template detection errors
2024-06-07 10:51:35 -07:00
Michael Yang
9b6c2e6eb6
detect chat template from KV
2024-06-06 16:03:47 -07:00
royjhan
1a29e9a879
API app/browser access ( #4879 )
...
* API app/browser access
* Add tauri (resolves #2291 , #4791 , #3799 , #4388 )
2024-06-06 15:19:03 -07:00
royjhan
4bf1da4944
Separate ListResponse and ModelResponse for api/tags vs api/ps ( #4842 )
...
* Remove false time fields
* Struct Separation for List and Process
* Remove Marshaler
2024-06-06 10:11:45 -07:00
Blake Mizerany
de5beb06b3
server: skip blob verification for already verified blobs
2024-06-05 16:39:11 -07:00
Michael Yang
d61ef8b954
update create handler to use model.Name
2024-06-04 13:28:25 -07:00
Michael Yang
6297f85606
gofmt, goimports
2024-06-04 13:20:24 -07:00
Michael Yang
8ce4032e72
more lint
2024-06-04 11:13:30 -07:00
Michael Yang
e40145a39d
lint
2024-06-04 11:13:30 -07:00
Michael Yang
c895a7d13f
some gocritic
2024-06-04 11:13:30 -07:00
Michael Yang
8ffb51749f
nolintlint
2024-06-04 11:13:30 -07:00
Michael Yang
04f3c12bb7
replace x/exp/slices with slices
2024-06-04 11:13:30 -07:00
Michael Yang
96bc232b43
Merge pull request #4413 from ollama/mxyng/name-check
...
check if name exists before create/pull/copy
2024-05-29 12:06:58 -07:00
Michael Yang
bca7b12284
Merge pull request #3718 from ollama/mxyng/modelname-3
...
update delete handler to use model.Name
2024-05-29 12:02:07 -07:00
Michael Yang
6adca97f37
Merge pull request #4619 from noxer/patch-1
...
Fix download retry issue
2024-05-24 17:21:57 -07:00
Patrick Devine
4cc3be3035
Move envconfig and consolidate env vars ( #4608 )
2024-05-24 14:57:15 -07:00
Tim Scheuermann
db2ffa79f1
Fix download retry issue
2024-05-24 20:30:42 +02:00
Jeffrey Morgan
38255d2af1
Use flash attention flag for now ( #4580 )
...
* put flash attention behind flag for now
* add test
* remove print
* up timeout for sheduler tests
2024-05-22 21:52:09 -07:00
Sang Park
4434d7f447
Correct typo in error message ( #4535 )
...
The spelling of the term "request" has been corrected, which was previously mistakenly written as "requeset" in the error log message.
2024-05-21 13:39:01 -07:00
Michael Yang
807d092761
fix quantize file types
2024-05-20 15:22:11 -07:00
Michael Yang
f36f1d6be9
tidy intermediate blobs
2024-05-20 15:15:06 -07:00
Michael Yang
3520c0e4d5
cache and reuse intermediate blobs
...
particularly useful for zipfiles and f16s
2024-05-20 13:25:10 -07:00
Patrick Devine
ccdf0b2a44
Move the parser back + handle utf16 files ( #4533 )
2024-05-20 11:26:45 -07:00
Daniel Hiltgen
02b31c9dc8
Don't return error on signal exit
2024-05-16 16:25:38 -07:00
Michael Yang
84ed77cbd8
Merge pull request #4436 from ollama/mxyng/done-part
...
return on part done
2024-05-15 17:16:24 -07:00
Patrick Devine
d1692fd3e0
fix the cpu estimatedTotal memory + get the expiry time for loading models ( #4461 )
2024-05-15 15:43:16 -07:00
Patrick Devine
f2cf97d6f1
fix typo in modelfile generation ( #4439 )
2024-05-14 15:34:29 -07:00
Michael Yang
85a57006d1
check if name exists before create/pull/copy
2024-05-14 14:58:58 -07:00
Michael Yang
c5e892cb3e
update tests
2024-05-14 14:56:31 -07:00
Michael Yang
81fb06f530
more resilient Manifests
2024-05-14 14:08:24 -07:00
Michael Yang
a385382ff5
filepath.Join
2024-05-14 14:08:24 -07:00
Michael Yang
b8772a353f
remove DeleteModel
2024-05-14 14:08:24 -07:00
Michael Yang
c2714fcbfd
routes: use Manifests for ListHandler
2024-05-14 14:08:24 -07:00
Michael Yang
a2fc933fed
update delete handler to use model.Name
2024-05-14 14:08:24 -07:00
Michael Yang
ac145f75ca
return on part done
2024-05-14 13:04:30 -07:00
Ryo Machida
798b107f19
Fixed the API endpoint /api/tags when the model list is empty. ( #4424 )
...
* Fixed the API endpoint /api/tags to return {models: []} instead of {models: null} when the model list is empty.
* Update server/routes.go
---------
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-05-14 11:18:10 -07:00
Daniel Hiltgen
ec231a7923
Remove VRAM convergence check for windows
...
The APIs we query are optimistic on free space, and windows pages
VRAM, so we don't have to wait to see reported usage recover on unload
2024-05-14 09:53:46 -07:00
Patrick Devine
7ca71a6b0f
don't abort when an invalid model name is used in /save ( #4416 )
2024-05-13 18:48:28 -07:00
Patrick Devine
6845988807
Ollama ps
command for showing currently loaded models ( #4327 )
2024-05-13 17:17:36 -07:00
jmorganca
4ec7445a6f
Revert "use post token"
...
This reverts commit 0fec3525ad
.
2024-05-11 22:19:14 -07:00
Michael Yang
0fec3525ad
use post token
2024-05-11 19:13:16 -07:00
Daniel Hiltgen
824ee5446f
Fix envconfig unit test
2024-05-10 16:49:48 -07:00
Daniel Hiltgen
4142c3ef7c
Always use the sorted list of GPUs
...
Make sure the first GPU has the most free space
2024-05-10 13:53:21 -07:00