Michael Yang
da8e2a0447
use kvs to detect embedding models
2024-07-01 10:47:43 -07:00
Michael Yang
a30915bde1
add capabilities
2024-07-01 10:47:43 -07:00
Michael Yang
58e3fff311
rename templates to template
2024-07-01 10:40:54 -07:00
Michael Yang
3f0b309ad4
remove ManifestV2
2024-07-01 10:40:54 -07:00
Blake Mizerany
cb42e607c5
llm: speed up gguf decoding by a lot (#5246)
Previously, some costly things were causing the loading of GGUF files
and their metadata and tensor information to be VERY slow:
* Too many allocations when decoding strings
* Hitting disk for each read of each key and value, resulting in a
not-okay amount of syscalls/disk I/O.
The show API is now down to 33ms from 800ms+ for llama3 on a macbook pro
m3.
This commit also prevents collecting large arrays of values when
decoding GGUFs (if desired). When such keys are encountered, their
values are null, and are encoded as such in JSON.
Also, this fixes a broken test that was not encoding valid GGUF.
2024-06-24 21:47:52 -07:00
Michael Yang
e835ef1836
fix: quantization with template
2024-06-21 13:39:25 -07:00
Jeffrey Morgan
1fd236d177
server: remove jwt decoding error (#5027)
2024-06-13 11:21:15 -07:00
Michael Yang
c16f8af911
fix: multiple templates when creating from model
multiple templates may appear in a model if a model is created from
another model that 1) has an autodetected template and 2) defines a
custom template
2024-06-12 13:35:49 -07:00
Michael Yang
030e765e76
fix create model when template detection errors
2024-06-07 10:51:35 -07:00
Michael Yang
9b6c2e6eb6
detect chat template from KV
2024-06-06 16:03:47 -07:00
Blake Mizerany
de5beb06b3
server: skip blob verification for already verified blobs
2024-06-05 16:39:11 -07:00
Michael Yang
d61ef8b954
update create handler to use model.Name
2024-06-04 13:28:25 -07:00
Michael Yang
6297f85606
gofmt, goimports
2024-06-04 13:20:24 -07:00
Michael Yang
e40145a39d
lint
2024-06-04 11:13:30 -07:00
Michael Yang
8ffb51749f
nolintlint
2024-06-04 11:13:30 -07:00
Michael Yang
04f3c12bb7
replace x/exp/slices with slices
2024-06-04 11:13:30 -07:00
Michael Yang
bca7b12284
Merge pull request #3718 from ollama/mxyng/modelname-3
update delete handler to use model.Name
2024-05-29 12:02:07 -07:00
Patrick Devine
4cc3be3035
Move envconfig and consolidate env vars (#4608)
2024-05-24 14:57:15 -07:00
Michael Yang
807d092761
fix quantize file types
2024-05-20 15:22:11 -07:00
Michael Yang
f36f1d6be9
tidy intermediate blobs
2024-05-20 15:15:06 -07:00
Michael Yang
3520c0e4d5
cache and reuse intermediate blobs
particularly useful for zipfiles and f16s
2024-05-20 13:25:10 -07:00
Patrick Devine
ccdf0b2a44
Move the parser back + handle utf16 files (#4533)
2024-05-20 11:26:45 -07:00
Michael Yang
b8772a353f
remove DeleteModel
2024-05-14 14:08:24 -07:00
Jeffrey Morgan
302d7fdbf3
prune partial downloads (#4272)
2024-05-09 16:35:20 -07:00
Michael Yang
b25976aeb8
routes: fix show llava models
2024-05-08 12:43:36 -07:00
Michael Yang
eeb695261f
skip if same quantization
2024-05-07 17:44:19 -07:00
Michael Yang
548a7df014
update list handler to use model.Name
2024-05-07 09:38:45 -07:00
Michael Yang
d245460362
only quantize language models
2024-05-06 15:24:01 -07:00
Michael Yang
4d0d0fa383
no iterator
2024-05-06 15:24:01 -07:00
Michael Yang
7ffe45734d
rebase
2024-05-06 15:24:01 -07:00
Michael Yang
01811c176a
comments
2024-05-06 15:24:01 -07:00
Michael Yang
9685c34509
quantize any fp16/fp32 model
- FROM /path/to/{safetensors,pytorch}
- FROM /path/to/fp{16,32}.bin
- FROM model:fp{16,32}
2024-05-06 15:24:01 -07:00
Daniel Hiltgen
f56aa20014
Centralize server config handling
This moves all the env var reading into one central module
and logs the loaded config once at startup which should
help in troubleshooting user server logs
2024-05-05 16:49:50 -07:00
Michael Yang
119589fcb3
rename parser to model/file
2024-05-01 09:53:50 -07:00
Michael Yang
9cf0f2e973
use parser.Format instead of templating modelfile
2024-05-01 09:52:54 -07:00
Bruce MacDonald
0a7fdbe533
prompt to display and add local ollama keys to account (#3717)
- return descriptive error messages when unauthorized to create blob or push a model
- display the local public key associated with the request that was denied
2024-04-30 11:02:08 -07:00
Jeffrey Morgan
586672f490
fix copying model to itself (#4019)
2024-04-28 23:47:49 -04:00
Blake Mizerany
37f9c8ad99
types/model: overhaul Name and Digest types (#3924)
2024-04-26 13:08:32 -07:00
Michael Yang
592dae31c8
update copy to use model.Name
2024-04-24 15:54:54 -07:00
Cheng
62be2050dd
chore: use errors.New to replace fmt.Errorf will much better (#3789)
2024-04-20 22:11:06 -04:00
Patrick Devine
9f8691c6c8
Add llama2 / torch models for ollama create (#3607)
2024-04-15 11:26:42 -07:00
Michael Yang
9502e5661f
cgo quantize
2024-04-08 15:31:08 -07:00
Patrick Devine
3b6a9154dd
Simplify model conversion (#3422)
2024-04-01 16:14:53 -07:00
Michael Yang
d338d70492
refactor model parsing
2024-04-01 13:16:15 -07:00
Patrick Devine
5a5efee46b
Add gemma safetensors conversion (#3250)
Co-authored-by: Michael Yang <mxyng@pm.me>
2024-03-28 18:54:01 -07:00
Patrick Devine
1b272d5bcd
change github.com/jmorganca/ollama to github.com/ollama/ollama (#3347)
2024-03-26 13:04:17 -07:00
Blake Mizerany
703684a82a
server: replace blob prefix separator from ':' to '-' (#3146)
This fixes blob file names that contain ':' characters being rejected by file systems that do not support them.
2024-03-14 20:18:06 -07:00
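The separator change described in the commit above can be sketched in Go as follows. This is an illustrative assumption, not the code from that commit: `blobFileName` is a hypothetical helper, and the digest value is made up.

```go
package main

import (
	"fmt"
	"strings"
)

// blobFileName is a hypothetical helper that maps a digest such as
// "sha256:abc123" to a file name using '-' instead of ':', so the name is
// accepted by file systems that reject ':' in file names.
func blobFileName(digest string) string {
	// Only the first ':' (the algorithm/hex separator) is replaced.
	return strings.Replace(digest, ":", "-", 1)
}

func main() {
	fmt.Println(blobFileName("sha256:abc123")) // prints "sha256-abc123"
}
```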
Michael Yang
76bdebbadf
decode ggla
2024-03-08 15:46:25 -08:00
Bruce MacDonald
0cebc79cba
fix: allow importing a model from name reference (#3005)
2024-03-08 12:27:47 -05:00
Patrick Devine
2c017ca441
Convert Safetensors to an Ollama model (#2824)
2024-03-06 21:01:51 -08:00