Michael Yang
e4d0a9c325
fix(test): do not clobber models directory
2024-08-28 14:07:48 -07:00
Patrick Devine
7416ced70f
add llama3.1 chat template (#6545)
2024-08-28 14:03:20 -07:00
Michael Yang
9cfd2dd3e3
Merge pull request #6522 from ollama/mxyng/detect-chat
detect chat template from configs that contain lists
2024-08-28 11:04:18 -07:00
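A Hugging Face tokenizer_config.json can carry chat_template either as a plain string or as a list of named templates, which is what this change detects. A minimal Go sketch of that shape handling; picking the entry named "default" is an assumption about the config convention, and chatTemplate is an illustrative helper, not ollama's actual function:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chatTemplate handles the two shapes chat_template takes in a
// tokenizer_config.json: a plain string, or a list of named templates.
func chatTemplate(raw json.RawMessage) (string, error) {
	// Simple case: a single template string.
	var s string
	if err := json.Unmarshal(raw, &s); err == nil {
		return s, nil
	}

	// List case: scan for an entry named "default" (assumed convention).
	var entries []struct {
		Name     string `json:"name"`
		Template string `json:"template"`
	}
	if err := json.Unmarshal(raw, &entries); err != nil {
		return "", err
	}
	for _, e := range entries {
		if e.Name == "default" {
			return e.Template, nil
		}
	}
	return "", fmt.Errorf("no default chat template found")
}

func main() {
	cfg := json.RawMessage(`[{"name":"default","template":"{{ messages }}"}]`)
	t, err := chatTemplate(cfg)
	fmt.Println(t, err)
}
```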
Michael Yang
8e6da3cbc5
update deprecated warnings
2024-08-28 09:55:11 -07:00
Michael Yang
d9d50c43cc
validate model path
2024-08-28 09:32:57 -07:00
Patrick Devine
6c1c1ad6a9
throw an error when encountering unsupported tensor sizes (#6538)
2024-08-27 17:54:04 -07:00
Daniel Hiltgen
93ea9240ae
Move ollama executable out of bin dir (#6535)
2024-08-27 16:19:00 -07:00
Michael Yang
413ae39f3c
update templates to use messages
2024-08-27 15:44:04 -07:00
Michael Yang
60e47573a6
more tokenizer tests
2024-08-27 14:51:10 -07:00
Patrick Devine
d13c3daa0b
add safetensors to the modelfile docs (#6532)
2024-08-27 14:46:47 -07:00
Patrick Devine
1713eddcd0
Fix import image width (#6528)
2024-08-27 14:19:47 -07:00
Daniel Hiltgen
4e1c4f6e0b
Update manual instructions with discrete ROCm bundle (#6445)
2024-08-27 13:42:28 -07:00
Sean Khatiri
397cae7962
llm: fix typo in comment (#6530)
2024-08-27 13:28:29 -07:00
Patrick Devine
1c70a00f71
adjust image sizes
2024-08-27 11:15:25 -07:00
Michael Yang
eae3af6807
clean up convert tokenizer
2024-08-27 11:11:43 -07:00
Michael Yang
3eb08377f8
detect chat template from configs that contain lists
2024-08-27 10:49:33 -07:00
Patrick Devine
ac80010db8
update the import docs (#6104)
2024-08-26 19:57:26 -07:00
Jeffrey Morgan
47fa0839b9
server: clean up route names for consistency (#6524)
2024-08-26 19:36:11 -07:00
Daniel Hiltgen
0f92b19bec
Only enable numa on CPUs (#6484)
The numa flag may have a performance impact on multi-socket systems with GPU loads (gating sketched below).
2024-08-24 17:24:50 -07:00
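A hedged sketch of that gating: serverParams, numaSupported, and the flag plumbing are illustrative stand-ins rather than ollama's actual API.

```go
package main

import "fmt"

// serverParams is a hypothetical stand-in for the runner's configuration.
type serverParams struct {
	modelPath string
	gpuLayers int // layers offloaded to GPU; 0 means CPU-only
}

// numaSupported is a placeholder for a real platform probe.
func numaSupported() bool { return true }

// runnerArgs adds the --numa flag only for CPU-only loads, since NUMA
// balancing may hurt multi-socket systems when a GPU is carrying layers.
func runnerArgs(p serverParams) []string {
	args := []string{"--model", p.modelPath}
	if p.gpuLayers == 0 && numaSupported() {
		args = append(args, "--numa")
	}
	return args
}

func main() {
	fmt.Println(runnerArgs(serverParams{modelPath: "llama3.1.gguf", gpuLayers: 0}))
	fmt.Println(runnerArgs(serverParams{modelPath: "llama3.1.gguf", gpuLayers: 33}))
}
```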
Daniel Hiltgen
69be940bf6
gpu: Group GPU Library sets by variant (#6483)
The recent CUDA variant changes uncovered a bug in ByLibrary, which failed to group GPU types by their common variant (see the sketch below).
2024-08-23 15:11:56 -07:00
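The fix amounts to keying the grouping on library and variant together. A small Go sketch under that assumption; gpuInfo and byLibrary are simplified stand-ins for the real types:

```go
package main

import "fmt"

// gpuInfo is a simplified stand-in for ollama's GPU discovery record.
type gpuInfo struct {
	Library string // e.g. "cuda", "rocm"
	Variant string // e.g. "v11", "v12"
	ID      string
}

// byLibrary buckets GPUs by library *and* variant, so a cuda v11 card
// is never mixed into a v12 set.
func byLibrary(gpus []gpuInfo) [][]gpuInfo {
	sets := map[string][]gpuInfo{}
	var order []string
	for _, g := range gpus {
		key := g.Library + "/" + g.Variant
		if _, ok := sets[key]; !ok {
			order = append(order, key)
		}
		sets[key] = append(sets[key], g)
	}
	out := make([][]gpuInfo, 0, len(order))
	for _, k := range order {
		out = append(out, sets[k])
	}
	return out
}

func main() {
	gpus := []gpuInfo{
		{"cuda", "v12", "GPU-0"},
		{"cuda", "v11", "GPU-1"},
		{"cuda", "v12", "GPU-2"},
	}
	for _, set := range byLibrary(gpus) {
		fmt.Println(set)
	}
}
```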
Michael Yang
9638c24c58
Merge pull request #5446 from ollama/mxyng/faq
update faq
2024-08-23 14:05:59 -07:00
Michael Yang
bb362caf88
update faq
2024-08-23 13:37:21 -07:00
Michael Yang
386af6c1a0
pass OLLAMA_HOST path through to client
2024-08-23 13:23:28 -07:00
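A minimal sketch of that passthrough, assuming the goal is that a server proxied under a URL prefix keeps working: parse OLLAMA_HOST with net/url and preserve any path component instead of discarding it. clientBase is a hypothetical helper:

```go
package main

import (
	"fmt"
	"net/url"
	"os"
)

// clientBase parses OLLAMA_HOST while keeping any path component,
// so a server proxied under a prefix remains reachable.
func clientBase() (*url.URL, error) {
	host := os.Getenv("OLLAMA_HOST")
	if host == "" {
		host = "http://127.0.0.1:11434"
	}
	u, err := url.Parse(host)
	if err != nil {
		return nil, err
	}
	// u.Path is preserved; request URLs become u.JoinPath("api", ...).
	return u, nil
}

func main() {
	os.Setenv("OLLAMA_HOST", "http://example.com/ollama")
	u, _ := clientBase()
	fmt.Println(u.JoinPath("api", "tags")) // http://example.com/ollama/api/tags
}
```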
Patrick Devine
0c819e167b
convert safetensor adapters into GGUF (#6327)
2024-08-23 11:29:56 -07:00
Daniel Hiltgen
7a1e1c1caf
gpu: Ensure driver version set before variant (#6480)
During rebasing the ordering was inverted, breaking the CUDA version selection logic: the driver version was incorrectly evaluated as zero, causing a downgrade to v11.
2024-08-23 11:21:12 -07:00
Daniel Hiltgen
0b03b9c32f
llm: Align cmake define for cuda no peer copy (#6455)
The define was renamed recently, and this use slipped through the cracks under the old name.
2024-08-23 11:20:39 -07:00
Daniel Hiltgen
90ca84172c
Fix embeddings memory corruption (#6467)
* Fix embeddings memory corruption
The patch was leading to buffer-overrun corruption. Once it was removed, though, parallelism in server.cpp led to hitting an assert because slot/seq IDs could be >= the token count. To work around this, only slot 0 is used for embeddings (illustrated below).
* Fix embed integration test assumption
The token eval count has changed with recent llama.cpp bumps (0.3.5+).
2024-08-22 14:51:42 -07:00
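The actual workaround lives in the C++ server; this Go sketch only illustrates the routing rule, with embedding requests pinned to slot 0 while generation keeps the free-slot pool:

```go
package main

import "fmt"

type request struct {
	prompt    string
	embedding bool
}

// pickSlot pins embedding requests to slot 0 so their slot/seq IDs never
// exceed the token count, while generation requests use the free pool.
func pickSlot(r request, freeSlots []int) int {
	if r.embedding {
		return 0
	}
	if len(freeSlots) > 0 {
		return freeSlots[0]
	}
	return -1 // no slot available; caller queues the request
}

func main() {
	fmt.Println(pickSlot(request{prompt: "hi", embedding: true}, []int{2, 3})) // 0
	fmt.Println(pickSlot(request{prompt: "hi"}, []int{2, 3}))                  // 2
}
```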
Michael Yang
6bd8a4b0a1
Merge pull request #6064 from ollama/mxyng/convert-llama3
convert: update llama conversion for llama3.1
2024-08-21 12:57:09 -07:00
Michael Yang
77903ab8b4
llama3.1
2024-08-21 11:49:31 -07:00
Michael Yang
e22286c9e1
Merge pull request #5365 from ollama/mxyng/convert-gemma2
convert gemma2
2024-08-21 11:48:43 -07:00
Michael Yang
107f695929
Merge pull request #4917 from ollama/mxyng/convert-bert
convert bert model from safetensors
2024-08-21 11:48:29 -07:00
Michael Yang
4ecc70d3b4
Merge pull request #6386 from zwwhdls/fix-new-layer
fix: chmod new layer to 0o644 when creating it
2024-08-21 10:58:45 -07:00
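A minimal sketch of the fix, with writeLayer as a hypothetical helper: os.WriteFile's permission argument is filtered by the process umask, so an explicit os.Chmod is needed to guarantee 0o644.

```go
package main

import (
	"log"
	"os"
)

// writeLayer creates a layer blob and explicitly chmods it to 0o644 so
// it stays world-readable even under a restrictive umask.
func writeLayer(path string, data []byte) error {
	if err := os.WriteFile(path, data, 0o644); err != nil {
		return err
	}
	// WriteFile's mode is masked by the umask; enforce 0o644 exactly.
	return os.Chmod(path, 0o644)
}

func main() {
	if err := writeLayer("layer.bin", []byte("blob")); err != nil {
		log.Fatal(err)
	}
}
```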
Michael Yang
3546bbd08c
convert gemma2
2024-08-20 17:27:51 -07:00
Michael Yang
beb49eef65
create bert models from cli
2024-08-20 17:27:34 -07:00
Michael Yang
5a28b9cf5f
bert
2024-08-20 17:27:34 -07:00
Daniel Hiltgen
a017cf2fea
Split rocm back out of bundle (#6432)
We're over budget for GitHub's maximum release artifact size with ROCm + 2 CUDA versions. This splits ROCm back out as a discrete artifact, but keeps the layout so it can be extracted into the same location as the main bundle.
2024-08-20 07:26:38 -07:00
Daniel Hiltgen
19e5a890f7
CI: remove directories from dist dir before upload step (#6429)
2024-08-19 15:19:21 -07:00
Daniel Hiltgen
f91c9e3709
CI: handle directories during checksum (#6427)
2024-08-19 13:48:45 -07:00
Daniel Hiltgen
2df6905ede
Merge pull request #6424 from dhiltgen/cuda_v12
Fix overlapping artifact name on CI
2024-08-19 12:11:58 -07:00
Daniel Hiltgen
d8be22e47d
Fix overlapping artifact name on CI
2024-08-19 12:07:18 -07:00
Daniel Hiltgen
652c273f0e
Merge pull request #5049 from dhiltgen/cuda_v12
Cuda v12
2024-08-19 11:14:24 -07:00
Daniel Hiltgen
88e7705079
Merge pull request #6402 from rick-github/numParallel
Override numParallel in pickBestPartialFitByLibrary() only if unset.
2024-08-19 11:07:22 -07:00
Daniel Hiltgen
f9e31da946
Review comments
2024-08-19 10:36:15 -07:00
Daniel Hiltgen
88bb9e3328
Adjust layout to bin+lib/ollama
2024-08-19 09:38:53 -07:00
Daniel Hiltgen
3b19cdba2a
Remove Jetpack
2024-08-19 09:38:53 -07:00
Daniel Hiltgen
927d98a6cd
Add windows cuda v12 + v11 support
2024-08-19 09:38:53 -07:00
Daniel Hiltgen
f6c811b320
Enable cuda v12 flags
2024-08-19 09:38:53 -07:00
Daniel Hiltgen
4fe3a556fa
Add cuda v12 variant and selection logic
Based on compute capability and driver version, pick the v12 or v11 CUDA variant (selection sketched below).
2024-08-19 09:38:53 -07:00
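A hedged sketch of that selection rule: the compute-capability and driver cutoffs below are illustrative assumptions, not ollama's published thresholds, and the zero-driver guard mirrors the fix in #6480 above.

```go
package main

import "fmt"

// cudaVariant picks a runner variant from the card's compute capability
// and the installed driver's major version.
func cudaVariant(computeMajor, driverMajor int) string {
	if driverMajor == 0 {
		// Driver version not yet populated (the bug fixed in #6480):
		// don't guess, keep the conservative v11 build.
		return "v11"
	}
	// Illustrative cutoffs: the v12 runner needs a reasonably new card
	// and a v12-capable driver.
	if computeMajor >= 5 && driverMajor >= 12 {
		return "v12"
	}
	return "v11"
}

func main() {
	fmt.Println(cudaVariant(8, 12)) // v12
	fmt.Println(cudaVariant(8, 11)) // v11: driver too old
	fmt.Println(cudaVariant(8, 0))  // v11: driver version unknown
}
```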
Daniel Hiltgen
fc3b4cda89
Report GPU variant in log
2024-08-19 09:38:53 -07:00
Daniel Hiltgen
d470ebe78b
Add Jetson cuda variants for arm
This adds new arm64 variants specific to Jetson platforms.
2024-08-19 09:38:53 -07:00