Commit graph

3536 commits

Author SHA1 Message Date
Daniel Hiltgen
0f92b19bec
Only enable numa on CPUs (#6484)
The numa flag may be having a performance impact on multi-socket systems with GPU loads
2024-08-24 17:24:50 -07:00
Daniel Hiltgen
69be940bf6
gpu: Group GPU Library sets by variant (#6483)
The recent cuda variant changes uncovered a bug in ByLibrary
which failed to group by common variant for GPU types.
2024-08-23 15:11:56 -07:00
Michael Yang
9638c24c58
Merge pull request #5446 from ollama/mxyng/faq
update faq
2024-08-23 14:05:59 -07:00
Michael Yang
bb362caf88 update faq 2024-08-23 13:37:21 -07:00
Michael Yang
386af6c1a0 passthrough OLLAMA_HOST path to client 2024-08-23 13:23:28 -07:00
Patrick Devine
0c819e167b
convert safetensor adapters into GGUF (#6327) 2024-08-23 11:29:56 -07:00
Daniel Hiltgen
7a1e1c1caf
gpu: Ensure driver version set before variant (#6480)
During rebasing, the ordering was inverted causing the cuda version
selection logic to break, with driver version being evaluated as zero
incorrectly causing a downgrade to v11.
2024-08-23 11:21:12 -07:00
Daniel Hiltgen
0b03b9c32f
llm: Align cmake define for cuda no peer copy (#6455)
Define changed recently and this slipped through the cracks with the old
name.
2024-08-23 11:20:39 -07:00
Daniel Hiltgen
90ca84172c
Fix embeddings memory corruption (#6467)
* Fix embeddings memory corruption

The patch was leading to a buffer overrun corruption.  Once removed though, parallism
in server.cpp lead to hitting an assert due to slot/seq IDs being >= token count.  To
work around this, only use slot 0 for embeddings.

* Fix embed integration test assumption

The token eval count has changed with recent llama.cpp bumps (0.3.5+)
2024-08-22 14:51:42 -07:00
Michael Yang
6bd8a4b0a1
Merge pull request #6064 from ollama/mxyng/convert-llama3
convert: update llama conversion for llama3.1
2024-08-21 12:57:09 -07:00
Michael Yang
77903ab8b4 llama3.1 2024-08-21 11:49:31 -07:00
Michael Yang
e22286c9e1
Merge pull request #5365 from ollama/mxyng/convert-gemma2
convert gemma2
2024-08-21 11:48:43 -07:00
Michael Yang
107f695929
Merge pull request #4917 from ollama/mxyng/convert-bert
convert bert model from safetensors
2024-08-21 11:48:29 -07:00
Michael Yang
4ecc70d3b4
Merge pull request #6386 from zwwhdls/fix-new-layer
fix: chmod new layer to 0o644 when creating it
2024-08-21 10:58:45 -07:00
Michael Yang
3546bbd08c convert gemma2 2024-08-20 17:27:51 -07:00
Michael Yang
beb49eef65 create bert models from cli 2024-08-20 17:27:34 -07:00
Michael Yang
5a28b9cf5f bert 2024-08-20 17:27:34 -07:00
Daniel Hiltgen
a017cf2fea
Split rocm back out of bundle (#6432)
We're over budget for github's maximum release artifact size with rocm + 2 cuda
versions.  This splits rocm back out as a discrete artifact, but keeps the layout so it can
be extracted into the same location as the main bundle.
2024-08-20 07:26:38 -07:00
Daniel Hiltgen
19e5a890f7
CI: remove directories from dist dir before upload step (#6429) 2024-08-19 15:19:21 -07:00
Daniel Hiltgen
f91c9e3709
CI: handle directories during checksum (#6427) 2024-08-19 13:48:45 -07:00
Daniel Hiltgen
2df6905ede
Merge pull request #6424 from dhiltgen/cuda_v12
Fix overlapping artifact name on CI
2024-08-19 12:11:58 -07:00
Daniel Hiltgen
d8be22e47d Fix overlapping artifact name on CI 2024-08-19 12:07:18 -07:00
Daniel Hiltgen
652c273f0e
Merge pull request #5049 from dhiltgen/cuda_v12
Cuda v12
2024-08-19 11:14:24 -07:00
Daniel Hiltgen
88e7705079
Merge pull request #6402 from rick-github/numParallel
Override numParallel in pickBestPartialFitByLibrary() only if unset.
2024-08-19 11:07:22 -07:00
Daniel Hiltgen
f9e31da946 Review comments 2024-08-19 10:36:15 -07:00
Daniel Hiltgen
88bb9e3328 Adjust layout to bin+lib/ollama 2024-08-19 09:38:53 -07:00
Daniel Hiltgen
3b19cdba2a Remove Jetpack 2024-08-19 09:38:53 -07:00
Daniel Hiltgen
927d98a6cd Add windows cuda v12 + v11 support 2024-08-19 09:38:53 -07:00
Daniel Hiltgen
f6c811b320 Enable cuda v12 flags 2024-08-19 09:38:53 -07:00
Daniel Hiltgen
4fe3a556fa Add cuda v12 variant and selection logic
Based on compute capability and driver version, pick
v12 or v11 cuda variants.
2024-08-19 09:38:53 -07:00
Daniel Hiltgen
fc3b4cda89 Report GPU variant in log 2024-08-19 09:38:53 -07:00
Daniel Hiltgen
d470ebe78b Add Jetson cuda variants for arm
This adds new variants for arm64 specific to Jetson platforms
2024-08-19 09:38:53 -07:00
Daniel Hiltgen
c7bcb00319 Wire up ccache and pigz in the docker based build
This should help speed things up a little
2024-08-19 09:38:53 -07:00
Daniel Hiltgen
74d45f0102 Refactor linux packaging
This adjusts linux to follow a similar model to windows with a discrete archive
(zip/tgz) to cary the primary executable, and dependent libraries. Runners are
still carried as payloads inside the main binary

Darwin retain the payload model where the go binary is fully self contained.
2024-08-19 09:38:53 -07:00
Jeffrey Morgan
9fddef3731
server: limit upload parts to 16 (#6411) 2024-08-19 09:20:52 -07:00
Richard Lyons
885cf45087 Fix white space. 2024-08-18 03:07:16 +02:00
Richard Lyons
9352eeb752 Reset NumCtx. 2024-08-18 02:55:01 +02:00
Richard Lyons
0ad0e738cd Override numParallel only if unset. 2024-08-18 01:43:26 +02:00
zwwhdls
bdc4308afb fix: chmod new layer to 0o644 when creating it
Signed-off-by: zwwhdls <zww@hdls.me>
2024-08-16 11:43:19 +08:00
Daniel Hiltgen
d29cd4c2ed
Merge pull request #6381 from eust-w/main
fix: Add tooltip to system tray icon
2024-08-15 15:31:15 -07:00
eust-w
a84c05cf91 fix: Add tooltip to system tray icon
- Updated setIcon method to include tooltip text for the system tray icon.
- Added NIF_TIP flag and set the tooltip text using UTF16 encoding.

Resolves: #6372
2024-08-16 06:00:12 +08:00
Michael Yang
e3d7f32af7
Merge pull request #6363 from ollama/mxyng/fix-noprune
fix: noprune on pull
2024-08-15 12:20:38 -07:00
Michael Yang
3a75e74e34 only skip invalid json manifests 2024-08-15 10:29:14 -07:00
Michael Yang
237dccba1e skip invalid manifest files 2024-08-14 16:55:45 -07:00
Michael Yang
b3f75fc812 fix noprune 2024-08-14 15:48:51 -07:00
Jeffrey Morgan
8200c371ae
add CONTRIBUTING.md (#6349) 2024-08-14 15:19:50 -07:00
longtao
0a8d6ea86d
Fix typo and improve readability (#5964)
* Fix typo and improve readability

Summary:
* Rename updatAvailableMenuID to updateAvailableMenuID
* Replace unused cmd parameter with _ in RunServer function
* Fix typos in comments

(cherry picked from commit 5b8715f0b04773369e8eb1f9e6737995a0ab3ba7)

* Update api/client.go

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>

---------

Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-08-13 17:54:19 -07:00
Blake Mizerany
8e1050f366
server: reduce max connections used in download (#6347)
The previous value of 64 was WAY too high and unnecessary. It reached
diminishing returns and blew past it. This is a more reasonable number
for _most_ normal cases. For users on cloud servers with excellent
network quality, this will keep screaming for them, without hitting our
CDN limits. For users with relatively poor network quality, this will
keep them from saturating their network and causing other issues.
2024-08-13 16:47:35 -07:00
Bruce MacDonald
eda8a32a09
update chatml template format to latest in docs (#6344) 2024-08-13 16:39:18 -07:00
Michael Yang
a0a40aa20c
Merge pull request #6346 from ollama/mxyng/lint 2024-08-13 14:58:35 -07:00