Michael Yang
4f895d633f
Merge pull request #3466 from ollama/mxyng/head-kv
...
default head_kv to 1
2024-04-03 10:41:00 -07:00
Daniel Hiltgen
464d817824
Merge pull request #3464 from dhiltgen/subprocess
...
Fix numgpu opt miscomparison
2024-04-02 20:10:17 -07:00
Daniel Hiltgen
6589eb8a8c
Revert options as a ref in the server
2024-04-02 16:44:10 -07:00
Michael Yang
90f071c658
default head_kv to 1
2024-04-02 16:37:59 -07:00
Michael Yang
80163ebcb5
fix metal gpu
2024-04-02 16:06:45 -07:00
Daniel Hiltgen
0035e31af8
Bump to b2581
2024-04-02 11:53:07 -07:00
Daniel Hiltgen
0a0e9f3e0f
Apply 01-cache.diff
2024-04-01 16:48:18 -07:00
Daniel Hiltgen
58d95cc9bd
Switch back to subprocessing for llama.cpp
...
This should resolve a number of memory leak and stability defects by allowing
us to isolate llama.cpp in a separate process and shutdown when idle, and
gracefully restart if it has problems. This also serves as a first step to be
able to run multiple copies to support multiple models concurrently.
2024-04-01 16:48:18 -07:00
Michael Yang
91b3e4d282
update memory calcualtions
...
count each layer independently when deciding gpu offloading
2024-04-01 13:16:32 -07:00
Michael Yang
d338d70492
refactor model parsing
2024-04-01 13:16:15 -07:00
Patrick Devine
5a5efee46b
Add gemma safetensors conversion ( #3250 )
...
Co-authored-by: Michael Yang <mxyng@pm.me>
2024-03-28 18:54:01 -07:00
Jeffrey Morgan
f5ca7f8c8e
add license in file header for vendored llama.cpp code ( #3351 )
2024-03-26 16:23:23 -04:00
Jeffrey Morgan
856b8ec131
remove need for $VSINSTALLDIR
since build will fail if ninja
cannot be found ( #3350 )
2024-03-26 16:23:16 -04:00
Patrick Devine
1b272d5bcd
change github.com/jmorganca/ollama
to github.com/ollama/ollama
( #3347 )
2024-03-26 13:04:17 -07:00
Daniel Hiltgen
8091ef2eeb
Bump llama.cpp to b2527
2024-03-25 13:47:44 -07:00
Daniel Hiltgen
560be5e0b6
Merge pull request #3308 from dhiltgen/bump_more
...
Bump llama.cpp to b2510
2024-03-25 12:56:12 -07:00
Jeremy
dfc6721b20
add support for libcudart.so for CUDA devices (adds Jetson support)
2024-03-25 11:07:44 -04:00
Blake Mizerany
acfa2b9422
llm: prevent race appending to slice ( #3320 )
2024-03-24 11:35:54 -07:00
Daniel Hiltgen
3e30c75f3e
Bump llama.cpp to b2510
2024-03-23 19:55:56 +01:00
Daniel Hiltgen
43799532c1
Bump llama.cpp to b2474
...
The release just before ggml-cuda.cu refactoring
2024-03-23 09:54:56 +01:00
Daniel Hiltgen
74788b487c
Better tmpdir cleanup
...
If expanding the runners fails, don't leave a corrupt/incomplete payloads dir
We now write a pid file out to the tmpdir, which allows us to scan for stale tmpdirs
and remove this as long as there isn't still a process running.
2024-03-20 16:03:19 +01:00
Michael Yang
3c4ad0ecab
dyn global
2024-03-18 09:45:45 +01:00
Michael Yang
22f326464e
Merge pull request #3083 from ollama/mxyng/refactor-readseeker
...
refactor readseeker
2024-03-16 12:08:56 -07:00
Jeffrey Morgan
e95ffc7448
llama: remove server static assets ( #3174 )
2024-03-15 19:24:12 -07:00
Daniel Hiltgen
ab3456207b
Merge pull request #3028 from ollama/ci_release
...
CI release process
2024-03-15 16:40:54 -07:00
Daniel Hiltgen
6ad414f31e
Merge pull request #3086 from dhiltgen/import_server
...
Import server.cpp to retain llava support
2024-03-15 16:10:35 -07:00
Daniel Hiltgen
d4c10df2b0
Add Radeon gfx940-942 GPU support
2024-03-15 15:34:58 -07:00
Daniel Hiltgen
540f4af45f
Wire up more complete CI for releases
...
Flesh out our github actions CI so we can build official releaes.
2024-03-15 12:37:36 -07:00
Blake Mizerany
6ce37e4d96
llm,readline: use errors.Is instead of simple == check ( #3161 )
...
This fixes some brittle, simple equality checks to use errors.Is. Since
go1.13, errors.Is is the idiomatic way to check for errors.
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-03-15 07:14:12 -07:00
Michael Yang
291c663865
fix: clip memory leak
2024-03-14 13:12:42 -07:00
Jeffrey Morgan
e72c567cfd
restore locale patch ( #3091 )
2024-03-12 22:08:13 -07:00
Bruce MacDonald
3e22611200
token repeat limit for prediction requests ( #3080 )
2024-03-12 22:08:25 -04:00
Bruce MacDonald
2f804068bd
warn when json format is expected but not mentioned in prompt ( #3081 )
2024-03-12 19:07:11 -04:00
Daniel Hiltgen
85129d3a32
Adapt our build for imported server.cpp
2024-03-12 14:57:15 -07:00
Daniel Hiltgen
9ac6440da3
Import server.cpp as of b2356
2024-03-12 13:58:06 -07:00
Michael Yang
0085297928
refactor readseeker
2024-03-12 12:54:18 -07:00
racerole
53c107e20e
chore: fix typo ( #3073 )
...
Signed-off-by: racerole <jiangyifeng@outlook.com>
2024-03-12 14:09:22 -04:00
Bruce MacDonald
b80661e8c7
relay load model errors to the client ( #3065 )
2024-03-11 16:48:27 -04:00
Jeffrey Morgan
369eda65f5
update llama.cpp submodule to ceca1ae
( #3064 )
2024-03-11 12:57:48 -07:00
Daniel Hiltgen
bc13da2bfe
Avoid rocm runner and dependency clash
...
Putting the rocm symlink next to the runners is risky. This moves
the payloads into a subdir to avoid potential clashes.
2024-03-11 09:33:22 -07:00
Jeffrey Morgan
41b00b9856
fix 03-locale.diff
2024-03-10 16:21:05 -07:00
Daniel Hiltgen
3dc1bb6a35
Harden for deps file being empty (or short)
2024-03-10 14:45:38 -07:00
Jeffrey Morgan
908005d90b
patch: use default locale in wpm tokenizer ( #3034 )
2024-03-09 21:12:12 -08:00
Jeffrey Morgan
e11668aa07
add bundle_metal
and cleanup_metal
funtions to gen_darwin.sh
2024-03-09 16:04:57 -08:00
Jeffrey Morgan
1ffb1e2874
update llama.cpp submodule to 77d1ac7
( #3030 )
2024-03-09 15:55:34 -08:00
Jeffrey Morgan
f9cd55c70b
disable gpu for certain model architectures and fix divide-by-zero on memory estimation
2024-03-09 12:51:38 -08:00
Daniel Hiltgen
4a5c9b8035
Finish unwinding idempotent payload logic
...
The recent ROCm change partially removed idempotent
payloads, but the ggml-metal.metal file for mac was still
idempotent. This finishes switching to always extract
the payloads, and now that idempotentcy is gone, the
version directory is no longer useful.
2024-03-09 08:34:39 -08:00
Jeffrey Morgan
efe5617b64
update llama.cpp submodule to c2101a2
( #3020 )
2024-03-09 00:44:50 -08:00
Michael Yang
76bdebbadf
decode ggla
2024-03-08 15:46:25 -08:00
Jeffrey Morgan
0e4669b04f
update llama.cpp submodule to 6cdabe6
( #2999 )
2024-03-08 00:26:20 -08:00
Daniel Hiltgen
6c5ccb11f9
Revamp ROCm support
...
This refines where we extract the LLM libraries to by adding a new
OLLAMA_HOME env var, that defaults to `~/.ollama` The logic was already
idempotenent, so this should speed up startups after the first time a
new release is deployed. It also cleans up after itself.
We now build only a single ROCm version (latest major) on both windows
and linux. Given the large size of ROCms tensor files, we split the
dependency out. It's bundled into the installer on windows, and a
separate download on windows. The linux install script is now smart and
detects the presence of AMD GPUs and looks to see if rocm v6 is already
present, and if not, then downloads our dependency tar file.
For Linux discovery, we now use sysfs and check each GPU against what
ROCm supports so we can degrade to CPU gracefully instead of having
llama.cpp+rocm assert/crash on us. For Windows, we now use go's windows
dynamic library loading logic to access the amdhip64.dll APIs to query
the GPU information.
2024-03-07 10:36:50 -08:00
John
23ebe8fe11
fix some typos ( #2973 )
...
Signed-off-by: hishope <csqiye@126.com>
2024-03-06 22:50:11 -08:00
Patrick Devine
2c017ca441
Convert Safetensors to an Ollama model ( #2824 )
2024-03-06 21:01:51 -08:00
Jeffrey Morgan
21347e1ed6
update llama.cpp submodule to c29af7e
( #2868 )
2024-03-01 15:26:04 -08:00
Daniel Hiltgen
bd1d8b0d14
Merge pull request #2836 from bmwiedemann/gzip
...
Omit build date from gzip headers
2024-02-29 15:46:46 -08:00
Jeffrey Morgan
cbf4970e0f
bump submodule to 87c91c07663b707e831c59ec373b5e665ff9d64a
( #2828 )
2024-02-29 09:42:08 -08:00
Bernhard M. Wiedemann
76e5d9ec88
Omit build date from gzip headers
...
See https://reproducible-builds.org/ for why this is good.
This patch was done while working on reproducible builds for openSUSE.
2024-02-29 16:48:19 +01:00
Daniel Hiltgen
061e8f6abc
Bump llama.cpp to b2276
2024-02-26 16:49:24 -08:00
Jeffrey Morgan
11bfff8ee1
update llama.cpp submodule to 96633eeca1265ed03e57230de54032041c58f9cd
2024-02-22 16:44:26 -05:00
Jeffrey Morgan
efe040f8c0
reset with init_vars
ahead of each cpu build in gen_windows.ps1
( #2654 )
2024-02-21 16:35:34 -05:00
Jeffrey Morgan
2a7553ce09
update llama.cpp submodule to c14f72d
2024-02-21 09:03:14 -05:00
Jeffrey Morgan
b3eac61cac
update llama.cpp submodule to f0d1fafc029a056cd765bdae58dcaa12312e9879
2024-02-20 22:56:51 -05:00
Michael Yang
949d7b1c48
add gguf file types ( #2532 )
2024-02-20 19:06:29 -05:00
Jeffrey Morgan
4613a080e7
update llama.cpp submodule to 66c1968f7
( #2618 )
2024-02-20 17:42:31 -05:00
Taras Tsugrii
01ff2e14db
[nit] Remove unused msg local var. ( #2511 )
2024-02-20 14:02:34 -05:00
Daniel Hiltgen
4fcbf1cde6
Merge pull request #2599 from dhiltgen/fix_avx
...
Explicitly disable AVX2 on GPU builds
2024-02-19 13:13:05 -08:00
Daniel Hiltgen
9220b4fa91
Merge pull request #2585 from dhiltgen/cuda_leaks
...
Fix cuda leaks
2024-02-19 12:48:00 -08:00
Daniel Hiltgen
fc39a6cd7a
Fix cuda leaks
...
This should resolve the problem where we don't fully unload from the GPU
when we go idle.
2024-02-18 18:37:20 -08:00
Daniel Hiltgen
df6dc4fd96
Fix duplicate menus on update and exit on signals
...
Also fixes a few fit-and-finish items for better developer experience
2024-02-16 15:33:16 -08:00
Daniel Hiltgen
db2a9ad1fe
Explicitly disable AVX2 on GPU builds
...
Even though we weren't setting it to on, somewhere in the cmake config
it was getting toggled on. By explicitly setting it to off, we get `/arch:AVX`
as intended.
2024-02-15 14:50:11 -08:00
Daniel Hiltgen
29e90cc13b
Implement new Go based Desktop app
...
This focuses on Windows first, but coudl be used for Mac
and possibly linux in the future.
2024-02-15 05:56:45 +00:00
Jeffrey Morgan
9241a29336
Revert "Revert "bump submodule to 6c00a06
( #2479 )"" ( #2485 )
...
This reverts commit 6920964b87
.
2024-02-13 18:18:41 -08:00
Jeffrey Morgan
f7231ad9ad
set shutting_down
to false
once shutdown is complete ( #2484 )
2024-02-13 17:48:41 -08:00
Jeffrey Morgan
6920964b87
Revert "bump submodule to 6c00a06
( #2479 )"
...
This reverts commit 2f9ed52bbd
.
2024-02-13 17:23:05 -08:00
Jeffrey Morgan
2f9ed52bbd
bump submodule to 6c00a06
( #2479 )
2024-02-13 17:12:42 -08:00
Daniel Hiltgen
939c60473f
Merge pull request #2422 from dhiltgen/better_kill
...
More robust shutdown
2024-02-12 14:05:06 -08:00
Jeffrey Morgan
f76ca04f9e
update submodule to 099afc6
( #2468 )
2024-02-12 14:01:16 -08:00
Daniel Hiltgen
76b8728f0c
Merge pull request #2465 from dhiltgen/block_rocm_pre_9
...
Detect AMD GPU info via sysfs and block old cards
2024-02-12 12:41:43 -08:00
Daniel Hiltgen
6d84f07505
Detect AMD GPU info via sysfs and block old cards
...
This wires up some new logic to start using sysfs to discover AMD GPU
information and detects old cards we can't yet support so we can fallback to CPU mode.
2024-02-12 08:19:41 -08:00
Jeffrey Morgan
26b13fc33c
patch: always add token to cache_tokens ( #2459 )
2024-02-12 08:10:16 -08:00
Daniel Hiltgen
6680761596
Shutdown faster
...
Make sure that when a shutdown signal comes, we shutdown quickly instead
of waiting for a potentially long exchange to wrap up.
2024-02-08 22:22:50 -08:00
Daniel Hiltgen
a1dfab43b9
Ensure the libraries are present
...
When we store our libraries in a temp dir, a reaper might clean
them when we are idle, so make sure to check for them before
we reload.
2024-02-07 17:27:49 -08:00
Daniel Hiltgen
de76b95dd4
Bump llama.cpp to b2081
2024-02-06 12:06:43 -08:00
Daniel Hiltgen
27aa2d4a19
Merge pull request #1849 from mraiser/main
...
Accomodate split cuda lib dir
2024-02-05 16:01:16 -08:00
Daniel Hiltgen
e1f50377f4
Harden generate patching model
...
Only apply patches if we have any, and make sure to cleanup
every file we patched at the end to leave the tree clean
2024-02-01 19:34:36 -08:00
Jeffrey Morgan
f11bf0740b
use llm.ImageData
2024-01-31 19:13:48 -08:00
Michael Yang
8450bf66e6
trim images
2024-01-31 19:13:47 -08:00
Daniel Hiltgen
72b12c3be7
Bump llama.cpp to b1999
...
This requires an upstream change to support graceful termination,
carried as a patch.
2024-01-30 16:52:12 -08:00
Jeffrey Morgan
2e06ed01d5
remove unknown CPPFLAGS
option
2024-01-28 17:51:23 -08:00
mraiser
4c4c730a0a
Merge branch 'ollama:main' into main
2024-01-27 21:56:11 -05:00
Daniel Hiltgen
e02ecfb6c8
Merge pull request #2116 from dhiltgen/cc_50_80
...
Add support for CUDA 5.0 cards
2024-01-27 10:28:38 -08:00
Jeffrey Morgan
3ebd6a83fc
update submodule to cd4fddb29f81d6a1f6d51a0c016bc6b486d68def
2024-01-25 13:54:11 -08:00
Jeffrey Morgan
a64570dcae
Fix clearing kv cache between requests with the same prompt ( #2186 )
...
* Fix clearing kv cache between requests with the same prompt
* fix powershell script
2024-01-25 13:46:20 -08:00
mraiser
a4564232a4
Update gen_linux.sh to find libcudart in separate directory
2024-01-25 09:49:35 -05:00
Michael Yang
cd22855ef8
refactor tensor read
2024-01-24 10:48:31 -08:00
Jeffrey Morgan
4458efb73a
Load all layers on arm64
macOS if model is small enough ( #2149 )
2024-01-22 17:40:06 -08:00
Daniel Hiltgen
0f5b843319
Refine Accelerate usage on mac
...
For old macs, accelerate seems to cause crashes, but for
AVX2 capable macs, it does not.
2024-01-22 16:25:56 -08:00
Jeffrey Morgan
ffaf52e1e9
update submodule to 011e8ec577fd135cbc02993d3ea9840c516d6a1c
2024-01-22 15:16:54 -08:00
Daniel Hiltgen
3bc28736cd
Merge pull request #2143 from dhiltgen/llm_verbosity
...
Refine debug logging for llm
2024-01-22 13:19:16 -08:00
Daniel Hiltgen
730dcfcc7a
Refine debug logging for llm
...
This wires up logging in llama.cpp to always go to stderr, and also
turns up logging if OLLAMA_DEBUG is set.
2024-01-22 12:26:49 -08:00