Commit graph

269 commits

Author SHA1 Message Date
Jeffrey Morgan
f7231ad9ad
set shutting_down to false once shutdown is complete (#2484) 2024-02-13 17:48:41 -08:00
Jeffrey Morgan
6920964b87 Revert "bump submodule to 6c00a06 (#2479)"
This reverts commit 2f9ed52bbd.
2024-02-13 17:23:05 -08:00
Jeffrey Morgan
2f9ed52bbd
bump submodule to 6c00a06 (#2479) 2024-02-13 17:12:42 -08:00
Daniel Hiltgen
939c60473f
Merge pull request #2422 from dhiltgen/better_kill
More robust shutdown
2024-02-12 14:05:06 -08:00
Jeffrey Morgan
f76ca04f9e
update submodule to 099afc6 (#2468) 2024-02-12 14:01:16 -08:00
Daniel Hiltgen
76b8728f0c
Merge pull request #2465 from dhiltgen/block_rocm_pre_9
Detect AMD GPU info via sysfs and block old cards
2024-02-12 12:41:43 -08:00
Daniel Hiltgen
6d84f07505 Detect AMD GPU info via sysfs and block old cards
This wires up some new logic to start using sysfs to discover AMD GPU
information and detects old cards we can't yet support so we can fallback to CPU mode.
2024-02-12 08:19:41 -08:00
Jeffrey Morgan
26b13fc33c
patch: always add token to cache_tokens (#2459) 2024-02-12 08:10:16 -08:00
Daniel Hiltgen
6680761596 Shutdown faster
Make sure that when a shutdown signal comes, we shutdown quickly instead
of waiting for a potentially long exchange to wrap up.
2024-02-08 22:22:50 -08:00
Daniel Hiltgen
a1dfab43b9 Ensure the libraries are present
When we store our libraries in a temp dir, a reaper might clean
them when we are idle, so make sure to check for them before
we reload.
2024-02-07 17:27:49 -08:00
Daniel Hiltgen
de76b95dd4 Bump llama.cpp to b2081 2024-02-06 12:06:43 -08:00
Daniel Hiltgen
27aa2d4a19
Merge pull request #1849 from mraiser/main
Accomodate split cuda lib dir
2024-02-05 16:01:16 -08:00
Daniel Hiltgen
e1f50377f4 Harden generate patching model
Only apply patches if we have any, and make sure to cleanup
every file we patched at the end to leave the tree clean
2024-02-01 19:34:36 -08:00
Jeffrey Morgan
f11bf0740b use llm.ImageData 2024-01-31 19:13:48 -08:00
Michael Yang
8450bf66e6 trim images 2024-01-31 19:13:47 -08:00
Daniel Hiltgen
72b12c3be7 Bump llama.cpp to b1999
This requires an upstream change to support graceful termination,
carried as a patch.
2024-01-30 16:52:12 -08:00
Jeffrey Morgan
2e06ed01d5 remove unknown CPPFLAGS option 2024-01-28 17:51:23 -08:00
mraiser
4c4c730a0a
Merge branch 'ollama:main' into main 2024-01-27 21:56:11 -05:00
Daniel Hiltgen
e02ecfb6c8
Merge pull request #2116 from dhiltgen/cc_50_80
Add support for CUDA 5.0 cards
2024-01-27 10:28:38 -08:00
Jeffrey Morgan
3ebd6a83fc update submodule to cd4fddb29f81d6a1f6d51a0c016bc6b486d68def 2024-01-25 13:54:11 -08:00
Jeffrey Morgan
a64570dcae
Fix clearing kv cache between requests with the same prompt (#2186)
* Fix clearing kv cache between requests with the same prompt

* fix powershell script
2024-01-25 13:46:20 -08:00
mraiser
a4564232a4
Update gen_linux.sh to find libcudart in separate directory 2024-01-25 09:49:35 -05:00
Michael Yang
cd22855ef8 refactor tensor read 2024-01-24 10:48:31 -08:00
Jeffrey Morgan
4458efb73a
Load all layers on arm64 macOS if model is small enough (#2149) 2024-01-22 17:40:06 -08:00
Daniel Hiltgen
0f5b843319 Refine Accelerate usage on mac
For old macs, accelerate seems to cause crashes, but for
AVX2 capable macs, it does not.
2024-01-22 16:25:56 -08:00
Jeffrey Morgan
ffaf52e1e9 update submodule to 011e8ec577fd135cbc02993d3ea9840c516d6a1c 2024-01-22 15:16:54 -08:00
Daniel Hiltgen
3bc28736cd
Merge pull request #2143 from dhiltgen/llm_verbosity
Refine debug logging for llm
2024-01-22 13:19:16 -08:00
Daniel Hiltgen
730dcfcc7a Refine debug logging for llm
This wires up logging in llama.cpp to always go to stderr, and also
turns up logging if OLLAMA_DEBUG is set.
2024-01-22 12:26:49 -08:00
Daniel Hiltgen
27a2d5af54 Debug logging on init failure 2024-01-22 12:08:22 -08:00
Jeffrey Morgan
5f81a33f43
update submodule to 6f9939d (#2115) 2024-01-22 11:56:40 -08:00
Daniel Hiltgen
5576bb2348
Merge pull request #2130 from dhiltgen/more_faster
Make CPU builds parallel and customizable AMD GPUs
2024-01-21 16:14:12 -08:00
Daniel Hiltgen
ec3764538d Probe GPUs before backend init
Detect potential error scenarios so we can fallback to CPU mode without
hitting asserts.
2024-01-21 15:59:38 -08:00
Daniel Hiltgen
df54c723ae Make CPU builds parallel and customizable AMD GPUs
The linux build now support parallel CPU builds to speed things up.
This also exposes AMD GPU targets as an optional setting for advaced
users who want to alter our default set.
2024-01-21 15:12:21 -08:00
Jeffrey Morgan
89c4aee29e
Unlock mutex when failing to load model (#2117) 2024-01-20 20:54:46 -05:00
Daniel Hiltgen
a447a083f2 Add compute capability 5.0, 7.5, and 8.0 2024-01-20 14:24:05 -08:00
Daniel Hiltgen
681a914990 Add support for CUDA 5.2 cards 2024-01-20 10:48:43 -08:00
Jeffrey Morgan
4c54f0ddeb
sign dylibs on macOS (#2101) 2024-01-19 19:24:11 -05:00
Daniel Hiltgen
6a042438af Switch to local dlopen symbols 2024-01-19 11:37:02 -08:00
Jeffrey Morgan
dc88cc3981
use gzip for runner embedding (#2067) 2024-01-19 13:23:03 -05:00
Daniel Hiltgen
abec7f06e5
Merge pull request #2056 from dhiltgen/slog
Mechanical switch from log to slog
2024-01-18 14:27:24 -08:00
Daniel Hiltgen
fedd705aea Mechanical switch from log to slog
A few obvious levels were adjusted, but generally everything mapped to "info" level.
2024-01-18 14:12:57 -08:00
Daniel Hiltgen
fccdf4c635
Merge pull request #1987 from xyproto/archlinux
Let gpu.go and gen_linux.sh also find CUDA on Arch Linux
2024-01-18 13:32:10 -08:00
Daniel Hiltgen
1b249748ab Add multiple CPU variants for Intel Mac
This also refines the build process for the ext_server build.
2024-01-17 15:08:54 -08:00
Alexander F. Rødseth
cbe2adc78a
Merge branch 'main' into archlinux 2024-01-17 12:50:11 +01:00
Daniel Hiltgen
795674dd90 Bump llama.cpp to b1842 and add new cuda lib dep
Upstream llama.cpp has added a new dependency with the
NVIDIA CUDA Driver Libraries (libcuda.so) which is part of the
driver distribution, not the general cuda libraries, and is not
available as an archive, so we can not statically link it.  This may
introduce some additional compatibility challenges which we'll
need to keep an eye on.
2024-01-16 12:53:52 -08:00
Bruce MacDonald
a897e833b8
do not cache prompt (#2018)
- prompt cache causes inferance to hang after some time
2024-01-16 13:48:05 -05:00
Daniel Hiltgen
8795447dad
Merge pull request #1966 from fpreiss/fpreiss/gen_linux_cuda_detection
improve cuda detection (rel. issue #1704)
2024-01-14 18:00:11 -08:00
Daniel Hiltgen
95ad9a9fc8
Merge pull request #1988 from dhiltgen/fix_intel_mac
Fix typo in arm mac arch script
2024-01-14 08:45:18 -08:00
Daniel Hiltgen
3ca5f69ce8 Fix typo in arm mac arch script 2024-01-14 08:32:57 -08:00
Daniel Hiltgen
cfa6337960
Merge pull request #1982 from dhiltgen/fix_intel_mac
Fix intel mac build
2024-01-14 08:26:46 -08:00