Commit graph

284 commits

Author SHA1 Message Date
Daniel Hiltgen
061e8f6abc Bump llama.cpp to b2276 2024-02-26 16:49:24 -08:00
Jeffrey Morgan
11bfff8ee1 update llama.cpp submodule to 96633eeca1265ed03e57230de54032041c58f9cd 2024-02-22 16:44:26 -05:00
Jeffrey Morgan
efe040f8c0
reset with init_vars ahead of each cpu build in gen_windows.ps1 (#2654) 2024-02-21 16:35:34 -05:00
Jeffrey Morgan
2a7553ce09 update llama.cpp submodule to c14f72d 2024-02-21 09:03:14 -05:00
Jeffrey Morgan
b3eac61cac update llama.cpp submodule to f0d1fafc029a056cd765bdae58dcaa12312e9879 2024-02-20 22:56:51 -05:00
Michael Yang
949d7b1c48
add gguf file types (#2532) 2024-02-20 19:06:29 -05:00
Jeffrey Morgan
4613a080e7
update llama.cpp submodule to 66c1968f7 (#2618) 2024-02-20 17:42:31 -05:00
Taras Tsugrii
01ff2e14db
[nit] Remove unused msg local var. (#2511) 2024-02-20 14:02:34 -05:00
Daniel Hiltgen
4fcbf1cde6
Merge pull request #2599 from dhiltgen/fix_avx
Explicitly disable AVX2 on GPU builds
2024-02-19 13:13:05 -08:00
Daniel Hiltgen
9220b4fa91
Merge pull request #2585 from dhiltgen/cuda_leaks
Fix cuda leaks
2024-02-19 12:48:00 -08:00
Daniel Hiltgen
fc39a6cd7a Fix cuda leaks
This should resolve the problem where we don't fully unload from the GPU
when we go idle.
2024-02-18 18:37:20 -08:00
Daniel Hiltgen
df6dc4fd96 Fix duplicate menus on update and exit on signals
Also fixes a few fit-and-finish items for better developer experience
2024-02-16 15:33:16 -08:00
Daniel Hiltgen
db2a9ad1fe Explicitly disable AVX2 on GPU builds
Even though we weren't setting it to on, somewhere in the cmake config
it was getting toggled on.  By explicitly setting it to off, we get `/arch:AVX`
as intended.
2024-02-15 14:50:11 -08:00
Daniel Hiltgen
29e90cc13b Implement new Go based Desktop app
This focuses on Windows first, but coudl be used for Mac
and possibly linux in the future.
2024-02-15 05:56:45 +00:00
Jeffrey Morgan
9241a29336
Revert "Revert "bump submodule to 6c00a06 (#2479)"" (#2485)
This reverts commit 6920964b87.
2024-02-13 18:18:41 -08:00
Jeffrey Morgan
f7231ad9ad
set shutting_down to false once shutdown is complete (#2484) 2024-02-13 17:48:41 -08:00
Jeffrey Morgan
6920964b87 Revert "bump submodule to 6c00a06 (#2479)"
This reverts commit 2f9ed52bbd.
2024-02-13 17:23:05 -08:00
Jeffrey Morgan
2f9ed52bbd
bump submodule to 6c00a06 (#2479) 2024-02-13 17:12:42 -08:00
Daniel Hiltgen
939c60473f
Merge pull request #2422 from dhiltgen/better_kill
More robust shutdown
2024-02-12 14:05:06 -08:00
Jeffrey Morgan
f76ca04f9e
update submodule to 099afc6 (#2468) 2024-02-12 14:01:16 -08:00
Daniel Hiltgen
76b8728f0c
Merge pull request #2465 from dhiltgen/block_rocm_pre_9
Detect AMD GPU info via sysfs and block old cards
2024-02-12 12:41:43 -08:00
Daniel Hiltgen
6d84f07505 Detect AMD GPU info via sysfs and block old cards
This wires up some new logic to start using sysfs to discover AMD GPU
information and detects old cards we can't yet support so we can fallback to CPU mode.
2024-02-12 08:19:41 -08:00
Jeffrey Morgan
26b13fc33c
patch: always add token to cache_tokens (#2459) 2024-02-12 08:10:16 -08:00
Daniel Hiltgen
6680761596 Shutdown faster
Make sure that when a shutdown signal comes, we shutdown quickly instead
of waiting for a potentially long exchange to wrap up.
2024-02-08 22:22:50 -08:00
Daniel Hiltgen
a1dfab43b9 Ensure the libraries are present
When we store our libraries in a temp dir, a reaper might clean
them when we are idle, so make sure to check for them before
we reload.
2024-02-07 17:27:49 -08:00
Daniel Hiltgen
de76b95dd4 Bump llama.cpp to b2081 2024-02-06 12:06:43 -08:00
Daniel Hiltgen
27aa2d4a19
Merge pull request #1849 from mraiser/main
Accomodate split cuda lib dir
2024-02-05 16:01:16 -08:00
Daniel Hiltgen
e1f50377f4 Harden generate patching model
Only apply patches if we have any, and make sure to cleanup
every file we patched at the end to leave the tree clean
2024-02-01 19:34:36 -08:00
Jeffrey Morgan
f11bf0740b use llm.ImageData 2024-01-31 19:13:48 -08:00
Michael Yang
8450bf66e6 trim images 2024-01-31 19:13:47 -08:00
Daniel Hiltgen
72b12c3be7 Bump llama.cpp to b1999
This requires an upstream change to support graceful termination,
carried as a patch.
2024-01-30 16:52:12 -08:00
Jeffrey Morgan
2e06ed01d5 remove unknown CPPFLAGS option 2024-01-28 17:51:23 -08:00
mraiser
4c4c730a0a
Merge branch 'ollama:main' into main 2024-01-27 21:56:11 -05:00
Daniel Hiltgen
e02ecfb6c8
Merge pull request #2116 from dhiltgen/cc_50_80
Add support for CUDA 5.0 cards
2024-01-27 10:28:38 -08:00
Jeffrey Morgan
3ebd6a83fc update submodule to cd4fddb29f81d6a1f6d51a0c016bc6b486d68def 2024-01-25 13:54:11 -08:00
Jeffrey Morgan
a64570dcae
Fix clearing kv cache between requests with the same prompt (#2186)
* Fix clearing kv cache between requests with the same prompt

* fix powershell script
2024-01-25 13:46:20 -08:00
mraiser
a4564232a4
Update gen_linux.sh to find libcudart in separate directory 2024-01-25 09:49:35 -05:00
Michael Yang
cd22855ef8 refactor tensor read 2024-01-24 10:48:31 -08:00
Jeffrey Morgan
4458efb73a
Load all layers on arm64 macOS if model is small enough (#2149) 2024-01-22 17:40:06 -08:00
Daniel Hiltgen
0f5b843319 Refine Accelerate usage on mac
For old macs, accelerate seems to cause crashes, but for
AVX2 capable macs, it does not.
2024-01-22 16:25:56 -08:00
Jeffrey Morgan
ffaf52e1e9 update submodule to 011e8ec577fd135cbc02993d3ea9840c516d6a1c 2024-01-22 15:16:54 -08:00
Daniel Hiltgen
3bc28736cd
Merge pull request #2143 from dhiltgen/llm_verbosity
Refine debug logging for llm
2024-01-22 13:19:16 -08:00
Daniel Hiltgen
730dcfcc7a Refine debug logging for llm
This wires up logging in llama.cpp to always go to stderr, and also
turns up logging if OLLAMA_DEBUG is set.
2024-01-22 12:26:49 -08:00
Daniel Hiltgen
27a2d5af54 Debug logging on init failure 2024-01-22 12:08:22 -08:00
Jeffrey Morgan
5f81a33f43
update submodule to 6f9939d (#2115) 2024-01-22 11:56:40 -08:00
Daniel Hiltgen
5576bb2348
Merge pull request #2130 from dhiltgen/more_faster
Make CPU builds parallel and customizable AMD GPUs
2024-01-21 16:14:12 -08:00
Daniel Hiltgen
ec3764538d Probe GPUs before backend init
Detect potential error scenarios so we can fallback to CPU mode without
hitting asserts.
2024-01-21 15:59:38 -08:00
Daniel Hiltgen
df54c723ae Make CPU builds parallel and customizable AMD GPUs
The linux build now support parallel CPU builds to speed things up.
This also exposes AMD GPU targets as an optional setting for advaced
users who want to alter our default set.
2024-01-21 15:12:21 -08:00
Jeffrey Morgan
89c4aee29e
Unlock mutex when failing to load model (#2117) 2024-01-20 20:54:46 -05:00
Daniel Hiltgen
a447a083f2 Add compute capability 5.0, 7.5, and 8.0 2024-01-20 14:24:05 -08:00