Michael Yang
35b89b2eab
rfc: dynamic environ lookup
2024-07-22 11:25:30 -07:00
Daniel Hiltgen
283948c83b
Adjust windows ROCm discovery
...
The v5 HIP library returns unsupported GPUs which won't enumerate at
inference time in the runner, so this makes sure we align discovery. The
gfx906 cards are no longer supported, so we shouldn't compile with that
GPU type as it won't enumerate at runtime.
2024-07-20 15:17:50 -07:00
Jeffrey Morgan
c4cf8ad559
llm: avoid loading model if system memory is too small ( #5637 )
...
* llm: avoid loading model if system memory is too small
* update log
* Instrument swap free space
On Linux and Windows, expose how much swap space is available
so we can take that into consideration when scheduling models
* use `systemSwapFreeMemory` in check
---------
Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
2024-07-11 16:42:57 -07:00
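The system memory check described in c4cf8ad559 can be sketched roughly as follows. This is a minimal illustration, not the project's code: it assumes the check compares the model's estimated requirement against free RAM plus free swap. The name `systemSwapFreeMemory` comes from the commit message; `checkSystemMemory`, the other parameter names, and the error text are hypothetical.

```go
package main

import "fmt"

// checkSystemMemory returns an error when the estimated system memory
// requirement exceeds free RAM plus free swap. All values are byte counts.
// The function name and error text are hypothetical; only the idea of
// counting free swap comes from the commit message.
func checkSystemMemory(required, systemFreeMemory, systemSwapFreeMemory uint64) error {
	// Assumption: swap counts toward the memory available for loading.
	available := systemFreeMemory + systemSwapFreeMemory
	if required > available {
		return fmt.Errorf("model requires %d bytes of system memory but only %d are available",
			required, available)
	}
	return nil
}
```

Calling `checkSystemMemory` before starting a load lets the scheduler refuse early instead of failing partway through.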
Daniel Hiltgen
4cfcbc328f
Merge pull request #5124 from dhiltgen/amd_windows
...
Wire up windows AMD driver reporting
2024-07-10 12:50:23 -07:00
Daniel Hiltgen
8ea500441d
Merge pull request #5580 from dhiltgen/cuda_overhead
...
Detect CUDA OS overhead
2024-07-10 12:47:31 -07:00
Daniel Hiltgen
1f50356e8e
Bump ROCm on windows to 6.1.2
...
This also adjusts our algorithm to favor our bundled ROCm.
I've confirmed VRAM reporting still doesn't work properly so we
can't yet enable concurrency by default.
2024-07-10 11:01:22 -07:00
Daniel Hiltgen
f6f759fc5f
Detect CUDA OS Overhead
...
This adds logic to detect skew between the driver and
management library which can be attributed to OS overhead
and records that so we can adjust subsequent management
library free VRAM updates and avoid OOM scenarios.
2024-07-09 12:21:50 -07:00
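The overhead detection described in f6f759fc5f might look something like the sketch below. It assumes the management library reports more free VRAM than the driver and that the difference is recorded once as OS overhead, then subtracted from later management library readings. Both function names are hypothetical and the real logic may differ.

```go
package main

// detectOverhead returns the skew between the free VRAM reported by the
// management library and by the driver. Assumption: only a positive skew
// (management library reporting more free memory) is treated as OS overhead.
func detectOverhead(driverFree, mgmtFree uint64) uint64 {
	if mgmtFree > driverFree {
		return mgmtFree - driverFree
	}
	return 0
}

// adjustFree applies a previously recorded overhead to a fresh management
// library reading, clamping at zero so the subtraction cannot underflow.
func adjustFree(mgmtFree, overhead uint64) uint64 {
	if overhead >= mgmtFree {
		return 0
	}
	return mgmtFree - overhead
}
```

Recording the overhead once at discovery keeps later free-memory updates cheap, since only the management library needs to be queried again.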
Jeffrey Morgan
f8241bfba3
gpu: report system free memory instead of 0 ( #5521 )
2024-07-06 19:35:04 -04:00
Daniel Hiltgen
ef757da2c9
Better nvidia GPU discovery logging
...
Refine the way we log GPU discovery to improve the non-debug
output, and report more actionable log messages when possible
to help users troubleshoot on their own.
2024-07-03 10:50:40 -07:00
Daniel Hiltgen
9929751cc8
Disable concurrency for AMD + Windows
...
Until ROCm v6.2 ships, we won't be able to get accurate free memory
reporting on Windows, which makes automatic concurrency too risky.
Users can still opt in, but will need to pay attention to model sizes;
otherwise they may thrash/page VRAM or cause OOM crashes.
All other platforms and GPUs have accurate VRAM reporting wired
up now, so we can turn on concurrency by default.
2024-06-21 15:45:05 -07:00
Josh Yan
662568d453
err!=nil check
2024-06-20 09:30:59 -07:00
Josh Yan
4ebb66c662
reformat error check
2024-06-20 09:23:43 -07:00
Josh Yan
23e899f32d
skip os.removeAll() if PID does not exist
2024-06-20 08:51:35 -07:00
Daniel Hiltgen
96624aa412
Merge pull request #5072 from dhiltgen/windows_path
...
Move libraries out of users path
2024-06-19 09:13:39 -07:00
Daniel Hiltgen
10f33b8537
Merge pull request #5146 from dhiltgen/backout
...
Put back temporary intel GPU env var
2024-06-19 09:12:45 -07:00
Daniel Hiltgen
d34d88e417
Revert "Revert "gpu: add env var for detecting Intel oneapi gpus ( #5076 )""
...
This reverts commit 755b4e4fc2.
2024-06-19 08:57:41 -07:00
Daniel Hiltgen
52ce350b7a
Fix bad symbol load detection
...
Pointer dereferences weren't correct on a few libraries, which explains
some crashes on older systems or with miswired symlinks for discovery libraries.
2024-06-19 08:39:07 -07:00
Wang,Zhe
badf975e45
get real func ptr.
2024-06-19 09:00:51 +08:00
Wang,Zhe
755b4e4fc2
Revert "gpu: add env var for detecting Intel oneapi gpus ( #5076 )"
...
This reverts commit 163cd3e77c.
2024-06-19 08:59:58 +08:00
Daniel Hiltgen
784bf88b0d
Wire up windows AMD driver reporting
...
This seems to be the ROCm version, not the actual driver version, but
it may be useful for toggling logic for VRAM reporting in the future.
2024-06-18 16:22:47 -07:00
Daniel Hiltgen
b2799f111b
Move libraries out of users path
...
We update the PATH on windows to get the CLI mapped, but this has
an unintended side effect of causing other apps that may use our bundled
DLLs to get terminated when we upgrade.
2024-06-17 13:12:18 -07:00
Lei Jitang
4ad0d4d6d3
Fix a build warning ( #5096 )
...
Signed-off-by: Lei Jitang <leijitang@outlook.com>
2024-06-17 14:47:48 -04:00
Jeffrey Morgan
163cd3e77c
gpu: add env var for detecting Intel oneapi gpus ( #5076 )
...
* gpu: add env var for detecting intel oneapi gpus
* fix build error
2024-06-16 20:09:05 -04:00
Daniel Hiltgen
fd1e6e0590
Add some more debugging logs for intel discovery
...
Also removes an unused overall count variable
2024-06-16 07:42:52 -07:00
Daniel Hiltgen
07d143f412
Merge pull request #5058 from coolljt0725/fix_build_warning
...
gpu: Fix build warning
2024-06-15 11:52:36 -07:00
Daniel Hiltgen
17ce203a26
Merge pull request #4875 from dhiltgen/rocm_gfx900_workaround
...
Rocm gfx900 workaround
2024-06-15 07:38:58 -07:00
Lei Jitang
225f0d1219
gpu: Fix build warning
...
Signed-off-by: Lei Jitang <leijitang@outlook.com>
2024-06-15 14:26:23 +08:00
Daniel Hiltgen
6be309e1bd
Centralize GPU configuration vars
...
This should aid in troubleshooting by capturing and reporting the GPU
settings at startup in the logs along with all the other server settings.
2024-06-14 15:59:10 -07:00
Daniel Hiltgen
da3bf23354
Workaround gfx900 SDMA bugs
...
Implement support for GPU env var workarounds, and leverage
this for the Vega RX 56 which needs
HSA_ENABLE_SDMA=0 set to work properly
2024-06-14 15:38:13 -07:00
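The env var workaround mechanism from da3bf23354 could be modeled as a lookup from GPU target to extra environment variables. This is only a sketch: the `gfx900` target and `HSA_ENABLE_SDMA=0` come from the commit message, while `envWorkarounds` and `runnerEnv` are hypothetical names.

```go
package main

// envWorkarounds maps a GPU target to environment variables that must be
// set for it to work properly. Only the gfx900 entry is taken from the
// commit message; the table itself is a hypothetical structure.
var envWorkarounds = map[string][]string{
	"gfx900": {"HSA_ENABLE_SDMA=0"},
}

// runnerEnv returns a copy of base with any workaround variables for the
// given target appended. Targets without an entry get base unchanged.
func runnerEnv(base []string, target string) []string {
	env := append([]string{}, base...)
	return append(env, envWorkarounds[target]...)
}
```

Keeping the workarounds in a table means a new quirk needs one added entry rather than a new code path.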
Daniel Hiltgen
6f351bf586
review comments and coverage
2024-06-14 14:55:50 -07:00
Daniel Hiltgen
fc37c192ae
Refine CPU load behavior with system memory visibility
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
434dfe30c5
Reintroduce nvidia nvml library for windows
...
This library will give us the most reliable free VRAM reporting on windows
to enable concurrent model scheduling.
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
4e2b7e181d
Refactor intel gpu discovery
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
6fd04ca922
Improve multi-gpu handling at the limit
...
Still not complete; needs some refinement to our prediction to understand each
discrete GPU's available space so we can see how many layers fit in each one.
Since we can't split one layer across multiple GPUs, we can't treat free space
as one logical block.
2024-06-14 14:51:40 -07:00
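The constraint noted in 6fd04ca922, that a layer cannot be split across GPUs, can be illustrated with a small sketch. It assumes uniform layer sizes for simplicity, which real models need not have; `layersPerGPU` and `totalLayers` are hypothetical names.

```go
package main

// layersPerGPU returns how many whole layers fit on each GPU, given each
// GPU's free memory and a uniform layer size (a simplifying assumption).
func layersPerGPU(free []uint64, layerSize uint64) []uint64 {
	counts := make([]uint64, len(free))
	if layerSize == 0 {
		return counts // avoid dividing by zero; treat as nothing to place
	}
	for i, f := range free {
		counts[i] = f / layerSize
	}
	return counts
}

// totalLayers sums the per-GPU counts. The result can be lower than
// dividing pooled free memory by the layer size.
func totalLayers(free []uint64, layerSize uint64) uint64 {
	var total uint64
	for _, n := range layersPerGPU(free, layerSize) {
		total += n
	}
	return total
}
```

With two GPUs that each have 5 units free and a layer size of 3, pooling the memory suggests 10 / 3 = 3 layers, but only one layer fits on each GPU, so `totalLayers` returns 2.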
Daniel Hiltgen
43ed358f9a
Refine GPU discovery to bootstrap once
...
Now that we call the GPU discovery routines many times to
update memory, this splits initial discovery from free memory
updating.
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
b32ebb4f29
Use DRM driver for VRAM info for amd
...
The amdgpu driver's free VRAM reporting omits some other apps, so leverage the
upstream DRM driver, which keeps better tabs on things.
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
efac488675
Revert "Limit GPU lib search for now ( #4777 )"
...
This reverts commit 476fb8e892.
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
aac367636d
Actually skip PhysX on windows
2024-06-13 13:17:19 -07:00
Michael Yang
e919f6811f
lint windows
2024-06-04 11:13:30 -07:00
Michael Yang
bf7edb0d5d
lint linux
2024-06-04 11:13:30 -07:00
Michael Yang
e40145a39d
lint
2024-06-04 11:13:30 -07:00
Jeffrey Morgan
476fb8e892
Limit GPU lib search for now ( #4777 )
...
* fix oneapi errors on windows 10
2024-06-01 19:24:33 -07:00
Daniel Hiltgen
646371f56d
Merge pull request #3278 from zhewang1-intc/rebase_ollama_main
...
Enabling ollama to run on Intel GPUs with SYCL backend
2024-05-28 16:30:50 -07:00
Patrick Devine
4cc3be3035
Move envconfig and consolidate env vars ( #4608 )
2024-05-24 14:57:15 -07:00
Wang,Zhe
fd5971be0b
support ollama run on Intel GPUs
2024-05-24 11:18:27 +08:00
Daniel Hiltgen
30a7d7096c
Bump VRAM buffer back up
...
Under stress scenarios we're seeing OOMs so this should help stabilize
the allocations under heavy concurrency stress.
2024-05-10 09:15:28 -07:00
Daniel Hiltgen
354ad9254e
Wait for GPU free memory reporting to converge
...
The GPU drivers take a while to update their free memory reporting, so we need
to wait until the values converge with what we're expecting before proceeding
to start another runner in order to get an accurate picture.
2024-05-09 14:56:01 -07:00
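Waiting for free memory reporting to converge, as described in 354ad9254e, amounts to polling until the reported value is close to the expected one or a deadline passes. The sketch below shows that pattern under assumed parameters; `waitForConverge`, the tolerance, and the polling interval are hypothetical and not taken from the project.

```go
package main

import "time"

// waitForConverge polls read until the reported free memory is within
// tolerance of expected, or until timeout has elapsed. It returns true if
// the values converged. All names and parameters here are hypothetical.
func waitForConverge(read func() uint64, expected, tolerance uint64, timeout, interval time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for {
		free := read()
		// Compute the absolute difference without unsigned underflow.
		var diff uint64
		if free > expected {
			diff = free - expected
		} else {
			diff = expected - free
		}
		if diff <= tolerance {
			return true
		}
		if time.Now().After(deadline) {
			return false
		}
		time.Sleep(interval)
	}
}
```

A caller would pass a function that queries the GPU's current free memory and only start the next runner once this returns true, or fall back to a conservative estimate when it times out.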
Daniel Hiltgen
8727a9c140
Record more GPU information
...
This cleans up the logging for GPU discovery a bit, and can
serve as a foundation to report GPU information in a future UX.
2024-05-09 14:18:14 -07:00
Michael Yang
4736391bfb
llm: add minimum based on layer size
2024-05-06 17:04:19 -07:00
Daniel Hiltgen
b08870aff3
Merge pull request #4188 from dhiltgen/use_our_lib
...
Use our bundled libraries (cuda) instead of the host library
2024-05-06 14:41:05 -07:00