Daniel Hiltgen
9929751cc8
Disable concurrency for AMD + Windows
...
Until ROCm v6.2 ships, we wont be able to get accurate free memory
reporting on windows, which makes automatic concurrency too risky.
Users can still opt-in but will need to pay attention to model sizes otherwise they may thrash/page VRAM or cause OOM crashes.
All other platforms and GPUs have accurate VRAM reporting wired
up now, so we can turn on concurrency by default.
2024-06-21 15:45:05 -07:00
Josh Yan
662568d453
err!=nil check
2024-06-20 09:30:59 -07:00
Josh Yan
4ebb66c662
reformat error check
2024-06-20 09:23:43 -07:00
Josh Yan
23e899f32d
skip os.removeAll() if PID does not exist
2024-06-20 08:51:35 -07:00
Daniel Hiltgen
96624aa412
Merge pull request #5072 from dhiltgen/windows_path
...
Move libraries out of users path
2024-06-19 09:13:39 -07:00
Daniel Hiltgen
10f33b8537
Merge pull request #5146 from dhiltgen/backout
...
Put back temporary intel GPU env var
2024-06-19 09:12:45 -07:00
Daniel Hiltgen
d34d88e417
Revert "Revert "gpu: add env var for detecting Intel oneapi gpus ( #5076 )""
...
This reverts commit 755b4e4fc2
.
2024-06-19 08:57:41 -07:00
Daniel Hiltgen
52ce350b7a
Fix bad symbol load detection
...
pointer deref's weren't correct on a few libraries, which explains
some crashes on older systems or miswired symlinks for discovery libraries.
2024-06-19 08:39:07 -07:00
Wang,Zhe
badf975e45
get real func ptr.
2024-06-19 09:00:51 +08:00
Wang,Zhe
755b4e4fc2
Revert "gpu: add env var for detecting Intel oneapi gpus ( #5076 )"
...
This reverts commit 163cd3e77c
.
2024-06-19 08:59:58 +08:00
Daniel Hiltgen
b2799f111b
Move libraries out of users path
...
We update the PATH on windows to get the CLI mapped, but this has
an unintended side effect of causing other apps that may use our bundled
DLLs to get terminated when we upgrade.
2024-06-17 13:12:18 -07:00
Lei Jitang
4ad0d4d6d3
Fix a build warning ( #5096 )
...
Signed-off-by: Lei Jitang <leijitang@outlook.com>
2024-06-17 14:47:48 -04:00
Jeffrey Morgan
163cd3e77c
gpu: add env var for detecting Intel oneapi gpus ( #5076 )
...
* gpu: add env var for detecting intel oneapi gpus
* fix build error
2024-06-16 20:09:05 -04:00
Daniel Hiltgen
fd1e6e0590
Add some more debugging logs for intel discovery
...
Also removes an unused overall count variable
2024-06-16 07:42:52 -07:00
Daniel Hiltgen
07d143f412
Merge pull request #5058 from coolljt0725/fix_build_warning
...
gpu: Fix build warning
2024-06-15 11:52:36 -07:00
Daniel Hiltgen
17ce203a26
Merge pull request #4875 from dhiltgen/rocm_gfx900_workaround
...
Rocm gfx900 workaround
2024-06-15 07:38:58 -07:00
Lei Jitang
225f0d1219
gpu: Fix build warning
...
Signed-off-by: Lei Jitang <leijitang@outlook.com>
2024-06-15 14:26:23 +08:00
Daniel Hiltgen
6be309e1bd
Centralize GPU configuration vars
...
This should aid in troubleshooting by capturing and reporting the GPU
settings at startup in the logs along with all the other server settings.
2024-06-14 15:59:10 -07:00
Daniel Hiltgen
da3bf23354
Workaround gfx900 SDMA bugs
...
Implement support for GPU env var workarounds, and leverage
this for the Vega RX 56 which needs
HSA_ENABLE_SDMA=0 set to work properly
2024-06-14 15:38:13 -07:00
Daniel Hiltgen
6f351bf586
review comments and coverage
2024-06-14 14:55:50 -07:00
Daniel Hiltgen
fc37c192ae
Refine CPU load behavior with system memory visibility
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
434dfe30c5
Reintroduce nvidia nvml library for windows
...
This library will give us the most reliable free VRAM reporting on windows
to enable concurrent model scheduling.
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
4e2b7e181d
Refactor intel gpu discovery
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
6fd04ca922
Improve multi-gpu handling at the limit
...
Still not complete, needs some refinement to our prediction to understand the
discrete GPUs available space so we can see how many layers fit in each one
since we can't split one layer across multiple GPUs we can't treat free space
as one logical block
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
43ed358f9a
Refine GPU discovery to bootstrap once
...
Now that we call the GPU discovery routines many times to
update memory, this splits initial discovery from free memory
updating.
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
b32ebb4f29
Use DRM driver for VRAM info for amd
...
The amdgpu drivers free VRAM reporting omits some other apps, so leverage the
upstream DRM driver which keeps better tabs on things
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
efac488675
Revert "Limit GPU lib search for now ( #4777 )"
...
This reverts commit 476fb8e892
.
2024-06-14 14:51:40 -07:00
Daniel Hiltgen
aac367636d
Actually skip PhysX on windows
2024-06-13 13:17:19 -07:00
Michael Yang
e919f6811f
lint windows
2024-06-04 11:13:30 -07:00
Michael Yang
bf7edb0d5d
lint linux
2024-06-04 11:13:30 -07:00
Michael Yang
e40145a39d
lint
2024-06-04 11:13:30 -07:00
Jeffrey Morgan
476fb8e892
Limit GPU lib search for now ( #4777 )
...
* fix oneapi errors on windows 10
2024-06-01 19:24:33 -07:00
Daniel Hiltgen
646371f56d
Merge pull request #3278 from zhewang1-intc/rebase_ollama_main
...
Enabling ollama to run on Intel GPUs with SYCL backend
2024-05-28 16:30:50 -07:00
Patrick Devine
4cc3be3035
Move envconfig and consolidate env vars ( #4608 )
2024-05-24 14:57:15 -07:00
Wang,Zhe
fd5971be0b
support ollama run on Intel GPUs
2024-05-24 11:18:27 +08:00
Daniel Hiltgen
30a7d7096c
Bump VRAM buffer back up
...
Under stress scenarios we're seeing OOMs so this should help stabilize
the allocations under heavy concurrency stress.
2024-05-10 09:15:28 -07:00
Daniel Hiltgen
354ad9254e
Wait for GPU free memory reporting to converge
...
The GPU drivers take a while to update their free memory reporting, so we need
to wait until the values converge with what we're expecting before proceeding
to start another runner in order to get an accurate picture.
2024-05-09 14:56:01 -07:00
Daniel Hiltgen
8727a9c140
Record more GPU information
...
This cleans up the logging for GPU discovery a bit, and can
serve as a foundation to report GPU information in a future UX.
2024-05-09 14:18:14 -07:00
Michael Yang
4736391bfb
llm: add minimum based on layer size
2024-05-06 17:04:19 -07:00
Daniel Hiltgen
b08870aff3
Merge pull request #4188 from dhiltgen/use_our_lib
...
User our bundled libraries (cuda) instead of the host library
2024-05-06 14:41:05 -07:00
Daniel Hiltgen
4cbbf0e13b
Merge pull request #4090 from dhiltgen/rocm_paths
...
Support Fedoras standard ROCm location
2024-05-06 14:33:41 -07:00
Daniel Hiltgen
380378cc80
Use our libraries first
...
Trying to live off the land for cuda libraries was not the right strategy. We need to use the version we compiled against to ensure things work properly
2024-05-06 14:23:29 -07:00
Daniel Hiltgen
af9eb36f9f
Merge pull request #4135 from dhiltgen/no_physx
...
Skip PhysX cudart library
2024-05-06 13:34:00 -07:00
Daniel Hiltgen
06093fd396
Merge pull request #4067 from dhiltgen/cudart
...
Add CUDA Driver API for GPU discovery
2024-05-06 13:30:27 -07:00
Daniel Hiltgen
f56aa20014
Centralize server config handling
...
This moves all the env var reading into one central module
and logs the loaded config once at startup which should
help in troubleshooting user server logs
2024-05-05 16:49:50 -07:00
Daniel Hiltgen
b1ad3a43cb
Skip PhysX cudart library
...
For some reason this library gives incorrect GPU information, so skip it
2024-05-03 11:55:32 -07:00
Daniel Hiltgen
e592e8fccb
Support Fedoras standard ROCm location
2024-05-01 15:47:12 -07:00
Jeffrey Morgan
f0c454ab57
gpu: add 512MiB to darwin minimum, metal doesn't have partial offloading overhead ( #4068 )
2024-05-01 11:46:03 -04:00
Daniel Hiltgen
089daaeabc
Add CUDA Driver API for GPU discovery
...
We're seeing some corner cases with cudart which might be resolved by
switching to the driver API which comes bundled with the driver package
2024-04-30 18:00:45 -07:00
Daniel Hiltgen
7b59d1770f
Fix relative path lookup
2024-04-29 16:00:08 -07:00