Daniel Hiltgen
2738837786
Merge pull request #2131 from dhiltgen/probe_cards_at_init
...
Probe GPUs before backend init
2024-01-21 16:13:47 -08:00
Daniel Hiltgen
ec3764538d
Probe GPUs before backend init
...
Detect potential error scenarios so we can fallback to CPU mode without
hitting asserts.
2024-01-21 15:59:38 -08:00
Daniel Hiltgen
df54c723ae
Make CPU builds parallel and customizable AMD GPUs
...
The linux build now support parallel CPU builds to speed things up.
This also exposes AMD GPU targets as an optional setting for advaced
users who want to alter our default set.
2024-01-21 15:12:21 -08:00
Daniel Hiltgen
fa8c990e58
Merge pull request #2127 from dhiltgen/rocm_container
...
Combine the 2 Dockerfiles and add ROCm
2024-01-21 11:49:01 -08:00
Daniel Hiltgen
da72235ebf
Combine the 2 Dockerfiles and add ROCm
...
This renames Dockerfile.build to Dockerfile, and adds some new stages
to support 2 modes of building - the build_linux.sh script uses
intermediate stages to extract the artifacts for ./dist, and the default
build generates a container image usable by both cuda and rocm cards.
This required transitioniing the x86 base to the rocm image to avoid
layer bloat.
2024-01-21 11:37:11 -08:00
Jeffrey Morgan
89c4aee29e
Unlock mutex when failing to load model ( #2117 )
2024-01-20 20:54:46 -05:00
Daniel Hiltgen
a447a083f2
Add compute capability 5.0, 7.5, and 8.0
2024-01-20 14:24:05 -08:00
Jeffrey Morgan
f32ea81b21
increase minimum overhead to 1024MiB ( #2114 )
2024-01-20 17:11:38 -05:00
Daniel Hiltgen
681a914990
Add support for CUDA 5.2 cards
2024-01-20 10:48:43 -08:00
Jeffrey Morgan
4c54f0ddeb
sign dylibs on macOS ( #2101 )
2024-01-19 19:24:11 -05:00
Michael Yang
c08dfaa23d
fix: remove overwritten model layers
...
if create overrides a manifest, first add the older manifest's layers to
the delete map so they can be cleaned up
2024-01-19 14:58:37 -08:00
Daniel Hiltgen
3b76e736ae
Merge pull request #2100 from dhiltgen/more_wsl_globs
...
More WSL paths
2024-01-19 13:41:08 -08:00
Daniel Hiltgen
552db98bf1
More WSL paths
2024-01-19 13:23:29 -08:00
Daniel Hiltgen
fdcdfef620
Merge pull request #2099 from dhiltgen/fix_cuda_model_swap
...
Switch to local dlopen symbols
2024-01-19 12:22:04 -08:00
Daniel Hiltgen
6a042438af
Switch to local dlopen symbols
2024-01-19 11:37:02 -08:00
Jeffrey Morgan
dc88cc3981
use gzip
for runner embedding ( #2067 )
2024-01-19 13:23:03 -05:00
Daniel Hiltgen
62976087c6
Merge pull request #1999 from lainedfles/termux_android_cpu_only
...
Fix CPU-only build under Android Termux enviornment.
2024-01-18 17:16:53 -08:00
Self Denial
344342abdf
Restore dyn_ext_server.c since RTLD_DEEPBIND has been removed
2024-01-18 17:30:42 -07:00
Self Denial
eb76f3e379
Fix CPU-only build under Android Termux enviornment.
...
Update gpu.go initGPUHandles() to declare gpuHandles variable before
reading it. This resolves an "invalid memory address or nil pointer
dereference" error.
Update dyn_ext_server.c to avoid setting the RTLD_DEEPBIND flag under
__TERMUX__ (Android).
2024-01-18 17:25:33 -07:00
Michael Yang
d017e3d0a6
Merge pull request #2060 from jmorganca/mxyng/fix-show
...
fix show handler
2024-01-18 16:02:27 -08:00
Michael Yang
aac9ab4db7
fix show handler
2024-01-18 15:36:50 -08:00
Michael Yang
1f5b7ff976
Merge pull request #1932 from jmorganca/mxyng/api-fields
...
api: add model for all requests
2024-01-18 14:56:51 -08:00
Michael Yang
e299831e2c
Merge pull request #1958 from purificant/ci
...
ci: update setup-go action
2024-01-18 14:53:36 -08:00
Michael Yang
745b5934fa
add model to ModelResponse
2024-01-18 14:32:55 -08:00
Michael Yang
a38d88d828
api: add model for all requests
...
prefer using req.Model and fallback to req.Name
2024-01-18 14:31:37 -08:00
Daniel Hiltgen
abec7f06e5
Merge pull request #2056 from dhiltgen/slog
...
Mechanical switch from log to slog
2024-01-18 14:27:24 -08:00
Michael Yang
e5da190bac
Merge pull request #2020 from jmorganca/mxyng/install-fedora
...
install: pin fedora to max 37
2024-01-18 14:23:42 -08:00
Daniel Hiltgen
ecbfc0182f
Go bump to v1.21 to pick up slog
2024-01-18 14:12:57 -08:00
Daniel Hiltgen
fedd705aea
Mechanical switch from log to slog
...
A few obvious levels were adjusted, but generally everything mapped to "info" level.
2024-01-18 14:12:57 -08:00
Mike Bird
82ee019bfc
add open interpreter to list of extensions ( #2016 )
2024-01-18 13:59:39 -08:00
Sachin Sachdeva
ad9dbc2a04
Haystack Ollama Integration ( #2021 )
...
Updated readme with the web link for haystack ollama integration
2024-01-18 13:38:32 -08:00
Daniel Hiltgen
fccdf4c635
Merge pull request #1987 from xyproto/archlinux
...
Let gpu.go and gen_linux.sh also find CUDA on Arch Linux
2024-01-18 13:32:10 -08:00
Daniel Hiltgen
d450fb1d1e
Merge pull request #2055 from dhiltgen/cuda_docs
...
Refine the linux cuda/rocm developer docs
2024-01-18 12:07:31 -08:00
Daniel Hiltgen
df40b11d03
Merge pull request #2007 from dhiltgen/cpu_fallback
...
Add multiple CPU variants for Intel Mac
2024-01-18 11:32:29 -08:00
Daniel Hiltgen
9cd20b0ec8
Refine the linux cuda/rocm developer docs
2024-01-18 09:44:44 -08:00
Daniel Hiltgen
b992bf65fc
Disable arm64 for test phase
...
The runners are x86 so we can only run binaries that match.
2024-01-17 19:26:13 -08:00
Daniel Hiltgen
1b249748ab
Add multiple CPU variants for Intel Mac
...
This also refines the build process for the ext_server build.
2024-01-17 15:08:54 -08:00
Alexander F. Rødseth
cbe2adc78a
Merge branch 'main' into archlinux
2024-01-17 12:50:11 +01:00
Michael Yang
d5a7353357
Merge pull request #2026 from jmorganca/mxyng/fix-windows
...
fix: normalize name path before splitting
2024-01-16 16:58:42 -08:00
Michael Yang
96cfb62641
fix: normalize name path before splitting
2024-01-16 16:48:29 -08:00
Daniel Hiltgen
7d00b5d110
Merge pull request #1915 from dhiltgen/bump_llama_with_new_dep
...
Bump llama.cpp to b1842 and add new cuda lib dep
2024-01-16 13:36:49 -08:00
Daniel Hiltgen
795674dd90
Bump llama.cpp to b1842 and add new cuda lib dep
...
Upstream llama.cpp has added a new dependency with the
NVIDIA CUDA Driver Libraries (libcuda.so) which is part of the
driver distribution, not the general cuda libraries, and is not
available as an archive, so we can not statically link it. This may
introduce some additional compatibility challenges which we'll
need to keep an eye on.
2024-01-16 12:53:52 -08:00
Daniel Hiltgen
e282bdccdd
Merge pull request #1990 from dhiltgen/ci_mac_cross
...
Add macos cross-compile CI coverage
2024-01-16 12:31:37 -08:00
Michael Yang
d9bfb2f08f
install: pin fedora to max 37
...
repos for fedora 38 and newer do not exist as of this commit
```
$ dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/fedora38/x86_64/cuda-fedora38.repo
Adding repo from: https://developer.download.nvidia.com/compute/cuda/repos/fedora38/x86_64/cuda-fedora38.repo
Status code: 404 for https://developer.download.nvidia.com/compute/cuda/repos/fedora38/x86_64/cuda-fedora38.repo (IP: 152.195.19.142)
Error: Configuration of repo failed
```
2024-01-16 11:45:21 -08:00
Michael Yang
598d6d5572
Merge pull request #1937 from jmorganca/mxyng/remove-client-py
...
remove client.py
2024-01-16 11:01:41 -08:00
Bruce MacDonald
a897e833b8
do not cache prompt ( #2018 )
...
- prompt cache causes inferance to hang after some time
2024-01-16 13:48:05 -05:00
Patrick Devine
eef50accb4
Fix show parameters ( #2017 )
2024-01-16 10:34:44 -08:00
Michael Yang
05d53de7a1
Merge pull request #1968 from jmorganca/mxyng/fix-request-retry
...
fix: request retry with error
2024-01-16 10:33:50 -08:00
Daniel Hiltgen
8795447dad
Merge pull request #1966 from fpreiss/fpreiss/gen_linux_cuda_detection
...
improve cuda detection (rel. issue #1704 )
2024-01-14 18:00:11 -08:00
Daniel Hiltgen
b3035112a1
Add macos cross-compile CI coverage
2024-01-14 10:38:59 -08:00