This adds back a check which was lost many releases back to verify /dev/kfd permissions
which when lacking, can lead to confusing failure modes of:
"rocBLAS error: Could not initialize Tensile host: No devices found"
This implementation does not hard fail the serve command but instead will fall back to CPU
with an error log. In the future we can include this in the GPU discovery UX to show
detected but unsupported devices we discovered.
Refine the way we log GPU discovery to improve the non-debug
output, and report more actionable log messages when possible
to help users troubleshoot on their own.
This refines where we extract the LLM libraries to by adding a new
OLLAMA_HOME env var, that defaults to `~/.ollama` The logic was already
idempotenent, so this should speed up startups after the first time a
new release is deployed. It also cleans up after itself.
We now build only a single ROCm version (latest major) on both windows
and linux. Given the large size of ROCms tensor files, we split the
dependency out. It's bundled into the installer on windows, and a
separate download on windows. The linux install script is now smart and
detects the presence of AMD GPUs and looks to see if rocm v6 is already
present, and if not, then downloads our dependency tar file.
For Linux discovery, we now use sysfs and check each GPU against what
ROCm supports so we can degrade to CPU gracefully instead of having
llama.cpp+rocm assert/crash on us. For Windows, we now use go's windows
dynamic library loading logic to access the amdhip64.dll APIs to query
the GPU information.
This reduces the built-in linux version to not use any vector extensions
which enables the resulting builds to run under Rosetta on MacOS in
Docker. Then at runtime it checks for the actual CPU vector
extensions and loads the best CPU library available
* Clean up documentation
Will probably need to update with PRs for new release.
Signed-off-by: Matt Williams <m@technovangelist.com>
* Correcting to fit in 0.1.15 changes
Signed-off-by: Matt Williams <m@technovangelist.com>
* Update README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* addressing comments
Signed-off-by: Matt Williams <m@technovangelist.com>
* more api cleanup
Signed-off-by: Matt Williams <m@technovangelist.com>
* its llava not llama
Signed-off-by: Matt Williams <m@technovangelist.com>
* Update docs/troubleshooting.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Updated hosting to server and documented all env vars
Signed-off-by: Matt Williams <m@technovangelist.com>
* remove last of the cli descriptions
Signed-off-by: Matt Williams <m@technovangelist.com>
* Update README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* update further per conversation with jeff earlier today
Signed-off-by: Matt Williams <m@technovangelist.com>
* cleanup the doc readme
Signed-off-by: Matt Williams <m@technovangelist.com>
* move upgrade to faq
Signed-off-by: Matt Williams <m@technovangelist.com>
* first change
Signed-off-by: Matt Williams <m@technovangelist.com>
* updated
Signed-off-by: Matt Williams <m@technovangelist.com>
* Update docs/faq.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* examples in parent
Signed-off-by: Matt Williams <m@technovangelist.com>
* add exapmle for create model.
Signed-off-by: Matt Williams <m@technovangelist.com>
* update faq
Signed-off-by: Matt Williams <m@technovangelist.com>
* update create model api
Signed-off-by: Matt Williams <m@technovangelist.com>
* Update docs/api.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/faq.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/troubleshooting.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* update the readme in docs
Signed-off-by: Matt Williams <m@technovangelist.com>
* update a few more things
Signed-off-by: Matt Williams <m@technovangelist.com>
* Update docs/troubleshooting.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/faq.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update README.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/modelfile.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
* Update docs/troubleshooting.md
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
---------
Signed-off-by: Matt Williams <m@technovangelist.com>
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>