From 32efed7b07bcf4d451485283cee4e21bc557e686 Mon Sep 17 00:00:00 2001
From: Andrei Betlen
Date: Thu, 22 Feb 2024 03:25:11 -0500
Subject: [PATCH] docs: Update README

---
 README.md | 124 +++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 91 insertions(+), 33 deletions(-)

diff --git a/README.md b/README.md
index 05706dc..5d8bbc5 100644
--- a/README.md
+++ b/README.md
@@ -25,47 +25,82 @@ Documentation is available at [https://llama-cpp-python.readthedocs.io/en/latest
 
 ## Installation
 
-`llama-cpp-python` can be installed directly from PyPI as a source distribution by running:
+Requirements:
+
+ - Python 3.8+
+ - C compiler
+   - Linux: gcc or clang
+   - Windows: Visual Studio or MinGW
+   - MacOS: Xcode
+
+To install the package, run:
 
 ```bash
 pip install llama-cpp-python
 ```
 
-This will build `llama.cpp` from source using cmake and your system's c compiler (required) and install the library alongside this python package.
+This will also build `llama.cpp` from source and install it alongside this Python package.
 
-If you run into issues during installation add the `--verbose` flag to the `pip install` command to see the full cmake build log.
+If this fails, add `--verbose` to the `pip install` command to see the full cmake build log.
 
-### Installation with Specific Hardware Acceleration (BLAS, CUDA, Metal, etc)
+### Installation Configuration
 
-The default pip install behaviour is to build `llama.cpp` for CPU only on Linux and Windows and use Metal on MacOS.
+`llama.cpp` supports a number of hardware acceleration backends to speed up inference, as well as backend-specific options. See the [llama.cpp README](https://github.com/ggerganov/llama.cpp#build) for a full list.
 
-`llama.cpp` supports a number of hardware acceleration backends depending including OpenBLAS, cuBLAS, CLBlast, HIPBLAS, and Metal.
-See the [llama.cpp README](https://github.com/ggerganov/llama.cpp#build) for a full list of supported backends.
+All `llama.cpp` cmake build options can be set via the `CMAKE_ARGS` environment variable or via the `--config-settings / -C` CLI flag during installation.
 
-All of these backends are supported by `llama-cpp-python` and can be enabled by setting the `CMAKE_ARGS` environment variable before installing.
-
-On Linux and Mac you set the `CMAKE_ARGS` like this:
+<details>
+<summary>Environment Variables</summary>
 
 ```bash
-CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python
+# Linux and Mac
+CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" \
+  pip install llama-cpp-python
 ```
 
-On Windows you can set the `CMAKE_ARGS` like this:
-
-```ps
+```powershell
+# Windows
 $env:CMAKE_ARGS = "-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS"
 pip install llama-cpp-python
 ```
+</details>
 
-#### OpenBLAS
+<details>
+<summary>CLI / requirements.txt</summary>
 
-To install with OpenBLAS, set the `LLAMA_BLAS and LLAMA_BLAS_VENDOR` environment variables before installing:
+They can also be set via the `pip install -C / --config-settings` command and saved to a `requirements.txt` file:
+
+```bash
+pip install --upgrade pip # ensure pip is up to date
+pip install llama-cpp-python \
+  -C cmake.args="-DLLAMA_BLAS=ON;-DLLAMA_BLAS_VENDOR=OpenBLAS"
+```
+
+```txt
+# requirements.txt
+
+llama-cpp-python -C cmake.args="-DLLAMA_BLAS=ON;-DLLAMA_BLAS_VENDOR=OpenBLAS"
+```
+
+</details>
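+After installing with any of these configurations, a quick sanity check is to import the package and print its version. This is a minimal sketch added for illustration; it only assumes the `llama_cpp` package installed above:
+
+```python
+# Confirm the freshly built package imports correctly and report its version.
+import llama_cpp
+
+print(llama_cpp.__version__)
+```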
+
+
+### Supported Backends
+
+Below are some common backends, their build commands and any additional environment variables required.
+
+<details>
+<summary>OpenBLAS (CPU)</summary>
+
+To install with OpenBLAS, set the `LLAMA_BLAS` and `LLAMA_BLAS_VENDOR` environment variables before installing:
 
 ```bash
 CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python
 ```
+</details>
 
-#### cuBLAS
+<details>
+<summary>cuBLAS (CUDA)</summary>
 
 To install with cuBLAS, set the `LLAMA_CUBLAS=on` environment variable before installing:
 
@@ -73,7 +108,10 @@ To install with cuBLAS, set the `LLAMA_CUBLAS=on` environment variable before in
 CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
 ```
 
-#### Metal
+</details>
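+Installing a GPU backend only compiles the support in; at runtime you still opt in per model. The following rough sketch shows the idea (the model path is a placeholder for a GGUF file on your machine):
+
+```python
+from llama_cpp import Llama
+
+# n_gpu_layers=-1 asks llama.cpp to offload as many layers as possible
+# to the GPU backend that was compiled in above.
+llm = Llama(model_path="./models/7B/llama-model.gguf", n_gpu_layers=-1)
+```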
+
+<details>
+<summary>Metal</summary>
 
 To install with Metal (MPS), set the `LLAMA_METAL=on` environment variable before installing:
 
@@ -81,7 +119,10 @@ To install with Metal (MPS), set the `LLAMA_METAL=on` environment variable befor
 CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python
 ```
 
-#### CLBlast
+</details>
+
+<details>
+<summary>CLBlast (OpenCL)</summary>
 
 To install with CLBlast, set the `LLAMA_CLBLAST=on` environment variable before installing:
 
@@ -89,7 +130,10 @@ To install with CLBlast, set the `LLAMA_CLBLAST=on` environment variable before
 CMAKE_ARGS="-DLLAMA_CLBLAST=on" pip install llama-cpp-python
 ```
 
-#### hipBLAS
+</details>
+
+<details>
+<summary>hipBLAS (ROCm)</summary>
 
 To install with hipBLAS / ROCm support for AMD cards, set the `LLAMA_HIPBLAS=on` environment variable before installing:
 
@@ -97,7 +141,10 @@ To install with hipBLAS / ROCm support for AMD cards, set the `LLAMA_HIPBLAS=on`
 CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python
 ```
 
-#### Vulkan
+</details>
+
+<details>
+<summary>Vulkan</summary>
 
 To install with Vulkan support, set the `LLAMA_VULKAN=on` environment variable before installing:
 
@@ -105,15 +152,20 @@ To install with Vulkan support, set the `LLAMA_VULKAN=on` environment variable b
 CMAKE_ARGS="-DLLAMA_VULKAN=on" pip install llama-cpp-python
 ```
 
-#### Kompute
+</details>
+
+<details>
+<summary>Kompute</summary>
 
 To install with Kompute support, set the `LLAMA_KOMPUTE=on` environment variable before installing:
 
 ```bash
 CMAKE_ARGS="-DLLAMA_KOMPUTE=on" pip install llama-cpp-python
 ```
+</details>
 
-#### SYCL
+<details>
+<summary>SYCL</summary>
 
 To install with SYCL support, set the `LLAMA_SYCL=on` environment variable before installing:
 
@@ -121,9 +173,14 @@ To install with SYCL support, set the `LLAMA_SYCL=on` environment variable befor
 source /opt/intel/oneapi/setvars.sh
 CMAKE_ARGS="-DLLAMA_SYCL=on -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx" pip install llama-cpp-python
 ```
+</details>
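+Whichever backend you build, you can check which features actually ended up in the compiled library. This is a rough sketch, assuming the low-level `llama_print_system_info` binding is exposed by your installed version:
+
+```python
+import llama_cpp
+
+# Prints the compile-time feature flags reported by llama.cpp
+# (BLAS, CUDA, Metal, etc.) so you can confirm the backend was built in.
+print(llama_cpp.llama_print_system_info().decode("utf-8"))
+```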
+
 
 ### Windows Notes
 
+<details>
+<summary>Error: Can't find 'nmake' or 'CMAKE_C_COMPILER'</summary>
+
 If you run into issues where it complains it can't find `'nmake'` `'?'` or CMAKE_C_COMPILER, you can extract w64devkit as [mentioned in llama.cpp repo](https://github.com/ggerganov/llama.cpp#openblas) and add those manually to CMAKE_ARGS before running `pip` install:
 
 ```ps
 $env:CMAKE_GENERATOR = "MinGW Makefiles"
@@ -132,12 +189,14 @@ $env:CMAKE_ARGS = "-DLLAMA_OPENBLAS=on -DCMAKE_C_COMPILER=C:/w64devkit/bin/gcc.e
 ```
 
 See the above instructions and set `CMAKE_ARGS` to the BLAS backend you want to use.
+</details>
 
 ### MacOS Notes
 
 Detailed MacOS Metal GPU install documentation is available at [docs/install/macos.md](https://llama-cpp-python.readthedocs.io/en/latest/install/macos/)
 
-#### M1 Mac Performance Issue
+<details>
+<summary>M1 Mac Performance Issue</summary>
 
 Note: If you are using Apple Silicon (M1) Mac, make sure you have installed a version of Python that supports arm64 architecture. For example:
 
 ```bash
 wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
@@ -147,24 +206,21 @@ bash Miniforge3-MacOSX-arm64.sh
 ```
 
 Otherwise, while installing it will build the llama.cpp x86 version which will be 10x slower on Apple Silicon (M1) Mac.
+</details>
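+If you are not sure which architecture your Python interpreter was built for, a quick standard-library check (a minimal sketch added here for illustration) is:
+
+```python
+import platform
+
+# "arm64" means a native Apple Silicon build of Python; on Apple Silicon,
+# "x86_64" means the interpreter runs under Rosetta and llama.cpp will be built for x86.
+print(platform.machine())
+```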
 
-#### M Series Mac Error: `(mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64'))`
+<details>
+<summary>M Series Mac Error: `(mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64'))`</summary>
 
 Try installing with
 
 ```bash
 CMAKE_ARGS="-DCMAKE_OSX_ARCHITECTURES=arm64 -DCMAKE_APPLE_SILICON_PROCESSOR=arm64 -DLLAMA_METAL=on" pip install --upgrade --verbose --force-reinstall --no-cache-dir llama-cpp-python
 ```
+</details>
 
 ### Upgrading and Reinstalling
 
-To upgrade or rebuild `llama-cpp-python` add the following flags to ensure that the package is rebuilt correctly:
-
-```bash
-pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir
-```
-
-This will ensure that all source files are re-built with the most recently set `CMAKE_ARGS` flags.
+To upgrade and rebuild `llama-cpp-python`, add the `--upgrade --force-reinstall --no-cache-dir` flags to the `pip install` command to ensure the package is rebuilt from source.
 
 ## High-level API
 
@@ -218,13 +274,15 @@ You can pull `Llama` models from Hugging Face using the `from_pretrained` method
 You'll need to install the `huggingface-hub` package to use this feature (`pip install huggingface-hub`).
 
 ```python
-llama = Llama.from_pretrained(
+llm = Llama.from_pretrained(
     repo_id="Qwen/Qwen1.5-0.5B-Chat-GGUF",
     filename="*q8_0.gguf",
     verbose=False
 )
 ```
 
+By default, `from_pretrained` will download the model to the Hugging Face cache directory, so you can manage installed model files with the `huggingface-cli` tool.
+
 ### Chat Completion
 
 The high-level API also provides a simple interface for chat completion.
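+A rough sketch of what that looks like, reusing the `llm` instance loaded above (the prompt text here is only an illustration):
+
+```python
+# Chat completion sketch using the high-level API.
+response = llm.create_chat_completion(
+    messages=[
+        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "user", "content": "Summarize what llama-cpp-python does."},
+    ]
+)
+print(response["choices"][0]["message"]["content"])
+```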