docs: Update README

Andrei Betlen 2024-02-22 03:25:11 -05:00
parent d80c5cf29d
commit 32efed7b07

README.md (124 changed lines)

Documentation is available at [https://llama-cpp-python.readthedocs.io/en/latest](https://llama-cpp-python.readthedocs.io/en/latest).
## Installation

Requirements:

  - Python 3.8+
  - C compiler
    - Linux: gcc or clang
    - Windows: Visual Studio or MinGW
    - MacOS: Xcode

To install the package, run:

```bash
pip install llama-cpp-python
```

This will also build `llama.cpp` from source and install it alongside this python package.

If this fails, add `--verbose` to the `pip install` command to see the full cmake build log.
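For example, a verbose rebuild that skips any cached wheel (a sketch combining flags that are described later in this README) looks like this:

```bash
# Rebuild with full CMake output and without reusing a cached wheel
pip install --verbose --force-reinstall --no-cache-dir llama-cpp-python
```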
### Installation Configuration

The default pip install behaviour is to build `llama.cpp` for CPU only on Linux and Windows and to use Metal on MacOS.

`llama.cpp` supports a number of hardware acceleration backends to speed up inference, as well as backend-specific options. See the [llama.cpp README](https://github.com/ggerganov/llama.cpp#build) for a full list.

All `llama.cpp` cmake build options can be set via the `CMAKE_ARGS` environment variable or via the `--config-settings / -C` CLI flag during installation.

<details>
<summary>Environment Variables</summary>

```bash
# Linux and Mac
CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" \
  pip install llama-cpp-python
```

```powershell
# Windows
$env:CMAKE_ARGS = "-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS"
pip install llama-cpp-python
```
</details>
<details>
<summary>CLI / requirements.txt</summary>

They can also be set via the `pip install -C / --config-settings` command and saved to a `requirements.txt` file:
```bash
pip install --upgrade pip # ensure pip is up to date
pip install llama-cpp-python \
  -C cmake.args="-DLLAMA_BLAS=ON;-DLLAMA_BLAS_VENDOR=OpenBLAS"
```
```txt
# requirements.txt
llama-cpp-python -C cmake.args="-DLLAMA_BLAS=ON;-DLLAMA_BLAS_VENDOR=OpenBLAS"
```
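A quick usage note (not in the original text): a `requirements.txt` written this way installs with the usual command, and the `pip install --upgrade pip` step above covers the pip version needed for per-requirement config settings.

```bash
# Install every pinned requirement, including the cmake.args build settings above
pip install -r requirements.txt
```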
</details>

### Supported Backends
Below are some common backends, their build commands and any additional environment variables required.

<details>
<summary>OpenBLAS (CPU)</summary>

To install with OpenBLAS, set the `LLAMA_BLAS` and `LLAMA_BLAS_VENDOR` environment variables before installing:
```bash
CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python
```
</details>
<details>
<summary>cuBLAS (CUDA)</summary>

To install with cuBLAS, set the `LLAMA_CUBLAS=on` environment variable before installing:
```bash
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
```
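A cuBLAS build needs the CUDA toolkit visible at build time. As an optional sanity check (not part of the original instructions), confirm the CUDA compiler and driver are available first:

```bash
# Both commands should succeed before attempting a CUDA-enabled build
nvcc --version   # CUDA toolkit / compiler
nvidia-smi       # NVIDIA driver and visible GPUs
```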
</details>

<details>
<summary>Metal</summary>

To install with Metal (MPS), set the `LLAMA_METAL=on` environment variable before installing:
```bash
CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python
```
</details>

<details>
<summary>CLBlast (OpenCL)</summary>

To install with CLBlast, set the `LLAMA_CLBLAST=on` environment variable before installing:
```bash
CMAKE_ARGS="-DLLAMA_CLBLAST=on" pip install llama-cpp-python
```
</details>

<details>
<summary>hipBLAS (ROCm)</summary>

To install with hipBLAS / ROCm support for AMD cards, set the `LLAMA_HIPBLAS=on` environment variable before installing:
```bash
CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python
```
</details>

<details>
<summary>Vulkan</summary>

To install with Vulkan support, set the `LLAMA_VULKAN=on` environment variable before installing:
```bash
CMAKE_ARGS="-DLLAMA_VULKAN=on" pip install llama-cpp-python
```
</details>

<details>
<summary>Kompute</summary>

To install with Kompute support, set the `LLAMA_KOMPUTE=on` environment variable before installing:
```bash
CMAKE_ARGS="-DLLAMA_KOMPUTE=on" pip install llama-cpp-python
```
</details>
<details>
<summary>SYCL</summary>

To install with SYCL support, set the `LLAMA_SYCL=on` environment variable before installing:
```bash
source /opt/intel/oneapi/setvars.sh
CMAKE_ARGS="-DLLAMA_SYCL=on -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx" pip install llama-cpp-python
```
</details>

### Windows Notes
<details>
<summary>Error: Can't find 'nmake' or 'CMAKE_C_COMPILER'</summary>

If you run into issues where it complains it can't find `'nmake'` or `CMAKE_C_COMPILER`, you can extract w64devkit as [mentioned in the llama.cpp repo](https://github.com/ggerganov/llama.cpp#openblas) and add those paths manually to `CMAKE_ARGS` before running `pip install`:
```ps
$env:CMAKE_ARGS = "-DLLAMA_OPENBLAS=on -DCMAKE_C_COMPILER=C:/w64devkit/bin/gcc.e
```
See the above instructions and set `CMAKE_ARGS` to the BLAS backend you want to use.

</details>

### MacOS Notes
Detailed MacOS Metal GPU install documentation is available at [docs/install/macos.md](https://llama-cpp-python.readthedocs.io/en/latest/install/macos/)
<details>
<summary>M1 Mac Performance Issue</summary>

Note: If you are using Apple Silicon (M1) Mac, make sure you have installed a version of Python that supports arm64 architecture. For example:
```bash
bash Miniforge3-MacOSX-arm64.sh
```
Otherwise, while installing it will build the llama.cpp x86 version which will be 10x slower on Apple Silicon (M1) Mac.
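To double-check which architecture your Python build targets (a quick sanity check, not from the original docs), the following should print `arm64` on a native Apple Silicon install:

```bash
# 'x86_64' here means an Intel / Rosetta Python build and will produce the slow x86 llama.cpp
python3 -c "import platform; print(platform.machine())"
```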
</details>
<details>
<summary>M Series Mac Error: `(mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64'))`</summary>

Try installing with
```bash
CMAKE_ARGS="-DCMAKE_OSX_ARCHITECTURES=arm64 -DCMAKE_APPLE_SILICON_PROCESSOR=arm64 -DLLAMA_METAL=on" pip install --upgrade --verbose --force-reinstall --no-cache-dir llama-cpp-python
```
</details>

### Upgrading and Reinstalling

To upgrade and rebuild `llama-cpp-python`, add the `--upgrade --force-reinstall --no-cache-dir` flags to the `pip install` command to ensure the package is rebuilt from source:
```bash
pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir
```
This will ensure that all source files are re-built with the most recently set `CMAKE_ARGS` flags.
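For example, to switch an existing install over to a different backend, combine the reinstall flags with new `CMAKE_ARGS` (an illustrative sketch; cuBLAS is just the example backend here):

```bash
# Rebuild from source against cuBLAS instead of the previously configured backend
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir
```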
## High-level API
You can pull `Llama` models from Hugging Face using the `from_pretrained` method.

You'll need to install the `huggingface-hub` package to use this feature (`pip install huggingface-hub`).

```python
llm = Llama.from_pretrained(
    repo_id="Qwen/Qwen1.5-0.5B-Chat-GGUF",
    filename="*q8_0.gguf",
    verbose=False
)
```
By default, the `from_pretrained` method will download the model to the huggingface cache directory, so you can manage installed model files with the `huggingface-cli` tool.
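As a usage sketch (not from the original README; the interactive `delete-cache` command may require the extra `huggingface_hub[cli]` dependencies), the cache can be inspected and pruned like this:

```bash
huggingface-cli scan-cache     # list cached repos and how much disk space they use
huggingface-cli delete-cache   # interactively select cached revisions to remove
```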
### Chat Completion

The high-level API also provides a simple interface for chat completion.