docs: Update README

Documentation is available at [https://llama-cpp-python.readthedocs.io/en/latest](https://llama-cpp-python.readthedocs.io/en/latest).

## Installation

Requirements:

  - Python 3.8+
  - C compiler
    - Linux: gcc or clang
    - Windows: Visual Studio or MinGW
    - MacOS: Xcode

To install the package, run:

```bash
pip install llama-cpp-python
```

This will also build `llama.cpp` from source and install it alongside this python package.

If installation fails, add `--verbose` to the `pip install` command to see the full cmake build log.
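
For example, to rebuild with full output:

```bash
pip install llama-cpp-python --verbose
```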

### Installation Configuration

`llama.cpp` supports a number of hardware acceleration backends to speed up inference as well as backend specific options. See the [llama.cpp README](https://github.com/ggerganov/llama.cpp#build) for a full list.

All `llama.cpp` cmake build options can be set via the `CMAKE_ARGS` environment variable or via the `--config-settings / -C` cli flag during installation.

<details>
<summary>Environment Variables</summary>

```bash
# Linux and Mac
CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" \
  pip install llama-cpp-python
```

```powershell
# Windows
$env:CMAKE_ARGS = "-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS"
pip install llama-cpp-python
```
</details>

<details>
<summary>CLI / requirements.txt</summary>

They can also be set via the `pip install -C / --config-settings` command and saved to a `requirements.txt` file:

```bash
pip install --upgrade pip # ensure pip is up to date
pip install llama-cpp-python \
  -C cmake.args="-DLLAMA_BLAS=ON;-DLLAMA_BLAS_VENDOR=OpenBLAS"
```

```txt
# requirements.txt

llama-cpp-python -C cmake.args="-DLLAMA_BLAS=ON;-DLLAMA_BLAS_VENDOR=OpenBLAS"
```

</details>

### Supported Backends

Below are some common backends, their build commands and any additional environment variables required.

<details>
<summary>OpenBLAS (CPU)</summary>

To install with OpenBLAS, set the `LLAMA_BLAS` and `LLAMA_BLAS_VENDOR` environment variables before installing:

```bash
CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python
```
</details>

<details>
<summary>cuBLAS (CUDA)</summary>

To install with cuBLAS, set the `LLAMA_CUBLAS=on` environment variable before installing:

```bash
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
```
</details>

<details>
<summary>Metal</summary>

To install with Metal (MPS), set the `LLAMA_METAL=on` environment variable before installing:

```bash
CMAKE_ARGS="-DLLAMA_METAL=on" pip install llama-cpp-python
```
</details>

<details>
<summary>CLBlast (OpenCL)</summary>

To install with CLBlast, set the `LLAMA_CLBLAST=on` environment variable before installing:

```bash
CMAKE_ARGS="-DLLAMA_CLBLAST=on" pip install llama-cpp-python
```
</details>

<details>
<summary>hipBLAS (ROCm)</summary>

To install with hipBLAS / ROCm support for AMD cards, set the `LLAMA_HIPBLAS=on` environment variable before installing:

```bash
CMAKE_ARGS="-DLLAMA_HIPBLAS=on" pip install llama-cpp-python
```
</details>

<details>
<summary>Vulkan</summary>

To install with Vulkan support, set the `LLAMA_VULKAN=on` environment variable before installing:

```bash
CMAKE_ARGS="-DLLAMA_VULKAN=on" pip install llama-cpp-python
```
</details>

<details>
<summary>Kompute</summary>

To install with Kompute support, set the `LLAMA_KOMPUTE=on` environment variable before installing:

```bash
CMAKE_ARGS="-DLLAMA_KOMPUTE=on" pip install llama-cpp-python
```
</details>

<details>
<summary>SYCL</summary>

To install with SYCL support, set the `LLAMA_SYCL=on` environment variable before installing:

```bash
source /opt/intel/oneapi/setvars.sh
CMAKE_ARGS="-DLLAMA_SYCL=on -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx" pip install llama-cpp-python
```
</details>

### Windows Notes

<details>
<summary>Error: Can't find 'nmake' or 'CMAKE_C_COMPILER'</summary>

If you run into issues where it complains it can't find `'nmake'` `'?'` or `CMAKE_C_COMPILER`, you can extract w64devkit as [mentioned in the llama.cpp repo](https://github.com/ggerganov/llama.cpp#openblas) and add those manually to `CMAKE_ARGS` before running `pip install`:

```ps
$env:CMAKE_GENERATOR = "MinGW Makefiles"
$env:CMAKE_ARGS = "-DLLAMA_OPENBLAS=on -DCMAKE_C_COMPILER=C:/w64devkit/bin/gcc.exe -DCMAKE_CXX_COMPILER=C:/w64devkit/bin/g++.exe"
```

See the above instructions and set `CMAKE_ARGS` to the BLAS backend you want to use.

</details>

### MacOS Notes

Detailed MacOS Metal GPU install documentation is available at [docs/install/macos.md](https://llama-cpp-python.readthedocs.io/en/latest/install/macos/)

<details>
<summary>M1 Mac Performance Issue</summary>

Note: If you are using an Apple Silicon (M1) Mac, make sure you have installed a version of Python that supports arm64 architecture. For example:

```bash
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
bash Miniforge3-MacOSX-arm64.sh
```

Otherwise, while installing it will build the llama.cpp x86 version which will be 10x slower on Apple Silicon (M1) Mac.

</details>

<details>
<summary>M Series Mac Error: `(mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64'))`</summary>

Try installing with

```bash
CMAKE_ARGS="-DCMAKE_OSX_ARCHITECTURES=arm64 -DCMAKE_APPLE_SILICON_PROCESSOR=arm64 -DLLAMA_METAL=on" pip install --upgrade --verbose --force-reinstall --no-cache-dir llama-cpp-python
```
</details>

### Upgrading and Reinstalling

To upgrade and rebuild `llama-cpp-python`, add the `--upgrade --force-reinstall --no-cache-dir` flags to the `pip install` command to ensure the package is rebuilt from source:

```bash
pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir
```

This will ensure that all source files are re-built with the most recently set `CMAKE_ARGS` flags.

## High-level API

You can pull `Llama` models from Hugging Face using the `from_pretrained` method.

You'll need to install the `huggingface-hub` package to use this feature (`pip install huggingface-hub`).

```python
llm = Llama.from_pretrained(
    repo_id="Qwen/Qwen1.5-0.5B-Chat-GGUF",
    filename="*q8_0.gguf",
    verbose=False
)
```

By default the `from_pretrained` method will download the model to the Hugging Face cache directory, so you can manage installed model files with the `huggingface-cli` tool.

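For example, to list what has been downloaded (illustrative; `huggingface-cli scan-cache` ships with the `huggingface-hub` package installed above):

```bash
# List cached Hugging Face repos and their size on disk
huggingface-cli scan-cache
```
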
### Chat Completion

The high-level API also provides a simple interface for chat completion.
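
As a minimal sketch (reusing the `llm` instance loaded above; the example messages are illustrative):

```python
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize what llama-cpp-python does in one sentence."},
    ]
)
# The generated reply is in the first choice's message content
print(response["choices"][0]["message"]["content"])
```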