From 23a221999f4864ac57ce1847c0d636beeb81fb30 Mon Sep 17 00:00:00 2001
From: James Braza
Date: Mon, 20 Nov 2023 21:24:22 -0800
Subject: [PATCH] Documenting server usage (#768)

---
 README.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/README.md b/README.md
index aee2d8a..5596866 100644
--- a/README.md
+++ b/README.md
@@ -164,6 +164,7 @@ To install the server package and get started:
 pip install llama-cpp-python[server]
 python3 -m llama_cpp.server --model models/7B/llama-model.gguf
 ```
+
 Similar to Hardware Acceleration section above, you can also install with GPU (cuBLAS) support like this:
 
 ```bash
@@ -173,6 +174,8 @@ python3 -m llama_cpp.server --model models/7B/llama-model.gguf --n_gpu_layers 35
 
 Navigate to [http://localhost:8000/docs](http://localhost:8000/docs) to see the OpenAPI documentation.
 
+To bind to `0.0.0.0` to enable remote connections, use `python3 -m llama_cpp.server --host 0.0.0.0`.
+Similarly, to change the port (default is 8000), use `--port`.
 
 ## Docker image
 
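For reference, the flags documented by this patch compose with the README's existing launch command into a single invocation. A minimal sketch, assuming `--host` and `--port` can be combined in one command (the patch only shows them individually) and picking 8080 as an arbitrary non-default port:

```bash
# Bind to all interfaces on a non-default port.
# --host and --port are the flags this patch documents; --model and the
# model path come from the existing README examples. Port 8080 is an
# arbitrary illustrative choice (the default is 8000).
python3 -m llama_cpp.server --model models/7B/llama-model.gguf --host 0.0.0.0 --port 8080
```

With those flags, the OpenAPI documentation mentioned above would be reachable from other machines at `http://<server-ip>:8080/docs` rather than at the default [http://localhost:8000/docs](http://localhost:8000/docs).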