This is an example of running LLM inference with Ray and Ray Serve.
First, install the requirements:

```bash
$ pip install -r requirements.txt
```
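The requirements file is expected to pull in Ray Serve and a GGUF-capable runtime. A plausible requirements.txt, assuming llama-cpp-python is used for GGUF inference (the exact packages and pins in this repo may differ):

```
ray[serve]
llama-cpp-python
```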
Deploy a GGUF model to Ray Serve with the following command:

```bash
$ serve run llm:llm_builder model_path='../models/mistral-7b-instruct-v0.2.Q4_K_M.gguf'
```
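For context, here is a minimal sketch of what an application builder like `llm_builder` could look like. It assumes llama-cpp-python loads the GGUF file and uses Ray Serve's application-builder convention (a function taking a dict of CLI arguments); the class name and response shape are illustrative, not necessarily the repo's actual code:

```python
# llm.py -- a minimal sketch; the real file in this repo may differ.
from typing import Dict

from llama_cpp import Llama
from ray import serve
from starlette.requests import Request


@serve.deployment
class LLMDeployment:
    def __init__(self, model_path: str):
        # Load the GGUF model once per replica.
        self.llm = Llama(model_path=model_path)

    async def __call__(self, request: Request) -> Dict:
        body = await request.json()
        # llama-cpp-python returns an OpenAI-style completion dict.
        return self.llm(
            body["prompt"],
            max_tokens=body.get("max_tokens", 128),
        )


def llm_builder(args: Dict[str, str]) -> serve.Application:
    # `serve run llm:llm_builder model_path=...` passes the CLI
    # key=value arguments here as a dict.
    return LLMDeployment.bind(model_path=args["model_path"])
```

With this pattern, `serve run llm:llm_builder model_path=...` calls `llm_builder` with `{"model_path": "..."}` and deploys the returned application.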
This will start an API endpoint at http://localhost:8000/. You can query the model like this:
```bash
$ curl -k -d '{"prompt": "tell me a joke", "max_tokens": 128}' -X POST http://localhost:8000
```
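Equivalently, here is a short Python client, sketched with the requests library (an assumption; any HTTP client works):

```python
import requests

# Send the same prompt the curl example uses.
resp = requests.post(
    "http://localhost:8000",
    json={"prompt": "tell me a joke", "max_tokens": 128},
)
print(resp.json())
```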