# PrivateGPT with Llama 2 Uncensored
https://github.com/jmorganca/ollama/assets/3325447/20cf8ec6-ff25-42c6-bdd8-9be594e3ce1b
> Note: this example is a slightly modified version of PrivateGPT that uses models such as Llama 2 Uncensored. All credit for PrivateGPT goes to its creator, Iván Martínez. His GitHub repo is [here](https://github.com/imartinez/privateGPT).
### Setup
Set up a virtual environment (optional):
```
python3 -m venv .venv
source .venv/bin/activate
```
Install the Python dependencies:
```shell
pip install -r requirements.txt
```
Pull the model you'd like to use:
```
ollama pull llama2-uncensored
```
### Getting WeWork's latest quarterly earnings report (10-Q)
```
mkdir source_documents
curl https://d18rn0p25nwr6d.cloudfront.net/CIK-0001813756/975b3e9b-268e-4798-a9e4-2a9a7c92dc10.pdf -o source_documents/wework.pdf
```
### Ingesting files
```shell
python ingest.py
```
Output should look like this:
```shell
Creating new vectorstore
Loading documents from source_documents
Loading new documents: 100%|██████████████████████| 1/1 [00:01<00:00,  1.73s/it]
Loaded 1 new documents from source_documents
Split into 90 chunks of text (max. 500 tokens each)
Creating embeddings. May take some minutes...
Using embedded DuckDB with persistence: data will be stored in: db
Ingestion complete! You can now run privateGPT.py to query your documents
```
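
The "Split into 90 chunks" line in the output reflects the ingest step breaking each document into bounded pieces before embedding them. As a rough illustration only (this is not the code in `ingest.py`), a minimal sketch of fixed-size chunking with overlap might look like the following. It counts characters rather than tokens, and `split_into_chunks`, `chunk_size`, and `overlap` are hypothetical names chosen for this example.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into pieces of at most chunk_size characters.

    Consecutive chunks share `overlap` characters, so text near a chunk
    boundary appears in both neighbouring chunks.
    NOTE: illustrative sketch; sizes are in characters, not tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Overlap trades a little extra storage for context: a passage that straddles a boundary is still retrievable from at least one chunk that contains most of it.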
### Ask questions
```shell
python privateGPT.py
Enter a query: How many locations does WeWork have?
> Answer (took 17.7 s.):
As of June 2023, WeWork has 777 locations worldwide, including 610 Consolidated Locations (as defined in the section entitled Key Performance Indicators).
```
### Try a different model
```
ollama pull llama2:13b
MODEL=llama2:13b python privateGPT.py
```
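
The `MODEL=...` prefix sets an environment variable for that single command. A minimal sketch of how a script could pick it up is shown below. The fallback to `llama2-uncensored` and the `resolve_model` helper are assumptions made for illustration, not taken from `privateGPT.py`.

```python
import os

DEFAULT_MODEL = "llama2-uncensored"  # assumed fallback, for illustration only


def resolve_model(env=None) -> str:
    """Return the model name from the MODEL environment variable.

    Falls back to DEFAULT_MODEL when MODEL is unset or empty.
    `env` is any mapping; it defaults to the process environment.
    """
    env = os.environ if env is None else env
    return env.get("MODEL") or DEFAULT_MODEL


if __name__ == "__main__":
    print(f"Using model: {resolve_model()}")
```

Passing the environment in as a parameter keeps the helper easy to test without modifying the real process environment.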
## Adding more files
Put all of your files into the `source_documents` directory.

The supported extensions are:

- `.csv`: CSV
- `.docx`: Word Document
- `.doc`: Word Document
- `.enex`: Evernote
- `.eml`: Email
- `.epub`: EPUB
- `.html`: HTML File
- `.md`: Markdown
- `.msg`: Outlook Message
- `.odt`: Open Document Text
- `.pdf`: Portable Document Format (PDF)
- `.pptx`: PowerPoint Document
- `.ppt`: PowerPoint Document
- `.txt`: Text file (UTF-8)
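
Before running `ingest.py`, it can help to check which files in `source_documents` have one of the extensions listed above. The following is a minimal sketch under a few assumptions: `partition_by_support` is a hypothetical helper, not part of this example's scripts, and matching extensions case-insensitively is a choice made here that may differ from what `ingest.py` does.

```python
from pathlib import Path

# Extensions copied from the list above.
SUPPORTED_EXTENSIONS = {
    ".csv", ".docx", ".doc", ".enex", ".eml", ".epub", ".html",
    ".md", ".msg", ".odt", ".pdf", ".pptx", ".ppt", ".txt",
}


def partition_by_support(directory):
    """Return (supported, unsupported) lists of files found under directory."""
    supported, unsupported = [], []
    for path in sorted(Path(directory).rglob("*")):
        if not path.is_file():
            continue
        # Assumption: compare case-insensitively so "REPORT.PDF" counts as a PDF.
        if path.suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported


if __name__ == "__main__":
    ok, skipped = partition_by_support("source_documents")
    print(f"{len(ok)} supported file(s)")
    for path in skipped:
        print(f"Unsupported extension: {path}")
```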