# PrivateGPT with Llama 2 uncensored

https://github.com/jmorganca/ollama/assets/3325447/20cf8ec6-ff25-42c6-bdd8-9be594e3ce1b

> Note: this example is a slightly modified version of PrivateGPT that uses models such as Llama 2 Uncensored. All credit for PrivateGPT goes to its creator, Iván Martínez. You can find his GitHub repo [here](https://github.com/imartinez/privateGPT).

### Setup

Set up a virtual environment (optional):

```shell
python3 -m venv .venv
source .venv/bin/activate
```

Install the Python dependencies:

```shell
pip install -r requirements.txt
```

Pull the model you'd like to use:

```shell
ollama pull llama2-uncensored
```

### Getting WeWork's latest quarterly earnings report (10-Q)

```shell
mkdir source_documents
curl https://d18rn0p25nwr6d.cloudfront.net/CIK-0001813756/975b3e9b-268e-4798-a9e4-2a9a7c92dc10.pdf -o source_documents/wework.pdf
```
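
If the download fails, `curl` may still write something to `source_documents/wework.pdf`, such as an error page. As an optional sanity check, the sketch below tests whether a file starts with the `%PDF-` signature that PDF files begin with. The helper name `looks_like_pdf` is hypothetical and not part of this example's scripts.

```python
from pathlib import Path


def looks_like_pdf(path) -> bool:
    """Return True if the file exists and starts with the PDF signature."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        # PDF files begin with the bytes "%PDF-" followed by a version number.
        return f.read(5) == b"%PDF-"


if __name__ == "__main__":
    # Path taken from the curl command above.
    print(looks_like_pdf("source_documents/wework.pdf"))
```

A `False` result suggests the download should be retried before running the ingestion step.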

### Ingesting files

```shell
python ingest.py
```

Output should look like this:

```shell
Creating new vectorstore
Loading documents from source_documents
Loading new documents: 100%|██████████████████████| 1/1 [00:01<00:00,  1.73s/it]
Loaded 1 new documents from source_documents
Split into 90 chunks of text (max. 500 tokens each)
Creating embeddings. May take some minutes...
Using embedded DuckDB with persistence: data will be stored in: db
Ingestion complete! You can now run privateGPT.py to query your documents
```
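
The log above mentions splitting the document into chunks before creating embeddings. The following is a minimal sketch of that idea, not the code in `ingest.py`: it splits by characters rather than tokens, and the function name `split_into_chunks` and the `overlap` value are assumptions made for illustration.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at a
    chunk boundary still appears whole in one of the two chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    sample = "WeWork quarterly report text. " * 100
    pieces = split_into_chunks(sample)
    print(f"Split into {len(pieces)} chunks of text (max. 500 characters each)")
```

Smaller chunks tend to give more focused retrieval results, while larger chunks keep more context together. The right size depends on the documents and the model.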

### Ask questions

```shell
python privateGPT.py

Enter a query: How many locations does WeWork have?

> Answer (took 17.7 s.):
As of June 2023, WeWork has 777 locations worldwide, including 610 Consolidated Locations (as defined in the section entitled Key Performance Indicators).
```

### Try a different model

```shell
ollama pull llama2:13b
MODEL=llama2:13b python privateGPT.py
```
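
The command above selects the model through the `MODEL` environment variable. A script can pick up such a variable roughly as sketched below. This is an illustration rather than the code in `privateGPT.py`: the helper name `resolve_model` and the fallback to `llama2-uncensored` (the model pulled during setup) are assumptions.

```python
import os

# Assumption: fall back to the model pulled in the setup step.
DEFAULT_MODEL = "llama2-uncensored"


def resolve_model(environ=os.environ) -> str:
    """Return the model named by MODEL, or the default if it is unset or empty."""
    return environ.get("MODEL") or DEFAULT_MODEL


if __name__ == "__main__":
    print(f"Using model: {resolve_model()}")
```

Setting the variable inline, as in `MODEL=llama2:13b python privateGPT.py`, affects only that single invocation.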

### Adding more files

Put any and all of your files into the `source_documents` directory.

The supported extensions are:

- `.csv`: CSV
- `.docx`: Word Document
- `.doc`: Word Document
- `.enex`: EverNote
- `.eml`: Email
- `.epub`: EPub
- `.html`: HTML File
- `.md`: Markdown
- `.msg`: Outlook Message
- `.odt`: Open Document Text
- `.pdf`: Portable Document Format (PDF)
- `.pptx`: PowerPoint Document
- `.ppt`: PowerPoint Document
- `.txt`: Text file (UTF-8)
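
To preview which files in `source_documents` match the extensions listed above before ingesting, a small helper like the one below can be used. It is a sketch only: the name `find_supported_files`, the recursive search, and the case-insensitive matching are assumptions and may differ from what `ingest.py` actually does.

```python
from pathlib import Path

# Extensions copied from the list above.
SUPPORTED_EXTENSIONS = {
    ".csv", ".docx", ".doc", ".enex", ".eml", ".epub", ".html",
    ".md", ".msg", ".odt", ".pdf", ".pptx", ".ppt", ".txt",
}


def find_supported_files(source_dir="source_documents"):
    """Return a sorted list of files under source_dir with a supported extension."""
    root = Path(source_dir)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("*")
        # Assumption: match extensions case-insensitively (e.g. ".PDF").
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )


if __name__ == "__main__":
    for path in find_supported_files():
        print(path)
```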