# PrivateGPT with Llama 2 uncensored

https://github.com/ollama/ollama/assets/3325447/20cf8ec6-ff25-42c6-bdd8-9be594e3ce1b

> Note: this example is a slightly modified version of PrivateGPT that uses models such as Llama 2 Uncensored. All credit for PrivateGPT goes to its creator, Iván Martínez; you can find his GitHub repo [here](https://github.com/imartinez/privateGPT).

### Setup

Set up a virtual environment (optional):

```shell
python3 -m venv .venv
source .venv/bin/activate
```

Install the Python dependencies:

```shell
pip install -r requirements.txt
```

Pull the model you'd like to use:

```shell
ollama pull llama2-uncensored
```

### Getting WeWork's latest quarterly earnings report (10-Q)

```shell
mkdir source_documents
curl https://d18rn0p25nwr6d.cloudfront.net/CIK-0001813756/975b3e9b-268e-4798-a9e4-2a9a7c92dc10.pdf -o source_documents/wework.pdf
```

### Ingesting files

```shell
python ingest.py
```

Output should look like this:

```shell
Creating new vectorstore
Loading documents from source_documents
Loading new documents: 100%|██████████████████████| 1/1 [00:01<00:00,  1.73s/it]
Loaded 1 new documents from source_documents
Split into 90 chunks of text (max. 500 tokens each)
Creating embeddings. May take some minutes...
Using embedded DuckDB with persistence: data will be stored in: db
Ingestion complete! You can now run privateGPT.py to query your documents
```
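
The log line about splitting into chunks of at most 500 tokens describes a common pre-embedding step. The sketch below illustrates the general idea only; it is not the code in `ingest.py`. It assumes whitespace-separated words stand in for tokens, and the function name `chunk_words` and the 50-word overlap are hypothetical choices made for illustration.

```python
def chunk_words(text, max_tokens=500, overlap=50):
    """Split text into chunks of at most max_tokens words.

    Assumption: a "token" here is a whitespace-separated word, which is
    only an approximation of what a real tokenizer would produce.
    Consecutive chunks share `overlap` words so that context is not cut
    abruptly at a chunk boundary.
    """
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be >= 0 and smaller than max_tokens")

    words = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        # Stop once this chunk has reached the end of the text.
        if start + max_tokens >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(1200))
    for i, chunk in enumerate(chunk_words(sample)):
        print(i, len(chunk.split()))
```

Smaller chunks tend to give more focused retrieval results, while larger ones keep more surrounding context; the right balance depends on the documents.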

### Ask questions

```shell
python privateGPT.py

Enter a query: How many locations does WeWork have?

> Answer (took 17.7 s.):
As of June 2023, WeWork has 777 locations worldwide, including 610 Consolidated Locations (as defined in the section entitled Key Performance Indicators).
```

### Try a different model

```shell
ollama pull llama2:13b
MODEL=llama2:13b python privateGPT.py
```
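
The `MODEL=llama2:13b` prefix sets an environment variable for that single command. A minimal sketch of how a Python script could read it follows; the helper name `resolve_model` and the fallback to `llama2-uncensored` are assumptions for illustration and are not taken from `privateGPT.py`.

```python
import os

# Assumption: fall back to the model pulled during setup.
DEFAULT_MODEL = "llama2-uncensored"


def resolve_model(env=None):
    """Return the model name from the MODEL variable, or the default.

    `env` can be any mapping; it defaults to the process environment.
    An unset or empty MODEL value falls back to DEFAULT_MODEL.
    """
    env = os.environ if env is None else env
    return env.get("MODEL") or DEFAULT_MODEL


if __name__ == "__main__":
    print(f"Using model: {resolve_model()}")
```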

### Adding more files

Put the files you want to query into the `source_documents` directory, then run `python ingest.py` again.

The supported extensions are:

- `.csv`: CSV
- `.docx`: Word Document
- `.doc`: Word Document
- `.enex`: Evernote
- `.eml`: Email
- `.epub`: EPUB
- `.html`: HTML File
- `.md`: Markdown
- `.msg`: Outlook Message
- `.odt`: Open Document Text
- `.pdf`: Portable Document Format (PDF)
- `.pptx`: PowerPoint Document
- `.ppt`: PowerPoint Document
- `.txt`: Text file (UTF-8)
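
Before ingesting, it can help to check which files in `source_documents` have a supported extension. The sketch below is a standalone helper, not part of the example's scripts. The name `split_by_support`, the recursive search, and the case-insensitive match are assumptions, so `ingest.py` may treat some files differently.

```python
from pathlib import Path

# Extensions copied from the list above.
SUPPORTED_EXTENSIONS = {
    ".csv", ".docx", ".doc", ".enex", ".eml", ".epub", ".html",
    ".md", ".msg", ".odt", ".pdf", ".pptx", ".ppt", ".txt",
}


def split_by_support(directory):
    """Return (supported, unsupported) lists of file paths under directory.

    Assumption: extensions are compared case-insensitively and
    subdirectories are searched recursively.
    """
    supported, unsupported = [], []
    for path in sorted(Path(directory).rglob("*")):
        if not path.is_file():
            continue
        if path.suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported


if __name__ == "__main__":
    ok, skipped = split_by_support("source_documents")
    print(f"{len(ok)} supported file(s)")
    for path in skipped:
        print(f"Unsupported extension: {path}")
```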