# Simple Chat Example

The **chat** endpoint is one of two ways to generate text from an LLM with Ollama, and was introduced in version 0.1.14. At a high level, you provide the endpoint an array of objects, each with a role and content specified. With each prompt and response, you append more of those role/content objects, building up the conversation history.
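
For example, a short conversation history might look like this (the message values here are purely illustrative):

```python
# Each turn is an object with a "role" ("system", "user", or
# "assistant") and its "content".
messages = [
    {"role": "user", "content": "Why is the sky blue?"},
    {"role": "assistant", "content": "Because of Rayleigh scattering..."},
    {"role": "user", "content": "How is that different from Mie scattering?"},
]
```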

## Running the Example

1. Ensure you have the `llama3.1` model installed:

   ```bash
   ollama pull llama3.1
   ```

2. Install the Python requirements:

   ```bash
   pip install -r requirements.txt
   ```

3. Run the example:

   ```bash
   python client.py
   ```

## Review the Code

In the **chat** function, you can see that calling the endpoint is as simple as:

```python
r = requests.post(
    "http://0.0.0.0:11434/api/chat",
    json={"model": model, "messages": messages, "stream": True},
    stream=True,  # stream the HTTP response instead of buffering it
)
```

With the **generate** endpoint you provide a `prompt`; with **chat** you provide `messages`. Each response in the resulting stream includes a `message` object with a `content` field.

The final JSON object doesn't provide the full content, so you will need to build the content yourself.
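
Putting that together, a minimal sketch of a **chat** function that builds up the content from the stream might look like this (the exact code in `client.py` may differ slightly):

```python
import json

import requests


def chat(messages, model="llama3.1"):
    """Send the history to the chat endpoint and stream back the reply."""
    r = requests.post(
        "http://0.0.0.0:11434/api/chat",
        json={"model": model, "messages": messages, "stream": True},
        stream=True,
    )
    r.raise_for_status()

    output = ""
    for line in r.iter_lines():
        body = json.loads(line)
        if "error" in body:
            raise Exception(body["error"])
        if not body.get("done"):
            # Each chunk carries one piece of the reply.
            content = body["message"]["content"]
            output += content
            print(content, end="", flush=True)
        else:
            # The final object has no aggregated content; return the
            # string we assembled from the chunks.
            return {"role": "assistant", "content": output}
```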

In the **main** function, we collect `user_input` and append it as a message to our `messages` list, which is then passed to the **chat** function. When the LLM finishes responding, its output is appended as another message.
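
A sketch of that loop, reusing the `chat` function above:

```python
def main():
    messages = []
    while True:
        user_input = input("Enter a prompt: ")
        if not user_input:
            break
        # Add the prompt to the history, then hand the full
        # history to the chat endpoint.
        messages.append({"role": "user", "content": user_input})
        message = chat(messages)
        # Keep the assistant's reply so the next turn sees it too.
        messages.append(message)
        print("\n")


if __name__ == "__main__":
    main()
```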

## Next Steps

In this example, the full conversation history is kept. You might want to experiment with summarizing everything older than the last ten turns, enabling a longer history while using less of the model's context window.
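
One possible approach, sketched with the `chat` function above (the `compact` helper and its prompt wording are hypothetical, not part of the example):

```python
def compact(messages, keep=10):
    # Hypothetical helper: condense everything but the last `keep`
    # messages into a single system message to save context.
    if len(messages) <= keep:
        return messages
    older, recent = messages[:-keep], messages[-keep:]
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in older)
    summary = chat([{
        "role": "user",
        "content": "Briefly summarize this conversation:\n" + transcript,
    }])
    return [{"role": "system", "content": summary["content"]}] + recent
```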