# Deploy Marco-o1 API with FastAPI

This example uses FastAPI to expose the Marco-o1 language model over an HTTP API. You can choose between streaming and non-streaming responses, depending on your use case.

## Requirements

- FastAPI
- Uvicorn
- Transformers
- Torch
- vLLM
- HTTPX (for the streaming client)
- Requests (for the non-streaming client)
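
Assuming a standard Python environment, all of the above can be installed from PyPI (pin versions as needed for your hardware and CUDA setup):

```bash
pip install fastapi uvicorn transformers torch vllm httpx requests
```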


## Running the API Server

### Non-Streaming Mode

To start the FastAPI server with non-streaming responses:

```bash
uvicorn vllm_fastapi:app --workers 1
```
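
`vllm_fastapi.py` is the server script bundled with this example. As a rough sketch of how such a server can be structured with vLLM's offline `LLM` class (the `/generate` route, request schema, and model name `AIDC-AI/Marco-o1` below are illustrative assumptions, not necessarily what the bundled script uses):

```python
# Hedged sketch of a non-streaming vLLM + FastAPI server.
# Route, schema, and model name are assumptions for illustration.
from fastapi import FastAPI
from pydantic import BaseModel
from vllm import LLM, SamplingParams

app = FastAPI()
llm = LLM(model="AIDC-AI/Marco-o1")  # substitute the checkpoint you actually serve


class GenerateRequest(BaseModel):
    prompt: str
    max_tokens: int = 512
    temperature: float = 0.7


@app.post("/generate")
def generate(req: GenerateRequest):
    params = SamplingParams(temperature=req.temperature, max_tokens=req.max_tokens)
    # llm.generate takes a list of prompts and returns one RequestOutput per prompt
    outputs = llm.generate([req.prompt], params)
    return {"text": outputs[0].outputs[0].text}
```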

To run a client with non-streaming responses:

```bash
python3 client.py
```
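
`client.py` is provided alongside the server. A minimal equivalent using Requests might look like this (the URL, port, and JSON fields are assumptions and must match the server's actual route and schema):

```python
# Hedged sketch of a non-streaming client; endpoint and payload are assumed.
import requests

payload = {"prompt": "Hello, who are you?", "max_tokens": 512}
resp = requests.post("http://127.0.0.1:8000/generate", json=payload, timeout=300)
resp.raise_for_status()
print(resp.json()["text"])
```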

### Streaming Mode

To start the FastAPI server with streaming responses:

```bash
uvicorn stream_vllm_fastapi:app --workers 1
```
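
`stream_vllm_fastapi.py` is the streaming server script. A streaming server typically switches to vLLM's `AsyncLLMEngine` together with FastAPI's `StreamingResponse`; the sketch below shows one way to wire that up (route, schema, chunk format, and model name are assumptions):

```python
# Hedged sketch of a streaming vLLM + FastAPI server.
# Route, schema, chunk format, and model name are assumptions.
import uuid

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine

app = FastAPI()
engine = AsyncLLMEngine.from_engine_args(AsyncEngineArgs(model="AIDC-AI/Marco-o1"))


class GenerateRequest(BaseModel):
    prompt: str
    max_tokens: int = 512


@app.post("/generate")
async def generate(req: GenerateRequest):
    params = SamplingParams(max_tokens=req.max_tokens)
    request_id = str(uuid.uuid4())

    async def stream():
        sent = 0
        # engine.generate yields cumulative RequestOutput objects;
        # emit only the newly generated suffix of each one.
        async for output in engine.generate(req.prompt, params, request_id):
            text = output.outputs[0].text
            yield text[sent:]
            sent = len(text)

    return StreamingResponse(stream(), media_type="text/plain")
```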

To run the streaming client:

```bash
python3 stream_client.py
```
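
`stream_client.py` plays the same role on the client side. A minimal equivalent with HTTPX might look like the following (again, the route, payload, and plain-text chunk format are assumptions):

```python
# Hedged sketch of a streaming client; endpoint, payload, and chunk format are assumed.
import httpx

payload = {"prompt": "Hello, who are you?", "max_tokens": 512}
with httpx.stream("POST", "http://127.0.0.1:8000/generate", json=payload, timeout=None) as resp:
    resp.raise_for_status()
    # print each chunk as it arrives instead of waiting for the full response
    for chunk in resp.iter_text():
        print(chunk, end="", flush=True)
print()
```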