# Deploy Marco-o1 API with FastAPI

This example uses FastAPI to expose the Marco-o1 language model over an HTTP API. You can choose between streaming and non-streaming responses, depending on your use case.

## Requirements

- FastAPI
- Uvicorn
- Transformers
- Torch
- vLLM
- HTTPX (for the streaming client)
- Requests (for the non-streaming client)
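
Assuming a standard Python environment, all of the above can be installed from PyPI (pin versions as needed for your hardware and CUDA setup):

```bash
pip install fastapi uvicorn transformers torch vllm httpx requests
```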


## Running the API Server

### Non-Streaming Mode

To start the FastAPI server with non-streaming responses:

```bash
uvicorn vllm_fastapi:app --workers 1
```
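
`vllm_fastapi.py` is the server script bundled with this example. As a rough sketch of how such a server can be structured with vLLM's offline `LLM` class (the `/generate` route, request schema, and model name `AIDC-AI/Marco-o1` below are illustrative assumptions, not necessarily what the bundled script uses):

```python
# Hedged sketch of a non-streaming vLLM + FastAPI server.
# Route, schema, and model name are assumptions for illustration.
from fastapi import FastAPI
from pydantic import BaseModel
from vllm import LLM, SamplingParams

app = FastAPI()
llm = LLM(model="AIDC-AI/Marco-o1")  # substitute the checkpoint you actually serve


class GenerateRequest(BaseModel):
    prompt: str
    max_tokens: int = 512
    temperature: float = 0.7


@app.post("/generate")
def generate(req: GenerateRequest):
    params = SamplingParams(temperature=req.temperature, max_tokens=req.max_tokens)
    # llm.generate takes a list of prompts and returns one RequestOutput per prompt
    outputs = llm.generate([req.prompt], params)
    return {"text": outputs[0].outputs[0].text}
```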

To run a client with non-streaming responses:

```bash
python3 client.py
```
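
`client.py` is provided alongside the server. A minimal equivalent using Requests might look like this (the URL, port, and JSON fields are assumptions and must match the server's actual route and schema):

```python
# Hedged sketch of a non-streaming client; endpoint and payload are assumed.
import requests

payload = {"prompt": "Hello, who are you?", "max_tokens": 512}
resp = requests.post("http://127.0.0.1:8000/generate", json=payload, timeout=300)
resp.raise_for_status()
print(resp.json()["text"])
```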

### Streaming Mode

To start the FastAPI server with streaming responses:

```bash
uvicorn stream_vllm_fastapi:app --workers 1
```
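
`stream_vllm_fastapi.py` is the streaming server script. A streaming server typically switches to vLLM's `AsyncLLMEngine` together with FastAPI's `StreamingResponse`; the sketch below shows one way to wire that up (route, schema, chunk format, and model name are assumptions):

```python
# Hedged sketch of a streaming vLLM + FastAPI server.
# Route, schema, chunk format, and model name are assumptions.
import uuid

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine

app = FastAPI()
engine = AsyncLLMEngine.from_engine_args(AsyncEngineArgs(model="AIDC-AI/Marco-o1"))


class GenerateRequest(BaseModel):
    prompt: str
    max_tokens: int = 512


@app.post("/generate")
async def generate(req: GenerateRequest):
    params = SamplingParams(max_tokens=req.max_tokens)
    request_id = str(uuid.uuid4())

    async def stream():
        sent = 0
        # engine.generate yields cumulative RequestOutput objects;
        # emit only the newly generated suffix of each one.
        async for output in engine.generate(req.prompt, params, request_id):
            text = output.outputs[0].text
            yield text[sent:]
            sent = len(text)

    return StreamingResponse(stream(), media_type="text/plain")
```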

To run the streaming client:

```bash
python3 stream_client.py
```
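
`stream_client.py` plays the same role on the client side. A minimal equivalent with HTTPX might look like the following (again, the route, payload, and plain-text chunk format are assumptions):

```python
# Hedged sketch of a streaming client; endpoint, payload, and chunk format are assumed.
import httpx

payload = {"prompt": "Hello, who are you?", "max_tokens": 512}
with httpx.stream("POST", "http://127.0.0.1:8000/generate", json=payload, timeout=None) as resp:
    resp.raise_for_status()
    # print each chunk as it arrives instead of waiting for the full response
    for chunk in resp.iter_text():
        print(chunk, end="", flush=True)
print()
```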