# Deepseek-V3-0324

Deepseek-V3-0324 is a high-performance, multi-node deployment solution for deep learning workloads using bf16 (bfloat16) mixed precision. It enables efficient training and inference across four machines, improving resource utilization and accelerating model execution.

## Table of Contents

- [Project Description](#project-description)
- [Installation](#installation)
- [Usage](#usage)
- [Configuration](#configuration)
- [Contributing](#contributing)
- [License](#license)

## Project Description

Deepseek-V3-0324 provides a robust framework to deploy deep learning models across four machines with bf16 precision support. By harnessing the benefits of bf16 arithmetic and distributed computing, it aims to greatly reduce training/inference time while maintaining model accuracy. This system is ideal for researchers and engineers looking to scale their AI workloads efficiently.

## Installation

### Prerequisites

- Python 3.8+
- CUDA-enabled GPU with bf16 support (e.g., NVIDIA A100 or newer)
- NCCL for distributed communication
- Compatible deep learning framework (e.g., PyTorch 2.0+ with bf16 support)
- Access to four machines with network connectivity
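
Before installing, each node can be sanity-checked with a short script along these lines (a minimal sketch; `check_env` is an illustrative helper, not part of this repository):

```python
# Quick per-node environment check (illustrative; not a repository API).
import sys

def check_env(min_python=(3, 8)):
    """Return a list of human-readable problems; an empty list means the basics look OK."""
    problems = []
    if sys.version_info < min_python:
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required, "
                        f"found {sys.version_info.major}.{sys.version_info.minor}")
    try:
        import torch
        if not torch.cuda.is_available():
            problems.append("CUDA is not available to PyTorch")
        elif not torch.cuda.is_bf16_supported():
            problems.append("GPU does not support bf16 (A100-class or newer needed)")
    except ImportError:
        problems.append("PyTorch is not installed")
    return problems

if __name__ == "__main__":
    for p in check_env():
        print("WARNING:", p)
```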

### Steps

1. Clone the repository
   ```bash
   git clone https://github.com/your-username/Deepseek-V3-0324.git
   cd Deepseek-V3-0324
   ```

2. (Optional) Create and activate a virtual environment
   ```bash
   python -m venv venv
   source venv/bin/activate  # Linux/macOS
   .\venv\Scripts\activate   # Windows
   ```

3. Install required Python packages
   ```bash
   pip install -r requirements.txt
   ```

4. Ensure NCCL and CUDA environments are properly configured on all four machines.
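
   For the NCCL side, a typical setup exports a few environment variables on every node before launching (values here are illustrative; `NCCL_SOCKET_IFNAME` in particular depends on your actual interface names):

   ```shell
   export NCCL_DEBUG=INFO            # verbose logs help diagnose multi-node issues
   export NCCL_SOCKET_IFNAME=eth0    # network interface used for inter-node traffic
   export NCCL_IB_DISABLE=0          # keep InfiniBand enabled if it is available
   ```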

## Usage

### Basic Multi-Machine bf16 Deployment

Run the main training or inference script with a distributed launcher. For example, with PyTorch's `torchrun` (which supersedes the deprecated `torch.distributed.launch` in PyTorch 2.0+):

```bash
torchrun --nnodes=4 --nproc_per_node=1 --rdzv_id=deepseek_v3 --rdzv_backend=c10d --rdzv_endpoint=<master_ip>:29500 main.py --bf16 --config configs/config.yaml
```

With one GPU per machine, `--nproc_per_node=1` is correct; increase it to match the number of GPUs available on each node.

### Key Options

- `--bf16`: Enable bf16 precision mode.
- `--config`: Path to YAML configuration file containing experiment parameters.
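
These two flags could be declared in `main.py` roughly as follows (a sketch using `argparse`; the real script's argument handling is not shown in this repository and may differ):

```python
import argparse

# Hypothetical declaration of the two documented flags.
parser = argparse.ArgumentParser(description="Deepseek-V3-0324 launcher")
parser.add_argument("--bf16", action="store_true",
                    help="run in bfloat16 mixed precision")
parser.add_argument("--config", default="configs/config.yaml",
                    help="path to the YAML experiment config")

# Parse the same flags shown in the example command above.
args = parser.parse_args(["--bf16", "--config", "configs/config.yaml"])
```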

### Example Command

```bash
torchrun --nnodes=4 --nproc_per_node=1 --rdzv_id=deepseek_v3 --rdzv_backend=c10d --rdzv_endpoint=192.168.1.100:29500 main.py --bf16 --config configs/config.yaml
```

Replace `192.168.1.100` with your master node’s IP address.
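
Inside a script launched this way, each process discovers its role from environment variables that `torchrun` sets. The general pattern looks like this (a sketch; the actual `main.py` in this repository may differ):

```python
import os

def distributed_context():
    """Read the environment variables torchrun sets for each process."""
    return {
        "rank": int(os.environ.get("RANK", 0)),              # global rank across all nodes
        "local_rank": int(os.environ.get("LOCAL_RANK", 0)),  # rank within this machine
        "world_size": int(os.environ.get("WORLD_SIZE", 1)),  # total processes (4 here)
    }

if __name__ == "__main__":
    ctx = distributed_context()
    print(f"process {ctx['rank']}/{ctx['world_size']} (local rank {ctx['local_rank']})")
```

With `--nnodes=4 --nproc_per_node=1`, global ranks 0 through 3 map one-to-one onto the four machines.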

## Configuration

Deepseek-V3-0324 uses YAML config files for flexible setup. Example configuration parameters include:

```yaml
training:
  batch_size: 64
  epochs: 50
  learning_rate: 0.001
  bf16_enabled: true

distributed:
  backend: nccl
  world_size: 4
  master_addr: "192.168.1.100"
  master_port: 29500

model:
  architecture: resnet50
  pretrained: false
```

Adjust parameters according to your hardware and experiment needs. Place your config file in the `configs/` directory or specify a custom path.
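
A few consistency checks on the loaded config can catch common multi-node mistakes early. The sketch below mirrors the example YAML above, but `validate_config` is a hypothetical helper, not a repository API:

```python
def validate_config(cfg):
    """Raise ValueError on settings that would break a four-node bf16 run."""
    dist = cfg["distributed"]
    if dist["backend"] != "nccl":
        raise ValueError("multi-GPU bf16 runs expect the nccl backend")
    if dist["world_size"] != 4:
        raise ValueError("this deployment assumes four machines (world_size=4)")
    if not isinstance(cfg["training"]["bf16_enabled"], bool):
        raise ValueError("bf16_enabled must be true or false")
    return True

# The example config from this README, as a parsed dict.
cfg = {
    "training": {"batch_size": 64, "epochs": 50, "learning_rate": 0.001,
                 "bf16_enabled": True},
    "distributed": {"backend": "nccl", "world_size": 4,
                    "master_addr": "192.168.1.100", "master_port": 29500},
    "model": {"architecture": "resnet50", "pretrained": False},
}
assert validate_config(cfg)
```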

## Contributing

Contributions are warmly welcome! To contribute:

1. Fork the repository
2. Create your feature branch (`git checkout -b feature-name`)
3. Commit your changes (`git commit -m 'Add some feature'`)
4. Push to the branch (`git push origin feature-name`)
5. Open a Pull Request describing your changes

Please ensure your code adheres to PEP 8 style standards and includes appropriate tests where applicable.

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

---

*For questions or support, please open an issue or contact the maintainers.*