# Fourcastnet_train

## Project Overview



---

## Environment Setup

### 1. Pull the image

```bash
docker pull image.sourcefind.cn:5000/dcu/admin/base/vllm:0.9.2-ubuntu22.04-dtk25.04.1-rc5-rocblas101839-0811-das1.6-py3.10-20250913-rc1
```

### 2. Create the container

```bash
docker run -it \
    --network=host \
    --hostname=localhost \
    --name FourCastNet_train \
    -v /opt/hyhal:/opt/hyhal:ro \
    -v "$PWD":/workspace \
    --ipc=host \
    --device=/dev/kfd \
    --device=/dev/mkfd \
    --device=/dev/dri \
    --shm-size=512G \
    --privileged \
    --group-add video \
    --cap-add=SYS_PTRACE \
    --security-opt seccomp=unconfined \
    image.sourcefind.cn:5000/dcu/admin/base/vllm:0.9.2-ubuntu22.04-dtk25.04.1-rc5-rocblas101839-0811-das1.6-py3.10-20250913-rc1 \
    /bin/bash
```
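
Once inside the container, it can be worth confirming that the device nodes passed with `--device` are actually visible. The snippet below is a minimal sketch and not part of this repository; the helper name `check_devices` is hypothetical, and it only tests that each path exists.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): report which of the
# given paths exist. Returns 0 only if every path is present.
check_devices() {
    local missing=0
    local dev
    for dev in "$@"; do
        if [ -e "$dev" ]; then
            echo "ok:      $dev"
        else
            echo "missing: $dev"
            missing=1
        fi
    done
    return "$missing"
}

# Usage inside the container, with the device nodes from the docker run command:
#   check_devices /dev/kfd /dev/mkfd /dev/dri
```

A non-zero return status suggests the container was started without one of the `--device` flags, or that the host does not expose that node.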

---

## Test Steps

### 1. Clone the code

```bash
git clone http://developer.sourcefind.cn/codes/bw_bestperf/fourcastnet_train.git
cd fourcastnet_train/
```

### 2. Install dependencies

```bash
pip install wandb pybind11 h5py ruamel.yaml timm -i https://pypi.tuna.tsinghua.edu.cn/simple
```
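
To confirm the installation succeeded, one option is to try importing each package. This is a sketch under stated assumptions: the helper name `check_python_modules` is hypothetical, and the interpreter is assumed to be `python3` unless the `PYTHON` variable is set.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): try to import each
# named Python module and report the ones that fail.
# PYTHON defaults to python3; override it if the container uses another name.
check_python_modules() {
    local py="${PYTHON:-python3}"
    local missing=0
    local mod
    for mod in "$@"; do
        if "$py" -c "import $mod" >/dev/null 2>&1; then
            echo "ok:      $mod"
        else
            echo "missing: $mod"
            missing=1
        fi
    done
    return "$missing"
}

# Usage, with the import names of the packages installed in this step:
#   check_python_modules wandb pybind11 h5py ruamel.yaml timm
```

Any module reported as `missing` can be reinstalled individually with the same `pip install ... -i` command.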

### 3. Download the dataset

```bash
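# Download the dataset archive (data), which contains the train and valid sets
curl -f -C - -o data.tar.gz https://ksefile.hpccube.com:65241/efile/s/d/ZGN1X2Vjbw==/fe9a61dbaca9de37
# The downloaded files should be organized in the following hierarchy:
# data/
# ├── global_means.npy
# ├── global_stds.npy
# ├── time_means.npy
# ├── train/
# └── valid/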
```
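
The expected layout can be checked before training starts. The following is a minimal sketch, not part of this repository: the helper name `check_data_layout` is hypothetical, and the usage comment assumes that `data.tar.gz` is a gzip-compressed tar archive that unpacks to a `data/` directory, which the original instructions do not state explicitly.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): check that a data
# directory contains the files and subdirectories listed in the layout.
check_data_layout() {
    local root="${1:-data}"
    local missing=0
    local f d
    for f in global_means.npy global_stds.npy time_means.npy; do
        if [ ! -f "$root/$f" ]; then
            echo "missing file: $root/$f"
            missing=1
        fi
    done
    for d in train valid; do
        if [ ! -d "$root/$d" ]; then
            echo "missing directory: $root/$d"
            missing=1
        fi
    done
    if [ "$missing" -eq 0 ]; then
        echo "layout ok: $root"
    fi
    return "$missing"
}

# Usage (assumes the archive is a gzip-compressed tar that unpacks to data/):
#   tar -xzf data.tar.gz
#   check_data_layout data
```

The function only checks that the entries exist; it does not validate the contents of the `.npy` files or of the `train/` and `valid/` directories.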


## Test Command

```bash
bash run.sh
```