# STAR

## Paper

Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution

https://arxiv.org/pdf/2501.02976

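## Model Architecture

STAR is a framework for real-world video super-resolution that aims to improve both the detail sharpness and the temporal consistency of restored videos. It brings a text-to-video (T2V) diffusion model into the task to address the limitations of conventional methods on complex degradations such as noise and blur. The resulting spatial-temporal augmentation framework improves both the spatial detail and the temporal coherence of the restored video.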

![image](https://developer.sourcefind.cn/codes/modelzoo/star/-/raw/master/assets/overview.png?inline=false)

## Environment Setup

### Docker (Option 1)

```
# Pull the Docker image from SourceFind:
docker pull image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.3.0-ubuntu22.04-dtk24.04.3-py3.10
# Create and start the container:
docker run -it --network=host --name=dtk24043_torch23 -v /opt/hyhal:/opt/hyhal:ro -v /usr/local/hyhal:/usr/local/hyhal:ro -v /public:/public:ro --privileged --device=/dev/kfd --device=/dev/dri --ipc=host --shm-size=128G  --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root --ulimit stack=-1:-1 --ulimit memlock=-1:-1 image.sourcefind.cn:5000/dcu/admin/base/pytorch:2.3.0-ubuntu22.04-dtk24.04.3-py3.10
docker exec -it dtk24043_torch23 /bin/bash
# Install the dependencies.
# Install only the packages missing from the environment and comment out the ones
# already present in requirements.txt; open-clip-torch and diffusers must be
# installed at the pinned versions.
cd STAR/
pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
sudo apt-get update && sudo apt-get install ffmpeg libsm6 libxext6  -y
```

### Dockerfile (Option 2)

```
# Docker image names must be lowercase, so the image is tagged star:latest.
docker build --no-cache -t star:latest .
docker run -dit --network=host --name=STAR --privileged --device=/dev/kfd --device=/dev/dri --ipc=host --shm-size=16G  --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -u root --ulimit stack=-1:-1 --ulimit memlock=-1:-1 -v /opt/hyhal/:/opt/hyhal/:ro -v /usr/local/hyhal:/usr/local/hyhal:ro star:latest
docker exec -it STAR /bin/bash
# Install the dependencies.
# Install only the packages missing from the environment and comment out the ones
# already present in requirements.txt; open-clip-torch and diffusers must be
# installed at the pinned versions.
cd STAR/
pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
sudo apt-get update && sudo apt-get install ffmpeg libsm6 libxext6  -y
```

### Anaconda (Option 3)

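```
# 1. Create and activate a conda virtual environment:
conda create -n STAR python=3.10
conda activate STAR
# 2. The toolkits and deep learning libraries this project needs on DCU cards can be
#    downloaded from the Guanghe (光合) developer community: https://developer.hpccube.com/tool/
#    DTK driver: dtk24.04.3
#    python: python3.10
#    torch: 2.3.0
#    Tips: the DTK, python, and torch builds must match one another exactly;
#    torch 2.1, 2.3, or 2.4 all work.
# 3. Install the remaining, non-DCU-specific libraries from requirements.txt:
pip install -r requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
```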
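
Whichever option is used, a quick sanity check of the resulting environment can save a failed run later. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, the supported torch series are taken from the tip above, and it assumes the DCU build of PyTorch exposes its devices through the usual `torch.cuda` interface.

```python
import shutil

# Torch series listed as usable in the setup notes above; adjust if your DTK differs.
SUPPORTED_TORCH = ("2.1", "2.3", "2.4")


def torch_series(version: str) -> str:
    """Return the 'major.minor' part of a version string such as '2.3.0+abc'."""
    return ".".join(version.split("+")[0].split(".")[:2])


def is_supported_torch(version: str) -> bool:
    """True if the version belongs to one of the series in SUPPORTED_TORCH."""
    return torch_series(version) in SUPPORTED_TORCH


def report() -> None:
    """Print a short summary of the environment (hypothetical helper)."""
    import torch  # imported lazily so the helpers above work without torch installed

    print("torch:", torch.__version__, "supported:", is_supported_torch(torch.__version__))
    # Assumption: the DCU build of PyTorch reports devices via torch.cuda.
    print("accelerator visible:", torch.cuda.is_available())
    # ffmpeg is installed by the apt-get step in the Docker options.
    print("ffmpeg on PATH:", shutil.which("ffmpeg") is not None)
```

Calling `report()` inside the container or conda environment prints the three checks; a `False` in any of them points at the setup step to revisit.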

## Test Dataset

Place your test videos in `input/video/`. For prompts there are three options: (1) no prompt; (2) automatically generated prompts (for example, with Pllava); (3) manually written prompts. Prompt txt files go in `input/text/`.

Update the paths in `video_super_resolution/scripts/inference_sr.sh` to match your local setup: `video_folder_path`, `txt_file_path`, `model_path`, and `save_dir`.
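
Before launching inference it can help to confirm that the four paths point at something real. The sketch below is illustrative only: `check_inputs` and `VIDEO_SUFFIXES` are hypothetical names, the list of video extensions is an assumption, and an empty `txt_file_path` is assumed to stand for the "no prompt" option.

```python
from pathlib import Path

# Assumed set of video extensions; extend it to match your data.
VIDEO_SUFFIXES = {".mp4", ".mov", ".avi", ".mkv"}


def check_inputs(video_folder_path, txt_file_path, model_path, save_dir):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []

    video_dir = Path(video_folder_path)
    if not video_dir.is_dir():
        problems.append(f"video folder not found: {video_dir}")
    elif not any(p.suffix.lower() in VIDEO_SUFFIXES for p in video_dir.iterdir()):
        problems.append(f"no video files in: {video_dir}")

    # Assumption: an empty txt_file_path means inference runs without prompts.
    if txt_file_path and not Path(txt_file_path).is_file():
        problems.append(f"prompt file not found: {txt_file_path}")

    if not Path(model_path).is_file():
        problems.append(f"model weights not found: {model_path}")

    # The output directory is created if it does not exist yet.
    Path(save_dir).mkdir(parents=True, exist_ok=True)
    return problems
```

For example, `check_inputs("input/video", "input/text/prompt.txt", "pretrained_weight/light_deg.pt", "results")` returns the list of issues to fix before editing the script; the file names in this call are placeholders.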

## Pretrained Models

Two versions of the I2VGen-XL-based model are provided: `heavy_deg.pt` for heavily degraded videos and `light_deg.pt` for lightly degraded videos (for example, low-resolution videos downloaded from video websites).
The models are available on [HuggingFace](https://huggingface.co/SherryX/STAR).

Place the weight files in the `pretrained_weight/` directory.

This project also provides a HuggingFace quick-download script. Run the following command to download the weight files into the local `./pretrained_weight/` directory:

```
python downmodel.py
```
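
If the bundled script is unavailable, the same download can be approximated with the `huggingface_hub` package. This is a sketch under stated assumptions and does not reproduce the contents of `downmodel.py`: the function names are hypothetical, `huggingface_hub` must be installed separately, and the pattern `*.pt` is used so that no particular folder layout inside the HuggingFace repository is assumed.

```python
from pathlib import Path

REPO_ID = "SherryX/STAR"  # repository named in the HuggingFace link above

# Checkpoint file names as given in the description of the two model versions.
WEIGHT_FOR = {"heavy": "heavy_deg.pt", "light": "light_deg.pt"}


def weight_name(degradation: str) -> str:
    """Map 'heavy' or 'light' to the matching checkpoint file name."""
    try:
        return WEIGHT_FOR[degradation]
    except KeyError:
        raise ValueError(f"expected 'heavy' or 'light', got {degradation!r}") from None


def download_weights(local_dir: str = "./pretrained_weight") -> Path:
    """Download every .pt checkpoint of the repository into local_dir."""
    # Imported lazily so weight_name() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir, allow_patterns=["*.pt"])
    return Path(path)
```

After `download_weights()` finishes, search the target directory for `weight_name("light")` or `weight_name("heavy")` and use that file as `model_path`.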

## Inference

### Single-Node, Single-GPU Inference

```
bash video_super_resolution/scripts/inference_sr.sh
```
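
When inference is one step of a larger Python workflow, the script can be launched with `subprocess` so that a failed run surfaces as an exception. This is a minimal sketch with hypothetical helper names; it assumes it is run from the repository root and that the paths inside the script have already been edited.

```python
import subprocess
from pathlib import Path

# Script path from the command above, relative to the repository root.
SCRIPT = Path("video_super_resolution/scripts/inference_sr.sh")


def build_command(script: Path = SCRIPT) -> list[str]:
    """Return the argument list used to launch the inference script."""
    return ["bash", str(script)]


def run_inference(script: Path = SCRIPT) -> int:
    """Run the script and return its exit code (0 on success)."""
    if not script.is_file():
        raise FileNotFoundError(f"inference script not found: {script}")
    # check=True raises CalledProcessError if the script exits with a non-zero code.
    return subprocess.run(build_command(script), check=True).returncode
```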

## Result

## Accuracy

## Application Scenarios

### Algorithm Category

Text-to-Video

### Key Application Industries

Healthcare, education, scientific research, finance

## Source Repository and Issue Feedback

- https://developer.sourcefind.cn/codes/modelzoo/star

## References

- https://github.com/NJU-PCALab/STAR