title={Stable video diffusion: Scaling latent video diffusion models to large datasets},
author={Blattmann, Andreas and Dockhorn, Tim and Kulal, Sumith and Mendelevitch, Daniel and Kilian, Maciej and Lorenz, Dominik and Levi, Yam and English, Zion and Voleti, Vikram and Letts, Adam and others},
journal={arXiv preprint arXiv:2311.15127},
year={2023}
}
```
</details>
-**Wan: Open and Advanced Large-Scale Video Generative Models**, Technical Report 2025.
*Team Wan, Ang Wang, Baole Ai, Bin Wen, Chaojie Mao, Chen-Wei Xie, et al.*
<details>
<summary>BibTeX</summary>

```bibtex
@article{wan2025wan,
title={Wan: Open and advanced large-scale video generative models},
author={Wan, Team and Wang, Ang and Ai, Baole and Wen, Bin and Mao, Chaojie and Xie, Chen-Wei and Chen, Di and Yu, Feiwu and Zhao, Haiming and Yang, Jianxiao and others},
journal={arXiv preprint arXiv:2503.20314},
year={2025}
}
```
</details>
-**HunyuanVideo: A Systematic Framework For Large Video Generative Models**, Technical Report 2024.
*Weijie Kong, Qi Tian, Zijian Zhang, Rox Min, Zuozhuo Dai, Jin Zhou, et al.*
<details>
<summary>BibTeX</summary>

```bibtex
@article{kong2024hunyuanvideo,
title={Hunyuanvideo: A systematic framework for large video generative models},
author={Kong, Weijie and Tian, Qi and Zhang, Zijian and Min, Rox and Dai, Zuozhuo and Zhou, Jin and Xiong, Jiangfeng and Li, Xin and Wu, Bo and Zhang, Jianwei and others},
journal={arXiv preprint arXiv:2412.03603},
year={2024}
}
```
</details>
-**CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer**, ICLR 2025.
*Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, et al.*
<details>
<summary>BibTeX</summary>

```bibtex
@article{yang2024cogvideox,
title={Cogvideox: Text-to-video diffusion models with an expert transformer},
author={Yang, Zhuoyi and Teng, Jiayan and Zheng, Wendi and Ding, Ming and Huang, Shiyu and Xu, Jiazheng and Yang, Yuanming and Hong, Wenyi and Zhang, Xiaohan and Feng, Guanyu and others},
journal={arXiv preprint arXiv:2408.06072},
year={2024}
}
```
</details>
-**SkyReels-V2: Infinite-Length Film Generative Model**, Technical Report 2025.
*Guibin Chen, Dixuan Lin, Jiangping Yang, Chunze Lin, Junchen Zhu, et al.*
<details>
<summary>BibTeX</summary>

```bibtex
@article{chen2025skyreels,
title={SkyReels-V2: Infinite-length film generative model},
author={Chen, Guibin and Lin, Dixuan and Yang, Jiangping and Lin, Chunze and Zhu, Junchen and Fan, Mingyuan and Zhang, Hao and Chen, Sheng and Chen, Zheng and Ma, Chengcheng and others},
journal={arXiv preprint arXiv:2504.13074},
year={2025}
}
```
</details>
-**Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k**, Technical Report 2025.
*Xiangyu Peng, Zangwei Zheng, Chenhui Shen, Tom Young, Xinying Guo, et al.*
<details>
<summary>BibTeX</summary>

```bibtex
@article{peng2025open,
title={Open-Sora 2.0: Training a commercial-level video generation model in \$200k},
author={Peng, Xiangyu and Zheng, Zangwei and Shen, Chenhui and Young, Tom and Guo, Xinying and Wang, Binluo and Xu, Hang and Liu, Hongxin and Jiang, Mingyan and Li, Wenjun and others},
journal={arXiv preprint arXiv:2503.09642},
year={2025}
}
```
</details>
-**Pyramidal Flow Matching for Efficient Video Generative Modeling**, Technical Report 2024.
*Yang Jin, Zhicheng Sun, Ningyuan Li, Kun Xu, Kun Xu, et al.*
<details>
<summary>BibTeX</summary>

```bibtex
@article{jin2024pyramidal,
title={Pyramidal flow matching for efficient video generative modeling},
author={Jin, Yang and Sun, Zhicheng and Li, Ningyuan and Xu, Kun and Jiang, Hao and Zhuang, Nan and Huang, Quzhe and Song, Yang and Mu, Yadong and Lin, Zhouchen},
journal={arXiv preprint arXiv:2410.05954},
year={2024}
}
```
</details>
-**MAGI-1: Autoregressive Video Generation at Scale**, Technical Report 2025.
*Sand.ai, Hansi Teng, Hongyu Jia, Lei Sun, Lingzhi Li, Maolin Li, Mingqiu Tang, et al.*
<details>
<summary>BibTeX</summary>

```bibtex
@article{teng2025magi,
title={MAGI-1: Autoregressive Video Generation at Scale},
author={Teng, Hansi and Jia, Hongyu and Sun, Lei and Li, Lingzhi and Li, Maolin and Tang, Mingqiu and Han, Shuai and Zhang, Tianning and Zhang, WQ and Luo, Weifeng and others},
journal={arXiv preprint arXiv:2505.13211},
year={2025}
}
```
</details>
-**From Slow Bidirectional to Fast Autoregressive Video Diffusion Models**, CVPR 2025.
*Tianwei Yin, Qiang Zhang, Richard Zhang, William T. Freeman, Fredo Durand, et al.*
<details>
<summary>BibTeX</summary>

```bibtex
@article{yin2024slow,
title={From slow bidirectional to fast autoregressive video diffusion models},
author={Yin, Tianwei and Zhang, Qiang and Zhang, Richard and Freeman, William T and Durand, Fredo and Shechtman, Eli and Huang, Xun},
journal={arXiv preprint arXiv:2412.07772},
year={2024}
}
```
</details>
-**Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model**, Technical Report 2025.
*Xianglong He, Chunli Peng, Zexiang Liu, Boyang Wang, Yifan Zhang, et al.*
<details>
<summary>BibTeX</summary>

```bibtex
@article{he2025matrix,
title={Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model},
author={He, Xianglong and Peng, Chunli and Liu, Zexiang and Wang, Boyang and Zhang, Yifan and Cui, Qi and Kang, Fei and Jiang, Biao and An, Mengyin and Ren, Yangyang and others},
journal={arXiv preprint arXiv:2508.13009},
year={2025}
}
```
</details>
-**HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels**, Technical Report 2025.
*HunyuanWorld Team, Zhenwei Wang, Yuhao Liu, Junta Wu, Zixiao Gu, et al.*
<details>
<summary>BibTeX</summary>

```bibtex
@article{team2025hunyuanworld,
title={HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels},
author={Team, HunyuanWorld and Wang, Zhenwei and Liu, Yuhao and Wu, Junta and Gu, Zixiao and Wang, Haoyuan and Zuo, Xuhui and Huang, Tianyu and Li, Wenhuan and Zhang, Sheng and others},
journal={arXiv preprint arXiv:2507.21809},
year={2025}
}
```
</details>
-**Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models**, Technical Report 2025.
*Xuanchi Ren, Yifan Lu, Tianshi Cao, Ruiyuan Gao, Shengyu Huang, et al.*
<details>
<summary>BibTeX</summary>

```bibtex
@article{ren2025cosmos,
title={Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models},
author={Ren, Xuanchi and Lu, Yifan and Cao, Tianshi and Gao, Ruiyuan and Huang, Shengyu and Sabour, Amirmojtaba and Shen, Tianchang and Pfaff, Tobias and Wu, Jay Zhangjie and Chen, Runjian and others},
journal={arXiv preprint arXiv:2506.09042},
year={2025}
}
```
</details>
-**Genie 3: A new frontier for world models**, Blog 2025.