Commit e2ec88b4 authored by jerrrrry

Initial commit
[Read in Chinese](./README_zh.md)
# HunyuanVideo Latent Feature Extraction Tool
This project provides an efficient tool for extracting latent features from videos, preparing them for subsequent video generation and processing tasks.
## Features
- Support for various video formats and resolutions
- Multi-GPU parallel processing for improved efficiency
- Support for multiple aspect ratios
- High-performance VAE model for feature extraction
- Automatic skipping of already-processed videos, so interrupted runs can be resumed
## Usage
### 1. Configuration File
#### Input Dataset Format
The input video metadata file (`meta_file.list`) is a plain-text list of JSON file paths, one per line; each JSON file describes a single video with the fields shown below.
The format of `meta_file.list` (e.g., `./assets/demo/i2v_lora/train_dataset/meta_file.list`) is as follows:
```
/path/to/0.json
/path/to/1.json
/path/to/2.json
...
```
**IMPORTANT:** Make sure each video's `video_id` is unique!
The format of `/path/to/0.json` (e.g., `./assets/demo/i2v_lora/train_dataset/meta_data.json`) is as follows:
```json
{
    "video_path": "/path/to/video.mp4",
    "raw_caption": {
        "long caption": "Detailed description text of the video"
    }
}
```
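For reference, below is a minimal sketch of how the per-video JSON files and `meta_file.list` could be generated. The `videos` list, the `video_id` values, and all paths are placeholders for illustration only; they are not part of this repository.
```python
import json
from pathlib import Path

# Hypothetical inputs: one entry per video; each video_id must be unique.
videos = [
    {
        "video_id": "demo_000001",                      # placeholder id
        "video_path": "/path/to/video.mp4",             # placeholder path
        "caption": "Detailed description text of the video",
    },
]

meta_dir = Path("/path/to/meta_json")                   # placeholder directory
meta_dir.mkdir(parents=True, exist_ok=True)

json_paths = []
for video in videos:
    record = {
        "video_path": video["video_path"],
        "raw_caption": {"long caption": video["caption"]},
    }
    # Naming the JSON file after the video_id is one way to keep ids unique.
    json_file = meta_dir / f"{video['video_id']}.json"
    json_file.write_text(json.dumps(record, ensure_ascii=False, indent=4))
    json_paths.append(str(json_file))

# meta_file.list is simply a newline-separated list of the JSON paths.
Path("/path/to/meta_file.list").write_text("\n".join(json_paths) + "\n")
```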
#### Extraction Parameters
Configure the extraction parameters in `hyvideo/hyvae_extract/vae.yaml`:
```yaml
vae_path: "./ckpts/hunyuan-video-i2v-720p/vae" # VAE model path
video_url_files: "/path/to/meta_file.list" # Video metadata file list
output_base_dir: "/path/to/output/directory" # Output directory
sample_n_frames: 129 # Number of frames to sample
target_size: # Target bucket size (see the table below)
  - 960      # e.g., 960 for 720p-quality buckets
  - 960
enable_multi_aspect_ratio: True # Enable multiple aspect ratios
use_stride: True # Use stride sampling
```
#### Bucket Size Reference
The `target_size` parameter defines the resolution bucket size. Here are the recommended values for different quality levels:
| Quality | Bucket Size | Typical Resolution |
|---------|-------------|-------------------|
| 720p | 960 | 1280×720 or similar |
| 540p | 720 | 960×540 or similar |
| 360p | 480 | 640×360 or similar |
When `enable_multi_aspect_ratio` is set to `True`, the system will use these bucket sizes as a base to generate multiple aspect ratio buckets. For optimal performance, choose a bucket size that balances quality and memory usage based on your hardware capabilities.
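If you want to sanity-check a config against these recommendations before launching, a small sketch using standard PyYAML (not the repo's own config loader) might look like this:
```python
import yaml  # PyYAML

# Recommended bucket sizes from the table above.
RECOMMENDED_BUCKET_SIZES = {960: "720p", 720: "540p", 480: "360p"}

with open("hyvideo/hyvae_extract/vae.yaml") as f:
    cfg = yaml.safe_load(f)

for size in cfg["target_size"]:
    quality = RECOMMENDED_BUCKET_SIZES.get(size, "non-standard")
    print(f"bucket size {size}: {quality}")
print("multi-aspect-ratio buckets:", cfg["enable_multi_aspect_ratio"])
```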
### 2. Run Extraction
```bash
# Set environment variables
export HOST_GPU_NUM=8 # Set the number of GPUs to use
# Run extraction script
cd HunyuanVideo-I2V
bash hyvideo/hyvae_extract/start.sh
```
### 3. Single GPU Run
```bash
cd HunyuanVideo-I2V
export PYTHONPATH=${PYTHONPATH}:`pwd`
export HOST_GPU_NUM=1
CUDA_VISIBLE_DEVICES=0 python3 -u hyvideo/hyvae_extract/run.py --local_rank 0 --config 'hyvideo/hyvae_extract/vae.yaml'
```
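`start.sh` simply launches one `run.py` process per GPU, each with its own `--local_rank`. The interleaved sharding below is a common way to split a work list across such processes; it is shown only to illustrate the launch scheme, and the actual partitioning logic lives in `run.py`:
```python
import os

def shard_for_rank(items, local_rank, world_size):
    """Give each rank an interleaved slice of the work list."""
    return items[local_rank::world_size]

world_size = int(os.environ.get("HOST_GPU_NUM", "1"))
local_rank = 0  # in practice this would come from the --local_rank argument

with open("/path/to/meta_file.list") as f:          # placeholder path
    json_paths = f.read().splitlines()

my_jobs = shard_for_rank(json_paths, local_rank, world_size)
print(f"rank {local_rank}/{world_size}: {len(my_jobs)} videos to process")
```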
## Output Files
The program generates the following files in the specified output directory (a short loading sketch follows the directory layout below):
1. `{video_id}.npy` - Latent feature array of the video
2. `json_path/{video_id}.json` - JSON file containing video metadata, including:
- video_id: Video ID
- latent_shape: Shape of the latent features
- video_path: Original video path
- prompt: Video description/prompt
- npy_save_path: Path where the latent features are saved
```
output_base_dir/
├── {video_id_1}.npy        # Latent feature array for video 1
├── {video_id_2}.npy        # Latent feature array for video 2
├── {video_id_3}.npy        # Latent feature array for video 3
│   ...
├── {video_id_n}.npy        # Latent feature array for video n
└── json_path/              # Directory containing metadata JSON files
    ├── {video_id_1}.json   # Metadata for video 1
    ├── {video_id_2}.json   # Metadata for video 2
    ├── {video_id_3}.json   # Metadata for video 3
    │   ...
    └── {video_id_n}.json   # Metadata for video n
```
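Once extraction finishes, downstream code can pair each metadata JSON with its latent array. A minimal loading sketch, assuming only the output layout and JSON fields documented above:
```python
import json
from pathlib import Path

import numpy as np

output_base_dir = Path("/path/to/output/directory")   # as set in vae.yaml

for meta_path in sorted((output_base_dir / "json_path").glob("*.json")):
    meta = json.loads(meta_path.read_text())
    latent = np.load(meta["npy_save_path"])            # should match meta["latent_shape"]
    print(meta["video_id"], latent.shape, meta["prompt"][:60])
```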
## Advanced Configuration
### Multiple Aspect Ratio Processing
When `enable_multi_aspect_ratio` is set to `True`, the system selects the target size closest to the original aspect ratio of the video, rather than forcing it to be cropped to a fixed size. This is useful for maintaining the integrity of the video content.
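As a concrete illustration of the idea (the bucket list and selection code here are hypothetical, not the ones used by this repo), picking the bucket whose aspect ratio is closest to the source video can be done like this:
```python
# Example buckets for a 960 base size; illustrative values only.
BUCKETS_960 = [(960, 960), (768, 1280), (1280, 768), (720, 1280), (1280, 720)]

def closest_bucket(orig_h, orig_w, buckets=BUCKETS_960):
    """Pick the (height, width) bucket whose aspect ratio best matches the source."""
    src_ratio = orig_w / orig_h
    return min(buckets, key=lambda hw: abs(hw[1] / hw[0] - src_ratio))

print(closest_bucket(1080, 1920))  # a 16:9 source maps to (720, 1280)
```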
### Stride Sampling
When `use_stride` is set to `True`, the system automatically adjusts the sampling stride based on the video's frame rate (see the sketch after this list):
- When frame rate >= 50fps, stride is 2
- When frame rate < 50fps, stride is 1
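Expressed as a sketch (of the documented rule only, not the extractor's exact frame-selection code), the stride logic combines with `sample_n_frames` roughly as follows:
```python
def sample_frame_indices(total_frames, fps, sample_n_frames=129, use_stride=True):
    """Pick frame indices using the fps-dependent stride rule described above."""
    stride = 2 if (use_stride and fps >= 50) else 1
    return list(range(0, total_frames, stride))[:sample_n_frames]

# A 60 fps clip is subsampled every other frame; a 24 fps clip keeps every frame.
print(len(sample_frame_indices(total_frames=300, fps=60)))  # 129
print(len(sample_frame_indices(total_frames=300, fps=24)))  # 129
```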
The commit also adds the multi-GPU launcher script `hyvideo/hyvae_extract/start.sh`:
```bash
export PYTHONPATH=${PYTHONPATH}:`pwd`
for ((i=0;i<$HOST_GPU_NUM;++i)); do
    CUDA_VISIBLE_DEVICES=$i python3 -u hyvideo/hyvae_extract/run.py --local_rank $i --config 'hyvideo/hyvae_extract/vae.yaml' &
done
# CUDA_VISIBLE_DEVICES=0 python3 -u hyvideo/hyvae_extract/run.py --local_rank 0 --config 'hyvideo/hyvae_extract/vae.yaml' &
wait
echo "Finished."
```
The default extraction config `hyvideo/hyvae_extract/vae.yaml` added in this commit:
```yaml
vae_path: "./ckpts/hunyuan-video-i2v-720p/vae"
video_url_files: "/path/to/meta_file.list"
output_base_dir: "/path/to/output/directory"
sample_n_frames: 129
target_size:
  - 480
  - 480
enable_multi_aspect_ratio: True
use_stride: True
```