[中文阅读](./README_zh.md)

# HunyuanVideo Latent Feature Extraction Tool

This project provides an efficient tool for extracting latent features from videos, preparing them for subsequent video generation and processing tasks.

## Features

- Support for various video formats and resolutions
- Multi-GPU parallel processing for improved efficiency
- Support for multiple aspect ratios
- High-performance VAE model for feature extraction
- Automatic skipping of already processed videos, so an interrupted run can be resumed

## Usage

### 1. Configuration File

#### Input Dataset Format

The input video metadata file (`meta_file.list`) should be a list of JSON file paths, one per line. Each JSON file describes a single video.

The format of `meta_file.list` (e.g., `./assets/demo/i2v_lora/train_dataset/meta_file.list`) is as follows:

```
/path/to/0.json
/path/to/1.json
/path/to/2.json
...
```

`IMPORTANT: Make sure each video's video_id is unique!!!`

The format of `/path/to/0.json` (e.g., `./assets/demo/i2v_lora/train_dataset/meta_data.json`) is as follows:

```json
{
  "video_path": "/path/to/video.mp4",
  "raw_caption": {
    "long caption": "Detailed description text of the video"
  }
}
```

Configure parameters in `hyvideo/hyvae_extract/vae.yaml`:

```yaml
vae_path: "./ckpts/hunyuan-video-i2v-720p/vae"  # VAE model path
video_url_files: "/path/to/meta_file.list"      # Video metadata file list
output_base_dir: "/path/to/output/directory"    # Output directory
sample_n_frames: 129                            # Number of frames to sample
target_size:                                    # Target size
  - bucket_size
  - bucket_size
enable_multi_aspect_ratio: True                 # Enable multiple aspect ratios
use_stride: True                                # Use stride sampling
```

#### Bucket Size Reference

The `target_size` parameter defines the resolution bucket size. The recommended values for different quality levels are:

| Quality | Bucket Size | Typical Resolution  |
|---------|-------------|---------------------|
| 720p    | 960         | 1280×720 or similar |
| 540p    | 720         | 960×540 or similar  |
| 360p    | 480         | 640×360 or similar  |

When `enable_multi_aspect_ratio` is set to `True`, the system uses these bucket sizes as a base to generate multiple aspect ratio buckets. Choose a bucket size that balances quality against memory usage for your hardware.

### 2. Run Extraction

```bash
# Set environment variables
export HOST_GPU_NUM=8  # Set the number of GPUs to use

# Run extraction script
cd HunyuanVideo-I2V
bash hyvideo/hyvae_extract/start.sh
```

### 3. Single GPU Run

```bash
cd HunyuanVideo-I2V
export PYTHONPATH=${PYTHONPATH}:`pwd`
export HOST_GPU_NUM=1
CUDA_VISIBLE_DEVICES=0 python3 -u hyvideo/hyvae_extract/run.py --local_rank 0 --config 'hyvideo/hyvae_extract/vae.yaml'
```

## Output Files

The program generates the following files in the specified output directory:

1. `{video_id}.npy` - Latent feature array of the video
2. `json_path/{video_id}.json` - JSON file containing video metadata, including:
   - video_id: Video ID
   - latent_shape: Shape of the latent features
   - video_path: Original video path
   - prompt: Video description/prompt
   - npy_save_path: Path where the latent features are saved

```
output_base_dir/
│
├── {video_id_1}.npy          # Latent feature array for video 1
├── {video_id_2}.npy          # Latent feature array for video 2
├── {video_id_3}.npy          # Latent feature array for video 3
│   ...
├── {video_id_n}.npy          # Latent feature array for video n
│
└── json_path/                # Directory containing metadata JSON files
│   ├── {video_id_1}.json     # Metadata for video 1
│   ├── {video_id_2}.json     # Metadata for video 2
│   ├── {video_id_3}.json     # Metadata for video 3
│   │   ...
│   └── {video_id_n}.json     # Metadata for video n
```

## Advanced Configuration

### Multiple Aspect Ratio Processing

When `enable_multi_aspect_ratio` is set to `True`, the system selects the target size closest to the original aspect ratio of the video, rather than forcing it to be cropped to a fixed size. This helps preserve the integrity of the video content.

### Stride Sampling

When `use_stride` is set to `True`, the system automatically adjusts the sampling stride based on the video's frame rate:

- When frame rate >= 50fps, stride is 2
- When frame rate < 50fps, stride is 1
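
The stride rule above can be sketched in a few lines of Python. This is an illustrative sketch only, not the tool's actual implementation: the function names `select_stride` and `sample_frame_indices` are hypothetical, and starting at frame 0 and truncating at the end of the clip are assumptions made for the example.

```python
from typing import List


def select_stride(fps: float) -> int:
    """Return the sampling stride for a frame rate, per the rule in this README."""
    return 2 if fps >= 50 else 1


def sample_frame_indices(total_frames: int, fps: float, n_frames: int = 129) -> List[int]:
    """Pick up to n_frames frame indices, spaced by the fps-dependent stride.

    Assumption (not documented by the tool): sampling starts at frame 0 and
    simply stops early if the clip is too short.
    """
    stride = select_stride(fps)
    return list(range(0, total_frames, stride))[:n_frames]


if __name__ == "__main__":
    # A 60 fps clip is read every second frame; a 30 fps clip every frame.
    print(sample_frame_indices(total_frames=10, fps=60, n_frames=3))
    print(sample_frame_indices(total_frames=10, fps=30, n_frames=3))
```

Under this rule a 60 fps clip sampled with stride 2 covers roughly the same wall-clock duration as a 30 fps clip sampled with stride 1, which is presumably the motivation for the threshold.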