Commit e2ec88b4 authored by jerrrrry

Initial commit
[Read in Chinese](./README_zh.md)
# HunyuanVideo Latent Feature Extraction Tool
This project provides an efficient tool for extracting latent features from videos, preparing them for subsequent video generation and processing tasks.
## Features
- Support for various video formats and resolutions
- Multi-GPU parallel processing for improved efficiency
- Support for multiple aspect ratios
- High-performance VAE model for feature extraction
- Automatic skipping of already-processed videos, so interrupted runs can be resumed
## Usage
### 1. Configuration File
#### Input Dataset Format
The input video metadata file (`meta_file.list`) is a plain-text list of JSON file paths, one per line; each JSON file describes a single video with the fields shown below.
The format of `meta_file.list` (e.g., `./assets/demo/i2v_lora/train_dataset/meta_file.list`) is as follows:
```
/path/to/0.json
/path/to/1.json
/path/to/2.json
...
```
**IMPORTANT:** Make sure each video's `video_id` is unique!
The format of `/path/to/0.json` (e.g., `./assets/demo/i2v_lora/train_dataset/meta_data.json`) is as follows:
```json
{
    "video_path": "/path/to/video.mp4",
    "raw_caption": {
        "long caption": "Detailed description text of the video"
    }
}
```
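For reference, below is a minimal sketch of how the per-video JSON files and `meta_file.list` could be generated. The `videos` list, the `video_id` values, and all paths are placeholders for illustration only; they are not part of this repository.
```python
import json
from pathlib import Path

# Hypothetical inputs: one entry per video; each video_id must be unique.
videos = [
    {
        "video_id": "demo_000001",                      # placeholder id
        "video_path": "/path/to/video.mp4",             # placeholder path
        "caption": "Detailed description text of the video",
    },
]

meta_dir = Path("/path/to/meta_json")                   # placeholder directory
meta_dir.mkdir(parents=True, exist_ok=True)

json_paths = []
for video in videos:
    record = {
        "video_path": video["video_path"],
        "raw_caption": {"long caption": video["caption"]},
    }
    # Naming the JSON file after the video_id is one way to keep ids unique.
    json_file = meta_dir / f"{video['video_id']}.json"
    json_file.write_text(json.dumps(record, ensure_ascii=False, indent=4))
    json_paths.append(str(json_file))

# meta_file.list is simply a newline-separated list of the JSON paths.
Path("/path/to/meta_file.list").write_text("\n".join(json_paths) + "\n")
```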
#### Extraction Parameters
Configure the extraction parameters in `hyvideo/hyvae_extract/vae.yaml`:
```yaml
vae_path: "./ckpts/hunyuan-video-i2v-720p/vae" # VAE model path
video_url_files: "/path/to/meta_file.list" # Video metadata file list
output_base_dir: "/path/to/output/directory" # Output directory
sample_n_frames: 129 # Number of frames to sample
target_size: # Target bucket size (see the table below)
  - 960      # e.g., 960 for 720p-quality buckets
  - 960
enable_multi_aspect_ratio: True # Enable multiple aspect ratios
use_stride: True # Use stride sampling
```
#### Bucket Size Reference
The `target_size` parameter defines the resolution bucket size. Here are the recommended values for different quality levels:
| Quality | Bucket Size | Typical Resolution |
|---------|-------------|-------------------|
| 720p | 960 | 1280×720 or similar |
| 540p | 720 | 960×540 or similar |
| 360p | 480 | 640×360 or similar |
When `enable_multi_aspect_ratio` is set to `True`, the system will use these bucket sizes as a base to generate multiple aspect ratio buckets. For optimal performance, choose a bucket size that balances quality and memory usage based on your hardware capabilities.
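If you want to sanity-check a config against these recommendations before launching, a small sketch using standard PyYAML (not the repo's own config loader) might look like this:
```python
import yaml  # PyYAML

# Recommended bucket sizes from the table above.
RECOMMENDED_BUCKET_SIZES = {960: "720p", 720: "540p", 480: "360p"}

with open("hyvideo/hyvae_extract/vae.yaml") as f:
    cfg = yaml.safe_load(f)

for size in cfg["target_size"]:
    quality = RECOMMENDED_BUCKET_SIZES.get(size, "non-standard")
    print(f"bucket size {size}: {quality}")
print("multi-aspect-ratio buckets:", cfg["enable_multi_aspect_ratio"])
```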
### 2. Run Extraction
```bash
# Set environment variables
export HOST_GPU_NUM=8 # Set the number of GPUs to use
# Run extraction script
cd HunyuanVideo-I2V
bash hyvideo/hyvae_extract/start.sh
```
### 3. Single GPU Run
```bash
cd HunyuanVideo-I2V
export PYTHONPATH=${PYTHONPATH}:`pwd`
export HOST_GPU_NUM=1
CUDA_VISIBLE_DEVICES=0 python3 -u hyvideo/hyvae_extract/run.py --local_rank 0 --config 'hyvideo/hyvae_extract/vae.yaml'
```
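`start.sh` simply launches one `run.py` process per GPU, each with its own `--local_rank`. The interleaved sharding below is a common way to split a work list across such processes; it is shown only to illustrate the launch scheme, and the actual partitioning logic lives in `run.py`:
```python
import os

def shard_for_rank(items, local_rank, world_size):
    """Give each rank an interleaved slice of the work list."""
    return items[local_rank::world_size]

world_size = int(os.environ.get("HOST_GPU_NUM", "1"))
local_rank = 0  # in practice this would come from the --local_rank argument

with open("/path/to/meta_file.list") as f:          # placeholder path
    json_paths = f.read().splitlines()

my_jobs = shard_for_rank(json_paths, local_rank, world_size)
print(f"rank {local_rank}/{world_size}: {len(my_jobs)} videos to process")
```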
## Output Files
The program generates the following files in the specified output directory (a short loading sketch follows the directory layout below):
1. `{video_id}.npy` - Latent feature array of the video
2. `json_path/{video_id}.json` - JSON file containing video metadata, including:
- video_id: Video ID
- latent_shape: Shape of the latent features
- video_path: Original video path
- prompt: Video description/prompt
- npy_save_path: Path where the latent features are saved
```
output_base_dir/
├── {video_id_1}.npy        # Latent feature array for video 1
├── {video_id_2}.npy        # Latent feature array for video 2
├── {video_id_3}.npy        # Latent feature array for video 3
│   ...
├── {video_id_n}.npy        # Latent feature array for video n
└── json_path/              # Directory containing metadata JSON files
    ├── {video_id_1}.json   # Metadata for video 1
    ├── {video_id_2}.json   # Metadata for video 2
    ├── {video_id_3}.json   # Metadata for video 3
    │   ...
    └── {video_id_n}.json   # Metadata for video n
```
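Once extraction finishes, downstream code can pair each metadata JSON with its latent array. A minimal loading sketch, assuming only the output layout and JSON fields documented above:
```python
import json
from pathlib import Path

import numpy as np

output_base_dir = Path("/path/to/output/directory")   # as set in vae.yaml

for meta_path in sorted((output_base_dir / "json_path").glob("*.json")):
    meta = json.loads(meta_path.read_text())
    latent = np.load(meta["npy_save_path"])            # should match meta["latent_shape"]
    print(meta["video_id"], latent.shape, meta["prompt"][:60])
```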
## Advanced Configuration
### Multiple Aspect Ratio Processing
When `enable_multi_aspect_ratio` is set to `True`, the system selects the target size closest to the original aspect ratio of the video, rather than forcing it to be cropped to a fixed size. This is useful for maintaining the integrity of the video content.
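As a concrete illustration of the idea (the bucket list and selection code here are hypothetical, not the ones used by this repo), picking the bucket whose aspect ratio is closest to the source video can be done like this:
```python
# Example buckets for a 960 base size; illustrative values only.
BUCKETS_960 = [(960, 960), (768, 1280), (1280, 768), (720, 1280), (1280, 720)]

def closest_bucket(orig_h, orig_w, buckets=BUCKETS_960):
    """Pick the (height, width) bucket whose aspect ratio best matches the source."""
    src_ratio = orig_w / orig_h
    return min(buckets, key=lambda hw: abs(hw[1] / hw[0] - src_ratio))

print(closest_bucket(1080, 1920))  # a 16:9 source maps to (720, 1280)
```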
### Stride Sampling
When `use_stride` is set to `True`, the system automatically adjusts the sampling stride based on the video's frame rate (see the sketch after this list):
- When frame rate >= 50fps, stride is 2
- When frame rate < 50fps, stride is 1
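Expressed as a sketch (of the documented rule only, not the extractor's exact frame-selection code), the stride logic combines with `sample_n_frames` roughly as follows:
```python
def sample_frame_indices(total_frames, fps, sample_n_frames=129, use_stride=True):
    """Pick frame indices using the fps-dependent stride rule described above."""
    stride = 2 if (use_stride and fps >= 50) else 1
    return list(range(0, total_frames, stride))[:sample_n_frames]

# A 60 fps clip is subsampled every other frame; a 24 fps clip keeps every frame.
print(len(sample_frame_indices(total_frames=300, fps=60)))  # 129
print(len(sample_frame_indices(total_frames=300, fps=24)))  # 129
```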
The commit also adds the multi-GPU launcher script `hyvideo/hyvae_extract/start.sh`:
```bash
export PYTHONPATH=${PYTHONPATH}:`pwd`
for ((i=0;i<$HOST_GPU_NUM;++i)); do
    CUDA_VISIBLE_DEVICES=$i python3 -u hyvideo/hyvae_extract/run.py --local_rank $i --config 'hyvideo/hyvae_extract/vae.yaml' &
done
# CUDA_VISIBLE_DEVICES=0 python3 -u hyvideo/hyvae_extract/run.py --local_rank 0 --config 'hyvideo/hyvae_extract/vae.yaml' &
wait
echo "Finished."
```
The default extraction config `hyvideo/hyvae_extract/vae.yaml` added in this commit:
```yaml
vae_path: "./ckpts/hunyuan-video-i2v-720p/vae"
video_url_files: "/path/to/meta_file.list"
output_base_dir: "/path/to/output/directory"
sample_n_frames: 129
target_size:
  - 480
  - 480
enable_multi_aspect_ratio: True
use_stride: True
```