update seko (#262)

* update seko talk * update * Update configs * update --------- Co-authored-by: gushiqiao <975033167@qq.com>

update seko (#262)
* update seko talk * update * Update configs * update --------- Co-authored-by: gushiqiao <975033167@qq.com>
a8aea27f · Yang Yong(雍洋) · GitHub · aaf5f643 · a8aea27f · a8aea27f
Commit a8aea27f authored Aug 29, 2025 by Yang Yong(雍洋) Committed by GitHub Aug 29, 2025
8 changed files
--- a/scripts/seko_talk/run_seko_talk_06_offload_fp8_H100.sh
+++ b/scripts/seko_talk/run_seko_talk_06_offload_fp8_H100.sh
+#!/bin/bash
+lightx2v_path=/path/to/Lightx2v
+model_path=/path/to/SekoTalk-Distill-fp8
+export CUDA_VISIBLE_DEVICES=0
+# set environment variables
+source ${lightx2v_path}/scripts/base/base.sh
+export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
+export ENABLE_GRAPH_MODE=false
+export SENSITIVE_LAYER_DTYPE=None
+python -m lightx2v.infer \
+--model_cls seko_talk \
+--task i2v \
+--model_path $model_path \
+--config_json ${lightx2v_path}/configs/seko_talk/seko_talk_06_offload_fp8_H100.json \
+--prompt  "The video features a old lady is saying something and knitting a sweater." \
+--negative_prompt 色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走 \
+--image_path ${lightx2v_path}/assets/inputs/audio/15.png \
+--audio_path ${lightx2v_path}/assets/inputs/audio/15.wav \
+--save_video_path ${lightx2v_path}/save_results/output_lightx2v_seko_talk.mp4
--- a/scripts/seko_talk/run_seko_talk_07_dist_offload.sh
+++ b/scripts/seko_talk/run_seko_talk_07_dist_offload.sh
+#!/bin/bash
+lightx2v_path=/path/to/Lightx2v
+model_path=/path/to/SekoTalk-Distill
+export CUDA_VISIBLE_DEVICES=0,1,2,3
+# set environment variables
+source ${lightx2v_path}/scripts/base/base.sh
+export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
+export ENABLE_GRAPH_MODE=false
+export SENSITIVE_LAYER_DTYPE=None
+torchrun --nproc-per-node 4 -m lightx2v.infer \
+--model_cls seko_talk \
+--task i2v \
+--model_path $model_path \
+--config_json ${lightx2v_path}/configs/seko_talk/seko_talk_07_dist_offload.json \
+--prompt  "The video features a old lady is saying something and knitting a sweater." \
+--negative_prompt 色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走 \
+--image_path ${lightx2v_path}/assets/inputs/audio/15.png \
+--audio_path ${lightx2v_path}/assets/inputs/audio/15.wav \
+--save_video_path ${lightx2v_path}/save_results/output_lightx2v_seko_talk.mp4
--- a/scripts/seko_audio_driven/run_wan_i2v_audio_quant.sh
+++ b/scripts/seko_audio_driven/run_wan_i2v_audio_quant.sh
 #!/bin/bash
-lightx2v_path=
+lightx2v_path=/path/to/Lightx2v
-model_path=
+model_path=/path/to/SekoTalk-Distill-5B
 export CUDA_VISIBLE_DEVICES=0
@@ -14,12 +14,12 @@ export ENABLE_GRAPH_MODE=false
 export SENSITIVE_LAYER_DTYPE=None
 python -m lightx2v.infer \
--model_cls wan2.1_audio \
+--model_cls seko_talk \
 --task i2v \
 --model_path $model_path \
--config_json ${lightx2v_path}/configs/audio_driven/wan_i2v_audio_quant.json \
+--config_json ${lightx2v_path}/configs/seko_talk/seko_talk_08_5B_base.json \
 --prompt  "The video features a old lady is saying something and knitting a sweater." \
 --negative_prompt 色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走 \
 --image_path ${lightx2v_path}/assets/inputs/audio/15.png \
 --audio_path ${lightx2v_path}/assets/inputs/audio/15.wav \
--save_video_path ${lightx2v_path}/save_results/output_lightx2v_wan_i2v_audio.mp4
+--save_video_path ${lightx2v_path}/save_results/output_lightx2v_seko_talk.mp4
--- a/scripts/wan/run_wan_i2v_audio_dist.sh
+++ b/scripts/wan/run_wan_i2v_audio_dist.sh
-#!/bin/bash
-# set path and first
-lightx2v_path=
-model_path=
-# set environment variables
-source ${lightx2v_path}/scripts/base/base.sh
-export CUDA_VISIBLE_DEVICES=0,1,2,3
-export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
-export ENABLE_GRAPH_MODE=false
-export SENSITIVE_LAYER_DTYPE=None
-#for debugging
-#export TORCH_NCCL_BLOCKING_WAIT=1 #启用 NCCL 阻塞等待模式（否则 watchdog 会杀死卡顿的进程）
-#export NCCL_BLOCKING_WAIT_TIMEOUT=1800 #设置 watchdog 的等待超时
-torchrun --nproc-per-node 4 -m lightx2v.infer \
--model_cls wan2.1_audio \
--task i2v \
--model_path $model_path \
--config_json ${lightx2v_path}/configs/audio_driven/wan_i2v_audio_dist.json \
--prompt  "The video features a old lady is saying something and knitting a sweater." \
--negative_prompt 色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走 \
--image_path ${lightx2v_path}/assets/inputs/audio/15.png \
--audio_path ${lightx2v_path}/assets/inputs/audio/15.wav \
--save_video_path ${lightx2v_path}/save_results/output_lightx2v_wan_i2v_audio.mp4
--- a/scripts/wan/run_wan_i2v_audio_lazy_load.sh
+++ b/scripts/wan/run_wan_i2v_audio_lazy_load.sh
-#!/bin/bash
-# set path and first
-lightx2v_path=
-model_path=
-export CUDA_VISIBLE_DEVICES=0
-export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
-# set environment variables
-source ${lightx2v_path}/scripts/base/base.sh
-export DTYPE=FP16
-export SENSITIVE_LAYER_DTYPE=FP16
-export ENABLE_PROFILING_DEBUG=true
-export ENABLE_GRAPH_MODE=false
-echo "==============================================================================="
-echo "LightX2V Lazyload Environment Variables Summary:"
-echo "-------------------------------------------------------------------------------"
-echo "lightx2v_path: ${lightx2v_path}"
-echo "model_path: ${model_path}"
-echo "-------------------------------------------------------------------------------"
-echo "Model Inference Data Type: ${DTYPE}"
-echo "Sensitive Layer Data Type: ${SENSITIVE_LAYER_DTYPE}"
-echo "Performance Profiling Debug Mode: ${ENABLE_PROFILING_DEBUG}"
-echo "Graph Mode Optimization: ${ENABLE_GRAPH_MODE}"
-echo "==============================================================================="
-python -m lightx2v.infer \
--model_cls wan2.1_audio \
--task i2v \
--model_path $model_path \
--config_json ${lightx2v_path}/configs/offload/disk/wan_i2v_audio_phase_lazy_load_720p.json \
--prompt  "The video features a old lady is saying something and knitting a sweater." \
--negative_prompt 色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走 \
--image_path ${lightx2v_path}/assets/inputs/audio/15.png \
--audio_path ${lightx2v_path}/assets/inputs/audio/15.wav \
--save_video_path ${lightx2v_path}/save_results/output_lightx2v_wan_i2v_audio.mp4
--- a/scripts/wan22/run_wan22_ti2v_i2v_audio.sh
+++ b/scripts/wan22/run_wan22_ti2v_i2v_audio.sh
-#!/bin/bash
-# set path and first
-lightx2v_path=
-model_path=
-export CUDA_VISIBLE_DEVICES=0
-# set environment variables
-source ${lightx2v_path}/scripts/base/base.sh
-export ENABLE_GRAPH_MODE=false
-export SENSITIVE_LAYER_DTYPE=None
-python -m lightx2v.infer \
--model_cls wan2.2_audio \
--task i2v \
--model_path $model_path \
--config_json ${lightx2v_path}/configs/audio_driven/wan22_ti2v_i2v_audio.json \
--prompt  "The video features a old lady is saying something and knitting a sweater." \
--negative_prompt 色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走 \
--image_path ${lightx2v_path}/assets/inputs/audio/15.png \
--audio_path ${lightx2v_path}/assets/inputs/audio/15.wav \
--save_video_path ${lightx2v_path}/save_results/output_lightx2v_wan_i2v_audio.mp4
--- a/scripts/seko_audio_driven/run_wan_i2v_audio.sh
+++ b/scripts/seko_audio_driven/run_wan_i2v_audio.sh
@@ -14,12 +14,12 @@ export ENABLE_GRAPH_MODE=false
 export SENSITIVE_LAYER_DTYPE=None
 python -m lightx2v.infer \
--model_cls wan2.1_audio \
+--model_cls seko_talk \
 --task i2v \
 --model_path $model_path \
--config_json ${lightx2v_path}/configs/audio_driven/wan_i2v_audio.json \
+--config_json ${lightx2v_path}/configs/seko_talk/wan_i2v_audio.json \
 --prompt  "The video features a old lady is saying something and knitting a sweater." \
 --negative_prompt 色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走 \
 --image_path ${lightx2v_path}/assets/inputs/audio/15.png \
 --audio_path ${lightx2v_path}/assets/inputs/audio/15.wav \
--save_video_path ${lightx2v_path}/save_results/output_lightx2v_wan_i2v_audio.mp4
+--save_video_path ${lightx2v_path}/save_results/output_lightx2v_seko_talk.mp4
--- a/test_cases/run_wan_i2v_audio.sh
+++ b/test_cases/run_wan_i2v_audio.sh
-#!/bin/bash
-# set path and first
-lightx2v_path=
-model_path=
-export CUDA_VISIBLE_DEVICES=0
-# set environment variables
-source ${lightx2v_path}/scripts/base/base.sh
-export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
-export ENABLE_GRAPH_MODE=false
-export SENSITIVE_LAYER_DTYPE=None
-python -m lightx2v.infer \
--model_cls wan2.1_audio \
--task i2v \
--model_path $model_path \
--config_json ${lightx2v_path}/configs/audio_driven/wan_i2v_audio.json \
--prompt  "The video features a old lady is saying something and knitting a sweater." \
--negative_prompt 色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走 \
--image_path ${lightx2v_path}/assets/inputs/audio/15.png \
--audio_path ${lightx2v_path}/assets/inputs/audio/15.wav \
--save_video_path ${lightx2v_path}/save_results/output_lightx2v_wan_i2v_audio.mp4