# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
.pybuilder/
target/
# Jupyter Notebook
.ipynb_checkpoints
# IPython
profile_default/
ipython_config.py
# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version
# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock
# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock
# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml
# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/
# Celery stuff
celerybeat-schedule
celerybeat.pid
# SageMath parsed files
*.sage.py
# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/
.dmypy.json
dmypy.json
# Pyre type checker
.pyre/
# pytype static type analyzer
.pytype/
# Cython debug symbols
cython_debug/
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/
examples/results/*
gfpgan/*
checkpoints/*
assets/*
results/*
Dockerfile
start_docker.sh
start.sh
checkpoints
# Mac
.DS_Store
# modelzoo
checkpoints/
gfpgan/
result/*
Tencent is pleased to support the open source community by making SadTalker available.
Copyright (C), a Tencent company. All rights reserved.
SadTalker is licensed under the Apache 2.0 License, except for the third-party components listed below.
Terms of the Apache License Version 2.0:
---------------------------------------------
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
# SadTalker
## Paper
`SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation`
- https://arxiv.org/abs/2211.12194
## Model Architecture
SadTalker uses 3DMM coefficients as an intermediate representation. Coefficients are first extracted from the source image; ExpNet and PoseVAE then predict realistic 3DMM coefficients from the audio (facial expression coefficients β and head pose ρ); finally, a 3D-aware face render generates the output video.
<div align=center>
<img src="./doc/SadTalker.PNG"/>
</div>
## Algorithm
The motion coefficients of a 3DMM (3D Morphable Model) are treated as an intermediate representation, and the task is split into two stages. The sub-models are trained separately, while inference runs end-to-end:
1. Generate more realistic motion coefficients (e.g. head pose, lip motion, eye blink) from the audio. Each coefficient is learned separately, which decouples them and reduces uncertainty:
ExpNet: the expression coefficient β of the first frame ties the expression motion to the specific person. To reduce the influence of other facial regions during speech, only the lip motion coefficients are used as the target coefficients; other, less important facial motions (e.g. eye blinks) are trained with an additional landmark loss.
PoseVAE: during training, the pose VAE uses an encoder-decoder structure over a fixed window of frames; the encoder and decoder take t consecutive frames of head pose as input, assumed to follow a Gaussian distribution. In the decoder, the network learns to generate t frames of pose by sampling from that distribution, but instead of predicting poses directly it learns the residual relative to the first-frame pose ρ0, which keeps the generated poses more continuous, stable, and consistent. This is why it is called a conditional VAE: the condition is the first-frame head pose. In addition, the audio features and a style identity are used as conditions to encode identity style. A KL divergence constrains the generated motion distribution, while MSE and GAN losses ensure generation quality.
2. After the 3DMM coefficients are generated, a 3D face is built from the source image and the final video is rendered:
3D-aware Face Render: a structure similar to face-vid2vid that can learn implicit 3D information from a single image. However, face-vid2vid needs a real video as the driving signal, whereas the 3D-aware Face Render uses a MappingNet to learn the relationship between the 3DMM motion coefficients and the unsupervised 3D keypoints.
<div align=center>
<img src="./doc/ExpNet.PNG"/>
</div>
<div align=center>
<img src="./doc/PoseVAE.PNG"/>
</div>
<div align=center>
<img src="./doc/FaceRender.PNG"/>
</div>
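To make the residual idea in PoseVAE concrete, below is a minimal, self-contained PyTorch sketch of a conditional decoder that predicts per-frame pose residuals relative to the first-frame pose ρ0 and adds them back. The module structure, layer sizes, and feature dimensions are illustrative assumptions for explanation only; they are not the code used in this repository.
```python
import torch
import torch.nn as nn

class ToyPoseVAEDecoder(nn.Module):
    """Illustrative decoder: predicts residual head poses relative to the first-frame pose rho_0."""

    def __init__(self, pose_dim=6, audio_dim=64, latent_dim=32, t_frames=32):
        super().__init__()
        self.t_frames = t_frames
        self.pose_dim = pose_dim
        # condition = sampled latent z + audio feature + first-frame pose + style id scalar
        self.net = nn.Sequential(
            nn.Linear(latent_dim + audio_dim + pose_dim + 1, 256),
            nn.ReLU(),
            nn.Linear(256, t_frames * pose_dim),
        )

    def forward(self, z, audio_feat, rho_0, style_id):
        cond = torch.cat([z, audio_feat, rho_0, style_id], dim=-1)
        residual = self.net(cond).view(-1, self.t_frames, self.pose_dim)
        # Learning residuals keeps the generated poses close to, and consistent with, rho_0.
        return rho_0.unsqueeze(1) + residual

decoder = ToyPoseVAEDecoder()
z = torch.randn(2, 32)        # latent sampled from the assumed Gaussian prior
audio = torch.randn(2, 64)    # per-clip audio feature (placeholder)
rho_0 = torch.randn(2, 6)     # first-frame head pose
style = torch.zeros(2, 1)     # style / identity condition
poses = decoder(z, audio, rho_0, style)
print(poses.shape)            # torch.Size([2, 32, 6])
```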
## Environment Setup
### Docker (Option 1)
```
docker pull image.sourcefind.cn:5000/dcu/admin/base/jupyterlab-pytorch:2.1.0-ubuntu20.04-dtk24.04.2-py3.8
docker run -it --name=SadTalker --network=host --privileged=true --device=/dev/kfd --device=/dev/dri --shm-size=16G --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /path/your_code_data:/path/SadTalker -v /opt/hyhal/:/opt/hyhal/:ro <imageID> bash # replace <imageID> with the ID of the Docker image pulled above
cd SadTalker
# install ffmpeg (used for format conversion)
apt update
apt install ffmpeg
# install dependencies
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
pip install tb-nightly -i https://mirrors.aliyun.com/pypi/simple
pip install -r requirements.txt
```
### Dockerfile (Option 2)
```
docker build --no-cache -t sadtalker:latest .
docker run -it --name=SadTalker --network=host --privileged=true --device=/dev/kfd --device=/dev/dri --shm-size=16G --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /path/your_code_data:/path/SadTalker -v /opt/hyhal/:/opt/hyhal/:ro sadtalker /bin/bash
cd SadTalker
# install ffmpeg (used for format conversion)
apt update
apt install ffmpeg
# install dependencies
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
pip install tb-nightly -i https://mirrors.aliyun.com/pypi/simple
pip install -r requirements.txt
```
### Anaconda (Option 3)
1. The special deep learning libraries required for the DCU GPUs used by this project can be downloaded from the Hygon developer community (光合开发者社区): https://developer.hpccube.com/tool/
```
DTK software stack: dtk24.04.2
python:python3.8
pytorch:2.1.0
torchvision:
torchaudio:
```
`Tip: the versions of the DTK software stack, python, pytorch, and the other DCU-related tools above must correspond exactly.`
2. The remaining, non-special libraries can be installed with the following steps:
```
cd SadTalker
# install ffmpeg (used for format conversion)
apt update
apt install ffmpeg
# install dependencies
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
pip install tb-nightly -i https://mirrors.aliyun.com/pypi/simple
pip install -r requirements.txt
```
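After the dependencies are installed, a quick sanity check can confirm the PyTorch version and that the accelerator is visible. This is an optional sketch, not part of the repository; on the DCU/ROCm software stack the device is assumed to be exposed through the standard `torch.cuda` API.
```python
import torch

# Optional sanity check after installation (not part of this repo).
print("torch version:", torch.__version__)
print("device available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device name:", torch.cuda.get_device_name(0))
```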
## Dataset
The data used for inference testing is stored under SadTalker/dataset/, with the following directory structure:
```
── dataset
│   ├── bus_chinese.wav
│   └── image.png
```
## Training
Not yet released by the authors.
## Inference
The models can be downloaded from [scnet](http://113.200.138.88:18080/aimodels/findsource-dependency/sadtalker) or via the options below:
1-1. Pre-Trained Models
* [Google Drive](https://drive.google.com/file/d/1gwWh45pF7aelNP_P78uDJL8Sycep-K7j/view?usp=sharing)
* [GitHub Releases](https://github.com/OpenTalker/SadTalker/releases)
* [Baidu (百度云盘)](https://pan.baidu.com/s/1kb1BCPaLOWX1JJb9Czbn6w?pwd=sadt) (Password: `sadt`)
1-2. GFPGAN Offline Patch
* [Google Drive](https://drive.google.com/file/d/19AIBsmfcHW6BRJmeqSFlG5fL445Xmsyi?usp=sharing)
* [GitHub Releases](https://github.com/OpenTalker/SadTalker/releases)
* [Baidu (百度云盘)](https://pan.baidu.com/s/1P4fRgk9gaSutZnn8YW034Q?pwd=sadt) (Password: `sadt`)
2. Or run the automatic download (GitHub Releases):
```
cd SadTalker
sh scripts/download_models.sh
```
The model directory structure is shown below; `checkpoints` holds the pre-trained models and `gfpgan` holds the face detection and enhancement models:
```
── checkpoints
│   └── ...
── gfpgan
│   └── weights
│    └── ...
```
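Optionally, you can verify that the downloaded models are in place before running inference. The helper below is a convenience sketch, not part of this repository; the file names follow the model table in the upstream README further below and may differ between releases.
```python
import os

# Optional check (not part of this repo): verify the downloaded models are in place.
EXPECTED = [
    "checkpoints/SadTalker_V0.0.2_256.safetensors",
    "checkpoints/SadTalker_V0.0.2_512.safetensors",
    "checkpoints/mapping_00109-model.pth.tar",
    "checkpoints/mapping_00229-model.pth.tar",
    "gfpgan/weights",
]

missing = [p for p in EXPECTED if not os.path.exists(p)]
if missing:
    print("Missing model files/directories:", *missing, sep="\n  ")
else:
    print("All expected model files found.")
```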
Run inference:
```
HIP_VISIBLE_DEVICES=0 python inference.py \
--driven_audio dataset/bus_chinese.wav \
--source_image dataset/image.png \
--still \
--preprocess full \
--enhancer gfpgan \
--result_dir result/
# --driven_audio    path to the driving audio
# --source_image    path to the source image
# --still           use the same pose parameters as the original image, with less head motion
# --preprocess full preprocess the image with one of ['crop', 'extcrop', 'resize', 'full', 'extfull']
# --enhancer        enhance the generated face with a face restoration network [gfpgan, RestoreFormer]
# --result_dir      output directory
# See the parser comments in inference.py and docs/best_practice.md for more options
```
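If you want to drive the same image with several audio clips, a small wrapper around the command above can loop over a directory of WAV files. This is only a convenience sketch, not part of the repository; adjust the paths, flags, and device variable to your environment.
```python
import glob
import os
import subprocess

# Convenience sketch (not part of this repo): run the inference command above
# for every WAV file under dataset/, reusing the same source image and flags.
env = {**os.environ, "HIP_VISIBLE_DEVICES": "0"}
for audio in sorted(glob.glob("dataset/*.wav")):
    subprocess.run(
        ["python", "inference.py",
         "--driven_audio", audio,
         "--source_image", "dataset/image.png",
         "--still",
         "--preprocess", "full",
         "--enhancer", "gfpgan",
         "--result_dir", "result/"],
        check=True,
        env=env,
    )
```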
## Results
The default output of the inference run is:
<div align=center>
<video src="./doc/inference_result.mp4"/>
</div>
### Accuracy
## Application Scenarios
### Algorithm Category
`Video generation`
### Key Application Industries
`Furniture, e-commerce, healthcare, broadcast media, education`
## Pre-trained Weights
- http://113.200.138.88:18080/aimodels/findsource-dependency/sadtalker
## Source Repository and Issue Feedback
-
## References
- https://github.com/OpenTalker/SadTalker
<div align="center">
<img src='https://user-images.githubusercontent.com/4397546/229094115-862c747e-7397-4b54-ba4a-bd368bfe2e0f.png' width='500px'/>
<!--<h2> 😭 SadTalker: <span style="font-size:12px">Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation </span> </h2> -->
<a href='https://arxiv.org/abs/2211.12194'><img src='https://img.shields.io/badge/ArXiv-PDF-red'></a> &nbsp; <a href='https://sadtalker.github.io'><img src='https://img.shields.io/badge/Project-Page-Green'></a> &nbsp; [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Winfredy/SadTalker/blob/main/quick_demo.ipynb) &nbsp; [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/vinthony/SadTalker) &nbsp; [![sd webui-colab](https://img.shields.io/badge/Automatic1111-Colab-green)](https://colab.research.google.com/github/camenduru/stable-diffusion-webui-colab/blob/main/video/stable/stable_diffusion_1_5_video_webui_colab.ipynb) &nbsp; <br> [![Replicate](https://replicate.com/cjwbw/sadtalker/badge)](https://replicate.com/cjwbw/sadtalker) [![Discord](https://dcbadge.vercel.app/api/server/rrayYqZ4tf?style=flat)](https://discord.gg/rrayYqZ4tf)
<div>
<a target='_blank'>Wenxuan Zhang <sup>*,1,2</sup> </a>&emsp;
<a href='https://vinthony.github.io/' target='_blank'>Xiaodong Cun <sup>*,2</a>&emsp;
<a href='https://xuanwangvc.github.io/' target='_blank'>Xuan Wang <sup>3</sup></a>&emsp;
<a href='https://yzhang2016.github.io/' target='_blank'>Yong Zhang <sup>2</sup></a>&emsp;
<a href='https://xishen0220.github.io/' target='_blank'>Xi Shen <sup>2</sup></a>&emsp; </br>
<a href='https://yuguo-xjtu.github.io/' target='_blank'>Yu Guo<sup>1</sup> </a>&emsp;
<a href='https://scholar.google.com/citations?hl=zh-CN&user=4oXBp9UAAAAJ' target='_blank'>Ying Shan <sup>2</sup> </a>&emsp;
<a target='_blank'>Fei Wang <sup>1</sup> </a>&emsp;
</div>
<br>
<div>
<sup>1</sup> Xi'an Jiaotong University &emsp; <sup>2</sup> Tencent AI Lab &emsp; <sup>3</sup> Ant Group &emsp;
</div>
<br>
<i><strong><a href='https://arxiv.org/abs/2211.12194' target='_blank'>CVPR 2023</a></strong></i>
<br>
<br>
![sadtalker](https://user-images.githubusercontent.com/4397546/222490039-b1f6156b-bf00-405b-9fda-0c9a9156f991.gif)
<b>TL;DR: &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; single portrait image 🙎‍♂️ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; audio 🎤 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; = &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; talking head video 🎞.</b>
<br>
</div>
## Highlights
- The license has been updated to Apache 2.0, and we've removed the non-commercial restriction.
- **SadTalker has now officially been integrated into Discord, where you can use it for free by sending files. You can also generate high-quality videos from text prompts. Join: [![Discord](https://dcbadge.vercel.app/api/server/rrayYqZ4tf?style=flat)](https://discord.gg/rrayYqZ4tf)**
- We've published a [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui) extension. Check out more details [here](docs/webui_extension.md). [Demo Video](https://user-images.githubusercontent.com/4397546/231495639-5d4bb925-ea64-4a36-a519-6389917dac29.mp4)
- Full image mode is now available! [More details...](https://github.com/OpenTalker/SadTalker#full-bodyimage-generation)
| still+enhancer in v0.0.1 | still + enhancer in v0.0.2 | [input image @bagbag1815](https://twitter.com/bagbag1815/status/1642754319094108161) |
|:--------------------: |:--------------------: | :----: |
| <video src="https://user-images.githubusercontent.com/48216707/229484996-5d7be64f-2553-4c9e-a452-c5cf0b8ebafe.mp4" type="video/mp4"> </video> | <video src="https://user-images.githubusercontent.com/4397546/230717873-355b7bf3-d3de-49f9-a439-9220e623fce7.mp4" type="video/mp4"> </video> | <img src='./examples/source_image/full_body_2.png' width='380'>
- Several new modes (Still, reference, and resize modes) are now available!
- We're happy to see more community demos on [bilibili](https://search.bilibili.com/all?keyword=sadtalker), [YouTube](https://www.youtube.com/results?search_query=sadtalker) and [X (#sadtalker)](https://twitter.com/search?q=%23sadtalker&src).
## Changelog
The previous changelog can be found [here](docs/changlelog.md).
- __[2023.06.12]__: Added more new features in WebUI extension, see the discussion [here](https://github.com/OpenTalker/SadTalker/discussions/386).
- __[2023.06.05]__: Released a new 512x512px (beta) face model. Fixed some bugs and improved performance.
- __[2023.04.15]__: Added a WebUI Colab notebook by [@camenduru](https://github.com/camenduru/): [![sd webui-colab](https://img.shields.io/badge/Automatic1111-Colab-green)](https://colab.research.google.com/github/camenduru/stable-diffusion-webui-colab/blob/main/video/stable/stable_diffusion_1_5_video_webui_colab.ipynb)
- __[2023.04.12]__: Added a more detailed WebUI installation document and fixed a problem when reinstalling.
- __[2023.04.12]__: Fixed WebUI security issues caused by 3rd-party packages, and optimized the output path in `sd-webui-extension`.
- __[2023.04.08]__: In v0.0.2, we added a logo watermark to the generated video to prevent abuse. _This watermark has since been removed in a later release._
- __[2023.04.08]__: In v0.0.2, we added features for full image animation and a link to download checkpoints from Baidu. We also optimized the enhancer logic.
## To-Do
We're tracking new updates in [issue #280](https://github.com/OpenTalker/SadTalker/issues/280).
## Troubleshooting
If you have any problems, please read our [FAQs](docs/FAQ.md) before opening an issue.
## 1. Installation
Community tutorials: [中文Windows教程 (Chinese Windows tutorial)](https://www.bilibili.com/video/BV1Dc411W7V6/) | [日本語コース (Japanese tutorial)](https://br-d.fanbox.cc/posts/5685086).
### Linux/Unix
1. Install [Anaconda](https://www.anaconda.com/), Python and `git`.
2. Create the environment and install the requirements.
```bash
git clone https://github.com/OpenTalker/SadTalker.git
cd SadTalker
conda create -n sadtalker python=3.8
conda activate sadtalker
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
conda install ffmpeg
pip install -r requirements.txt
### Coqui TTS is optional for gradio demo.
### pip install TTS
```
### Windows
A video tutorial in Chinese is available [here](https://www.bilibili.com/video/BV1Dc411W7V6/). You can also follow these instructions:
1. Install [Python 3.8](https://www.python.org/downloads/windows/) and check "Add Python to PATH".
2. Install [git](https://git-scm.com/download/win) manually or using [Scoop](https://scoop.sh/): `scoop install git`.
3. Install `ffmpeg`, following [this tutorial](https://www.wikihow.com/Install-FFmpeg-on-Windows) or using [scoop](https://scoop.sh/): `scoop install ffmpeg`.
4. Download the SadTalker repository by running `git clone https://github.com/Winfredy/SadTalker.git`.
5. Download the checkpoints and gfpgan models in the [downloads section](#2-download-models).
6. Run `start.bat` from Windows Explorer as a normal, non-administrator user, and a Gradio-powered WebUI demo will start.
### macOS
A tutorial on installing SadTalker on macOS can be found [here](docs/install.md).
### Docker, WSL, etc
Please check out additional tutorials [here](docs/install.md).
## 2. Download Models
You can run the following script on Linux/macOS to automatically download all the models:
```bash
bash scripts/download_models.sh
```
We also provide an offline patch (`gfpgan/`), so no model will be downloaded when generating.
### Pre-Trained Models
* [Google Drive](https://drive.google.com/file/d/1gwWh45pF7aelNP_P78uDJL8Sycep-K7j/view?usp=sharing)
* [GitHub Releases](https://github.com/OpenTalker/SadTalker/releases)
* [Baidu (百度云盘)](https://pan.baidu.com/s/1kb1BCPaLOWX1JJb9Czbn6w?pwd=sadt) (Password: `sadt`)
<!-- TODO add Hugging Face links -->
### GFPGAN Offline Patch
* [Google Drive](https://drive.google.com/file/d/19AIBsmfcHW6BRJmeqSFlG5fL445Xmsyi?usp=sharing)
* [GitHub Releases](https://github.com/OpenTalker/SadTalker/releases)
* [Baidu (百度云盘)](https://pan.baidu.com/s/1P4fRgk9gaSutZnn8YW034Q?pwd=sadt) (Password: `sadt`)
<!-- TODO add Hugging Face links -->
<details><summary>Model Details</summary>
Model details:
##### New version
| Model | Description
| :--- | :----------
|checkpoints/mapping_00229-model.pth.tar | Pre-trained MappingNet in Sadtalker.
|checkpoints/mapping_00109-model.pth.tar | Pre-trained MappingNet in Sadtalker.
|checkpoints/SadTalker_V0.0.2_256.safetensors | Packaged SadTalker checkpoints (old version, 256 face render).
|checkpoints/SadTalker_V0.0.2_512.safetensors | Packaged SadTalker checkpoints (old version, 512 face render).
|gfpgan/weights | Face detection and enhancement models used in `facexlib` and `gfpgan`.
##### Old version
| Model | Description
| :--- | :----------
|checkpoints/auido2exp_00300-model.pth | Pre-trained ExpNet in Sadtalker.
|checkpoints/auido2pose_00140-model.pth | Pre-trained PoseVAE in Sadtalker.
|checkpoints/mapping_00229-model.pth.tar | Pre-trained MappingNet in Sadtalker.
|checkpoints/mapping_00109-model.pth.tar | Pre-trained MappingNet in Sadtalker.
|checkpoints/facevid2vid_00189-model.pth.tar | Pre-trained face-vid2vid model from [the reappearance of face-vid2vid](https://github.com/zhanglonghao1992/One-Shot_Free-View_Neural_Talking_Head_Synthesis).
|checkpoints/epoch_20.pth | Pre-trained 3DMM extractor in [Deep3DFaceReconstruction](https://github.com/microsoft/Deep3DFaceReconstruction).
|checkpoints/wav2lip.pth | Highly accurate lip-sync model in [Wav2lip](https://github.com/Rudrabha/Wav2Lip).
|checkpoints/shape_predictor_68_face_landmarks.dat | Face landmark model used in [dlib](http://dlib.net/).
|checkpoints/BFM | 3DMM library file.
|checkpoints/hub | Face detection models used in [face alignment](https://github.com/1adrianb/face-alignment).
|gfpgan/weights | Face detection and enhancement models used in `facexlib` and `gfpgan`.
The final folder will be shown as:
<img width="331" alt="image" src="https://user-images.githubusercontent.com/4397546/232511411-4ca75cbf-a434-48c5-9ae0-9009e8316484.png">
</details>
## 3. Quick Start
Please read our document on [best practices and configuration tips](docs/best_practice.md).
### WebUI Demos
**Online Demo**: [HuggingFace](https://huggingface.co/spaces/vinthony/SadTalker) | [SDWebUI-Colab](https://colab.research.google.com/github/camenduru/stable-diffusion-webui-colab/blob/main/video/stable/stable_diffusion_1_5_video_webui_colab.ipynb) | [Colab](https://colab.research.google.com/github/Winfredy/SadTalker/blob/main/quick_demo.ipynb)
**Local WebUI extension**: Please refer to [WebUI docs](docs/webui_extension.md).
**Local gradio demo (recommended)**: A Gradio instance similar to our [Hugging Face demo](https://huggingface.co/spaces/vinthony/SadTalker) can be run locally:
```bash
## you need to manually install TTS (https://github.com/coqui-ai/TTS) via `pip install tts` beforehand.
python app_sadtalker.py
```
You can also start it more easily:
- Windows: just double-click `webui.bat`; the requirements will be installed automatically.
- Linux/macOS: run `bash webui.sh` to start the WebUI.
### CLI usage
##### Animating a portrait image with the default config:
```bash
python inference.py --driven_audio <audio.wav> \
--source_image <video.mp4 or picture.png> \
--enhancer gfpgan
```
The results will be saved in `results/$SOME_TIMESTAMP/*.mp4`.
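Each run writes into a new timestamped folder, so a small helper like the one below can locate the most recent output. This is a sketch for convenience only, not part of the repository.
```python
import glob
import os

# Convenience sketch (not part of this repo): find the newest generated video under results/.
videos = glob.glob("results/*/*.mp4")
if videos:
    latest = max(videos, key=os.path.getmtime)
    print("Latest result:", latest)
else:
    print("No results found yet.")
```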
##### Full body/image Generation:
Use `--still` to generate a natural full-body video. You can add `--enhancer` to improve the quality of the generated video.
```bash
python inference.py --driven_audio <audio.wav> \
--source_image <video.mp4 or picture.png> \
--result_dir <a folder to store results> \
--still \
--preprocess full \
--enhancer gfpgan
```
More examples, configuration options, and tips can be found in the [ >>> best practice documents <<<](docs/best_practice.md).
## Citation
If you find our work useful in your research, please consider citing:
```bibtex
@article{zhang2022sadtalker,
title={SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation},
author={Zhang, Wenxuan and Cun, Xiaodong and Wang, Xuan and Zhang, Yong and Shen, Xi and Guo, Yu and Shan, Ying and Wang, Fei},
journal={arXiv preprint arXiv:2211.12194},
year={2022}
}
```
## Acknowledgements
Facerender code borrows heavily from [zhanglonghao's reproduction of face-vid2vid](https://github.com/zhanglonghao1992/One-Shot_Free-View_Neural_Talking_Head_Synthesis) and [PIRender](https://github.com/RenYurui/PIRender). We thank the authors for sharing their wonderful code. During training, we also used models from [Deep3DFaceReconstruction](https://github.com/microsoft/Deep3DFaceReconstruction) and [Wav2lip](https://github.com/Rudrabha/Wav2Lip). We thank them for their wonderful work.
We also use the following 3rd-party libraries:
- **Face Utils**: https://github.com/xinntao/facexlib
- **Face Enhancement**: https://github.com/TencentARC/GFPGAN
- **Image/Video Enhancement**: https://github.com/xinntao/Real-ESRGAN
## Extensions
- [SadTalker-Video-Lip-Sync](https://github.com/Zz-ww/SadTalker-Video-Lip-Sync) from [@Zz-ww](https://github.com/Zz-ww): SadTalker for Video Lip Editing
## Related Works
- [StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN (ECCV 2022)](https://github.com/FeiiYin/StyleHEAT)
- [CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior (CVPR 2023)](https://github.com/Doubiiu/CodeTalker)
- [VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild (SIGGRAPH Asia 2022)](https://github.com/vinthony/video-retalking)
- [DPE: Disentanglement of Pose and Expression for General Video Portrait Editing (CVPR 2023)](https://github.com/Carlyx/DPE)
- [3D GAN Inversion with Facial Symmetry Prior (CVPR 2023)](https://github.com/FeiiYin/SPI/)
- [T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations (CVPR 2023)](https://github.com/Mael-zys/T2M-GPT)
## Disclaimer
This is not an official product of Tencent.
```
1. Please carefully read and comply with the open-source license applicable to this code before using it.
2. Please carefully read and comply with the intellectual property declaration applicable to this code before using it.
3. This open-source code runs completely offline and does not collect any personal information or other data. If you use this code to provide services to end-users and collect related data, please take necessary compliance measures according to applicable laws and regulations (such as publishing privacy policies, adopting necessary data security strategies, etc.). If the collected data involves personal information, user consent must be obtained (if applicable). Any legal liabilities arising from this are unrelated to Tencent.
4. Without Tencent's written permission, you are not authorized to use the names or logos legally owned by Tencent, such as "Tencent." Otherwise, you may be liable for legal responsibilities.
5. This open-source code does not have the ability to directly provide services to end-users. If you need to use this code for further model training or demos, as part of your product to provide services to end-users, or for similar use, please comply with applicable laws and regulations for your product or service. Any legal liabilities arising from this are unrelated to Tencent.
6. It is prohibited to use this open-source code for activities that harm the legitimate rights and interests of others (including but not limited to fraud, deception, infringement of others' portrait rights, reputation rights, etc.), or other behaviors that violate applicable laws and regulations or go against social ethics and good customs (including providing incorrect or false information, spreading pornographic, terrorist, and violent information, etc.). Otherwise, you may be liable for legal responsibilities.
```
LOGO: color and font suggestion: [ChatGPT](https://chat.openai.com); logo font: [Montserrat Alternates](https://fonts.google.com/specimen/Montserrat+Alternates?preview.text=SadTalker&preview.text_type=custom&query=mont).
All the copyrights of the demo images and audio are from community users or generated by Stable Diffusion. Feel free to contact us if you would like us to remove them.
<!-- Spelling fixed on Tuesday, September 12, 2023 by @fakerybakery (https://github.com/fakerybakery). These changes are licensed under the Apache 2.0 license. -->
import os, sys
import gradio as gr
from src.gradio_demo import SadTalker
try:
    import webui  # in webui
    in_webui = True
except:
    in_webui = False
def toggle_audio_file(choice):
    # Toggle visibility between the two input components based on the checkbox state.
    if choice == False:
        return gr.update(visible=True), gr.update(visible=False)
    else:
        return gr.update(visible=False), gr.update(visible=True)

def ref_video_fn(path_of_ref_video):
    # Enable the flag when a reference video path has been provided.
    if path_of_ref_video is not None:
        return gr.update(value=True)
    else:
        return gr.update(value=False)
def sadtalker_demo(checkpoint_path='checkpoints', config_path='src/config', warpfn=None):
sad_talker = SadTalker(checkpoint_path, config_path, lazy_load=True)
with gr.Blocks(analytics_enabled=False) as sadtalker_interface:
gr.Markdown("<div align='center'> <h2> 😭 SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation (CVPR 2023) </span> </h2> \
<a style='font-size:18px;color: #efefef' href='https://arxiv.org/abs/2211.12194'>Arxiv</a> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; \
<a style='font-size:18px;color: #efefef' href='https://sadtalker.github.io'>Homepage</a> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; \
<a style='font-size:18px;color: #efefef' href='https://github.com/Winfredy/SadTalker'> Github </div>")
with gr.Row().style(equal_height=False):
with gr.Column(variant='panel'):
with gr.Tabs(elem_id="sadtalker_source_image"):
with gr.TabItem('Upload image'):
with gr.Row():
source_image = gr.Image(label="Source image", source="upload", type="filepath", elem_id="img2img_image").style(width=512)
with gr.Tabs(elem_id="sadtalker_driven_audio"):
with gr.TabItem('Upload OR TTS'):
with gr.Column(variant='panel'):
driven_audio = gr.Audio(label="Input audio", source="upload", type="filepath")
if sys.platform != 'win32' and not in_webui:
from src.utils.text2speech import TTSTalker
tts_talker = TTSTalker()
with gr.Column(variant='panel'):
input_text = gr.Textbox(label="Generating audio from text", lines=5, placeholder="please enter some text here, we genreate the audio from text using @Coqui.ai TTS.")
tts = gr.Button('Generate audio',elem_id="sadtalker_audio_generate", variant='primary')
tts.click(fn=tts_talker.test, inputs=[input_text], outputs=[driven_audio])
with gr.Column(variant='panel'):
with gr.Tabs(elem_id="sadtalker_checkbox"):
with gr.TabItem('Settings'):
gr.Markdown("need help? please visit our [best practice page](https://github.com/OpenTalker/SadTalker/blob/main/docs/best_practice.md) for more detials")
with gr.Column(variant='panel'):
# width = gr.Slider(minimum=64, elem_id="img2img_width", maximum=2048, step=8, label="Manually Crop Width", value=512) # img2img_width
# height = gr.Slider(minimum=64, elem_id="img2img_height", maximum=2048, step=8, label="Manually Crop Height", value=512) # img2img_width
pose_style = gr.Slider(minimum=0, maximum=46, step=1, label="Pose style", value=0) #
size_of_image = gr.Radio([256, 512], value=256, label='face model resolution', info="use 256/512 model?") #
preprocess_type = gr.Radio(['crop', 'resize','full', 'extcrop', 'extfull'], value='crop', label='preprocess', info="How to handle input image?")
is_still_mode = gr.Checkbox(label="Still Mode (fewer head motion, works with preprocess `full`)")
batch_size = gr.Slider(label="batch size in generation", step=1, maximum=10, value=2)
enhancer = gr.Checkbox(label="GFPGAN as Face enhancer")
submit = gr.Button('Generate', elem_id="sadtalker_generate", variant='primary')
with gr.Tabs(elem_id="sadtalker_genearted"):
gen_video = gr.Video(label="Generated video", format="mp4").style(width=256)
if warpfn:
submit.click(
fn=warpfn(sad_talker.test),
inputs=[source_image,
driven_audio,
preprocess_type,
is_still_mode,
enhancer,
batch_size,
size_of_image,
pose_style
],
outputs=[gen_video]
)
else:
submit.click(
fn=sad_talker.test,
inputs=[source_image,
driven_audio,
preprocess_type,
is_still_mode,
enhancer,
batch_size,
size_of_image,
pose_style
],
outputs=[gen_video]
)
return sadtalker_interface
if __name__ == "__main__":
demo = sadtalker_demo()
demo.queue()
demo.launch()
build:
gpu: true
cuda: "11.3"
python_version: "3.8"
system_packages:
- "ffmpeg"
- "libgl1-mesa-glx"
- "libglib2.0-0"
python_packages:
- "torch==1.12.1"
- "torchvision==0.13.1"
- "torchaudio==0.12.1"
- "joblib==1.1.0"
- "scikit-image==0.19.3"
- "basicsr==1.4.2"
- "facexlib==0.3.0"
- "resampy==0.3.1"
- "pydub==0.25.1"
- "scipy==1.10.1"
- "kornia==0.6.8"
- "face_alignment==1.3.5"
- "imageio==2.19.3"
- "imageio-ffmpeg==0.4.7"
- "librosa==0.9.2" #
- "tqdm==4.65.0"
- "yacs==0.1.8"
- "gfpgan==1.3.8"
- "dlib-bin==19.24.1"
- "av==10.0.0"
- "trimesh==3.9.20"
run:
- mkdir -p /root/.cache/torch/hub/checkpoints/ && wget --output-document "/root/.cache/torch/hub/checkpoints/s3fd-619a316812.pth" "https://www.adrianbulat.com/downloads/python-fan/s3fd-619a316812.pth"
- mkdir -p /root/.cache/torch/hub/checkpoints/ && wget --output-document "/root/.cache/torch/hub/checkpoints/2DFAN4-cd938726ad.zip" "https://www.adrianbulat.com/downloads/python-fan/2DFAN4-cd938726ad.zip"
predict: "predict.py:Predictor"
## Frequently Asked Questions
**Q: `ffmpeg` is not recognized as an internal or external command**
On Linux, you can install ffmpeg via `conda install ffmpeg`. On macOS, try `brew install ffmpeg`. On Windows, make sure `ffmpeg` is on your `%PATH%` as suggested in [#54](https://github.com/Winfredy/SadTalker/issues/54), and follow [this guide](https://www.geeksforgeeks.org/how-to-install-ffmpeg-on-windows/) to install it.
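A quick way to check whether `ffmpeg` is actually visible from Python (a one-off sketch, not part of the repository):
```python
import shutil

# Prints the resolved ffmpeg path, or None if ffmpeg is not on PATH.
print(shutil.which("ffmpeg"))
```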
**Q: Running requirements.**
Please refer to the discussion here: https://github.com/Winfredy/SadTalker/issues/124#issuecomment-1508113989
**Q: ModuleNotFoundError: No module named 'ai'**
Please check the file size of the `epoch_20.pth` checkpoint. (https://github.com/Winfredy/SadTalker/issues/167, https://github.com/Winfredy/SadTalker/issues/113)
**Q: Illegal Hardware Error: Mac M1**
Please reinstall `dlib` separately via `pip install dlib`. (https://github.com/Winfredy/SadTalker/issues/129, https://github.com/Winfredy/SadTalker/issues/109)
**Q: FileNotFoundError: [Errno 2] No such file or directory: checkpoints\BFM_Fitting\similarity_Lm3D_all.mat**
Make sure you have downloaded the checkpoints and gfpgan as [here](https://github.com/Winfredy/SadTalker#-2-download-trained-models) and placed them in the right place.
**Q: RuntimeError: unexpected EOF, expected 237192 more bytes. The file might be corrupted.**
The files have not been fully downloaded. Please update the code and download the gfpgan folder as described [here](https://github.com/Winfredy/SadTalker#-2-download-trained-models).
**Q: CUDA out of memory error**
Please refer to https://stackoverflow.com/questions/73747731/runtimeerror-cuda-out-of-memory-how-setting-max-split-size-mb
```
# windows
set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
python inference.py ...
# linux
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
python inference.py ...
```
**Q: Error while decoding stream #0:0: Invalid data found when processing input [mp3float @ 0000015037628c00] Header missing**
Our method only supports WAV or MP3 files as input; please make sure the provided audio is in one of these formats.
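If your audio comes in another container (e.g. `.m4a`), you can convert it to WAV first. The snippet below is a minimal sketch using `pydub`, which is already listed among the project's dependencies; it requires `ffmpeg` on the PATH, and the file names are placeholders.
```python
from pydub import AudioSegment

# Convert an unsupported container (e.g. .m4a) to WAV before feeding it to inference.py.
# "input.m4a" / "input.wav" are placeholder file names.
AudioSegment.from_file("input.m4a").export("input.wav", format="wav")
```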
# Best Practices and Configuration Tips
> Our model only works on REAL people or portrait images similar to real people. The anime talking-head generation method will be released in the future.
Advanced configuration options for `inference.py`:
| Name | Configuration | Default | Explanation |
|:------------- |:------------- |:----- | :------------- |
| Enhance Mode | `--enhancer` | None | Use `gfpgan` or `RestoreFormer` to enhance the generated face via a face restoration network.
| Background Enhancer | `--background_enhancer` | None | Use `realesrgan` to enhance the full video.
| Still Mode | `--still` | False | Use the same pose parameters as the original image; less head motion.
| Expressive Mode | `--expression_scale` | 1.0 | A larger value makes the expression motion stronger.
| Save Path | `--result_dir` | `./results` | The location where the results will be saved.
| Preprocess | `--preprocess` | `crop` | Run and produce the results on the cropped input image. Other choices: `resize`, where the image is resized to a specific resolution, and `full`, which runs the full-image animation; use it with `--still` to get better results.
| Ref Mode (eye) | `--ref_eyeblink` | None | A video path; we borrow the eye blinks from this reference video to provide more natural eyebrow movement.
| Ref Mode (pose) | `--ref_pose` | None | A video path; we borrow the head pose from this reference video.
| 3D Mode | `--face3dvis` | False | Needs additional installation. More details on generating the 3D face can be found [here](docs/face3d.md).
| Free-view Mode | `--input_yaw`,<br> `--input_pitch`,<br> `--input_roll` | None | Generate novel-view or free-view 4D talking heads from a single image. More details can be found [here](https://github.com/Winfredy/SadTalker#generating-4d-free-view-talking-examples-from-audio-and-a-single-image).
### About `--preprocess`
Our system automatically handles the input images via `crop`, `resize`, and `full`.
In `crop` mode, we only generate the cropped image via the facial keypoints and produce the animated facial avatar. The animation of both expression and head pose is realistic.
> Still mode will stop the eye blinking and head pose movement.
| [input image @bagbag1815](https://twitter.com/bagbag1815/status/1642754319094108161) | crop | crop w/still |
|:--------------------: |:--------------------: | :----: |
| <img src='../examples/source_image/full_body_2.png' width='380'> | ![full_body_2](example_crop.gif) | ![full_body_2](example_crop_still.gif) |
In `resize` mode, we resize the whole image to generate the full talking-head video. As a result, it works well for images similar to ID photos. ⚠️ It will produce bad results for full-body images.
| <img src='../examples/source_image/full_body_2.png' width='380'> | <img src='../examples/source_image/full4.jpeg' width='380'> |
|:--------------------: |:--------------------: |
| ❌ not suitable for resize mode | ✅ good for resize mode |
| <img src='resize_no.gif'> | <img src='resize_good.gif' width='380'> |
In `full` mode, our model automatically processes the cropped region and pastes it back onto the original image. Remember to use `--still` to keep the original head pose.
| input | `--still` | `--still` & `enhancer` |
|:--------------------: |:--------------------: | :--:|
| <img src='../examples/source_image/full_body_2.png' width='380'> | <img src='./example_full.gif' width='380'> | <img src='./example_full_enhanced.gif' width='380'>
### About `--enhancer`
For higher resolution, we integrate [gfpgan](https://github.com/TencentARC/GFPGAN) and [real-esrgan](https://github.com/xinntao/Real-ESRGAN) for different purposes. Just add `--enhancer <gfpgan or RestoreFormer>` or `--background_enhancer <realesrgan>` to enhance the face and the full image, respectively.
```bash
# make sure the following packages are installed:
pip install gfpgan
pip install realesrgan
```
### About `--face3dvis`
This flag indicates that we can generate the 3D-rendered face and its 3D facial landmarks. More details can be found [here](face3d.md).
| Input | Animated 3d face |
|:-------------: | :-------------: |
| <img src='../examples/source_image/art_0.png' width='200px'> | <video src="https://user-images.githubusercontent.com/4397546/226856847-5a6a0a4d-a5ec-49e2-9b05-3206db65e8e3.mp4"></video> |
> Please make sure to turn on the audio, as videos embedded on GitHub do not play sound by default.
#### Reference eye-blink mode
| Input, w/ reference video, reference video |
|:-------------: |
| ![free_view](using_ref_video.gif)|
| If the reference video is shorter than the input audio, we will loop the reference video. |
#### Generating 4D free-view talking examples from audio and a single image
We use `--input_yaw`, `--input_pitch`, and `--input_roll` to control the head pose. For example, `--input_yaw -20 30 10` means the head yaw angle changes from -20 to 30 and then from 30 to 10.
```bash
python inference.py --driven_audio <audio.wav> \
--source_image <video.mp4 or picture.png> \
--result_dir <a folder to store results> \
--input_yaw -20 30 10
```
| Results, Free-view results, Novel view results |
|:-------------: |
| ![free_view](free_view_result.gif)|
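For intuition, the listed degrees are expanded into a per-frame sequence. The sketch below shows one plausible way to interpolate the `-20 30 10` example across a clip; it assumes simple linear interpolation and is an illustration of the idea only, not the repository's exact code.
```python
import numpy as np

# Illustration only: expand the yaw key values (-20 -> 30 -> 10) into a smooth
# per-frame sequence, assuming simple linear interpolation between the listed values.
def expand_degrees(key_values, num_frames):
    keys = np.asarray(key_values, dtype=float)
    key_positions = np.linspace(0, num_frames - 1, num=len(keys))
    return np.interp(np.arange(num_frames), key_positions, keys)

yaw_per_frame = expand_degrees([-20, 30, 10], num_frames=100)
print(yaw_per_frame[:5], "...", yaw_per_frame[-5:])
```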
## Changelog
- __[2023.04.06]__: The stable-diffusion WebUI extension is released.
- __[2023.04.03]__: Enabled TTS in the Hugging Face and local Gradio demos.
- __[2023.03.30]__: Launched the beta version of the full-body mode.
- __[2023.03.30]__: Launched a new feature: by using reference videos, our algorithm can generate videos with more natural eye blinking and some eyebrow movement.
- __[2023.03.29]__: `resize` mode is online via `python inference.py --preprocess resize`! It can produce a larger crop of the image, as discussed in https://github.com/Winfredy/SadTalker/issues/35.
- __[2023.03.29]__: The local Gradio demo is online! Run `python app.py` to start it. A new `requirements.txt` is used to avoid bugs in `librosa`.
- __[2023.03.28]__: Online demo is launched in [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/vinthony/SadTalker), thanks AK!
- __[2023.03.22]__: Launched a new feature: generating 3D face animation from a single image. New applications based on it will be added.
- __[2023.03.22]__: Launched a new feature: `still` mode, where only small head motion is produced, via `python inference.py --still`.
- __[2023.03.18]__: Support `expression intensity`: you can now change the intensity of the generated motion via `python inference.py --expression_scale 1.3` (some value > 1).
- __[2023.03.18]__: Reorganized the data folders; you can now download the checkpoints automatically using `bash scripts/download_models.sh`.
- __[2023.03.18]__: We have officially integrated [GFPGAN](https://github.com/TencentARC/GFPGAN) for face enhancement; use `python inference.py --enhancer gfpgan` for better visual quality.
- __[2023.03.14]__: Specify the version of package `joblib` to remove the errors in using `librosa`, [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Winfredy/SadTalker/blob/main/quick_demo.ipynb) is online!
- __[2023.03.06]__: Solved some bugs in the code and installation errors.
- __[2023.03.03]__: Released the test code for audio-driven single-image animation!
- __[2023.02.28]__: SadTalker has been accepted by CVPR 2023!