Unverified Commit cc5b31ff authored by Steven Liu, committed by GitHub

[docs] Migrate syntax (#12390)

* change syntax

* make style
parent d7a1a036
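The mechanical rewrite this commit applies throughout the docs — replacing `<Tip>` / `<Tip warning={true}>` blocks with GitHub-flavored `> [!TIP]` / `> [!WARNING]` admonitions — can be sketched with a small script. This is a hypothetical reconstruction of the transformation, not the tool actually used for the PR:

```python
import re

def migrate_tips(text: str) -> str:
    """Convert <Tip>...</Tip> blocks to GitHub-flavored admonition blockquotes."""
    pattern = re.compile(
        r"<Tip(?P<warn>\s+warning={true})?>\s*\n(?P<body>.*?)\n\s*</Tip>",
        re.DOTALL,
    )

    def repl(m: re.Match) -> str:
        marker = "[!WARNING]" if m.group("warn") else "[!TIP]"
        # Prefix every non-empty body line with "> "; empty lines become ">".
        lines = [ln.strip() for ln in m.group("body").strip().splitlines()]
        quoted = "\n".join("> " + ln if ln else ">" for ln in lines)
        return f"> {marker}\n{quoted}"

    return pattern.sub(repl, text)

old = """<Tip warning={true}>

This API is experimental.

</Tip>"""
print(migrate_tips(old))
```

A real migration would also need to handle indented `<Tip>` blocks inside Python docstrings, which this sketch leaves aside.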
@@ -12,11 +12,8 @@ specific language governing permissions and limitations under the License.
 # Text-to-image

-<Tip warning={true}>
-
-The text-to-image training script is experimental, and it's easy to overfit and run into issues like catastrophic forgetting. Try exploring different hyperparameters to get the best results on your dataset.
-
-</Tip>
+> [!WARNING]
+> The text-to-image training script is experimental, and it's easy to overfit and run into issues like catastrophic forgetting. Try exploring different hyperparameters to get the best results on your dataset.

 Text-to-image models like Stable Diffusion generate an image from a text prompt.
@@ -49,11 +46,8 @@ pip install -r requirements_flax.txt
 </hfoption>
 </hfoptions>

-<Tip>
-
-🤗 Accelerate is a library that supports training on multiple GPUs/TPUs and with mixed precision, and it automatically configures your training setup based on your hardware. Check out the 🤗 Accelerate [Quick tour](https://huggingface.co/docs/accelerate/quicktour) to learn more.
-
-</Tip>
+> [!TIP]
+> 🤗 Accelerate is a library that supports training on multiple GPUs/TPUs and with mixed precision, and it automatically configures your training setup based on your hardware. Check out the 🤗 Accelerate [Quick tour](https://huggingface.co/docs/accelerate/quicktour) to learn more.

 Initialize an 🤗 Accelerate environment:
@@ -79,11 +73,8 @@ write_basic_config()
 ## Script parameters

-<Tip>
-
-The following sections highlight the key parameters that affect training; for a complete description of every parameter, see the [script source](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image.py). Feel free to reach out with any questions.
-
-</Tip>
+> [!TIP]
+> The following sections highlight the key parameters that affect training; for a complete description of every parameter, see the [script source](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image.py). Feel free to reach out with any questions.

 The training script provides many parameters for customizing your training run. All of the parameters and their descriptions are found in the [`parse_args()`](https://github.com/huggingface/diffusers/blob/8959c5b9dec1c94d6ba482c94a58d2215c5fd026/examples/text_to_image/train_text_to_image.py#L193) function. It provides default values for each parameter, such as the training batch size and learning rate, which you can also override from the command line.
@@ -160,11 +151,8 @@ def preprocess_train(examples):
 This guide uses the [Naruto BLIP captions dataset](https://huggingface.co/datasets/lambdalabs/naruto-blip-captions) as an example to train a model that generates Naruto characters. Set the `MODEL_NAME` and `dataset_name` environment variables to the model and dataset (from the Hub or a local path). For multi-GPU training, add the `--multi_gpu` parameter to the `accelerate launch` command.

-<Tip>
-
-When training on a local dataset, set the `TRAIN_DIR` and `OUTPUT_DIR` environment variables to the dataset path and where to save the model, respectively.
-
-</Tip>
+> [!TIP]
+> When training on a local dataset, set the `TRAIN_DIR` and `OUTPUT_DIR` environment variables to the dataset path and where to save the model, respectively.

 ```bash
 export MODEL_NAME="stable-diffusion-v1-5/stable-diffusion-v1-5"
@@ -194,11 +182,8 @@ Flax training is more efficient on TPUs/GPUs (contributed by [@duongna211](https://github.com
 Set the `MODEL_NAME` and `dataset_name` environment variables to the model and dataset (from the Hub or a local path).

-<Tip>
-
-When training on a local dataset, set the `TRAIN_DIR` and `OUTPUT_DIR` environment variables to the dataset path and where to save the model, respectively.
-
-</Tip>
+> [!TIP]
+> When training on a local dataset, set the `TRAIN_DIR` and `OUTPUT_DIR` environment variables to the dataset path and where to save the model, respectively.

 ```bash
 export MODEL_NAME="stable-diffusion-v1-5/stable-diffusion-v1-5"
......
@@ -45,11 +45,8 @@ pip install -r requirements_flax.txt
 </hfoption>
 </hfoptions>

-<Tip>
-
-🤗 Accelerate is a library for helping you train on multiple GPUs/TPUs or with mixed precision. It automatically configures your training setup based on your hardware and environment. Check out the 🤗 Accelerate [Quick tour](https://huggingface.co/docs/accelerate/quicktour) to learn more.
-
-</Tip>
+> [!TIP]
+> 🤗 Accelerate is a library for helping you train on multiple GPUs/TPUs or with mixed precision. It automatically configures your training setup based on your hardware and environment. Check out the 🤗 Accelerate [Quick tour](https://huggingface.co/docs/accelerate/quicktour) to learn more.

 Initialize an 🤗 Accelerate environment:
@@ -73,11 +70,8 @@ write_basic_config()
 Lastly, if you want to train a model on your own dataset, take a look at the [Create a dataset for training](create_dataset) guide to learn how to create a dataset that works with the training script.

-<Tip>
-
-The following sections highlight parts of the training script that are important for understanding how to modify it, but they don't cover every detail. If you're interested in learning more, feel free to read through the [script source](https://github.com/huggingface/diffusers/blob/main/examples/textual_inversion/textual_inversion.py) and let us know if you have any questions.
-
-</Tip>
+> [!TIP]
+> The following sections highlight parts of the training script that are important for understanding how to modify it, but they don't cover every detail. If you're interested in learning more, feel free to read through the [script source](https://github.com/huggingface/diffusers/blob/main/examples/textual_inversion/textual_inversion.py) and let us know if you have any questions.

 ## Script parameters
@@ -173,11 +167,8 @@ snapshot_download(
 - `token_identifier.txt`: the special placeholder token
 - `type_of_concept.txt`: the type of concept you're training on (either "object" or "style")

-<Tip warning={true}>
-
-A full training run takes about an hour on a single V100 GPU.
-
-</Tip>
+> [!WARNING]
+> A full training run takes about an hour on a single V100 GPU.

 One final step before launching the script: if you want to follow along with the training process, you can periodically save generated images. Add the following parameters to the training command:
......
@@ -33,11 +33,8 @@ cd examples/wuerstchen/text_to_image
 pip install -r requirements.txt
 ```

-<Tip>
-
-🤗 Accelerate is a library for helping you train on multiple GPUs/TPUs or with mixed precision. It automatically configures your training setup based on your hardware and environment. Check out the 🤗 Accelerate [Quick tour](https://huggingface.co/docs/accelerate/quicktour) to learn more.
-
-</Tip>
+> [!TIP]
+> 🤗 Accelerate is a library for helping you train on multiple GPUs/TPUs or with mixed precision. It automatically configures your training setup based on your hardware and environment. Check out the 🤗 Accelerate [Quick tour](https://huggingface.co/docs/accelerate/quicktour) to learn more.

 Initialize an 🤗 Accelerate environment:
@@ -61,11 +58,8 @@ write_basic_config()
 Lastly, if you want to train a model on your own dataset, take a look at the [Create a dataset for training](create_dataset) guide to learn how to create a dataset that works with the training script.

-<Tip>
-
-The following sections highlight parts of the training script that are important for understanding how to modify it, but they don't cover every detail of the [script](https://github.com/huggingface/diffusers/blob/main/examples/wuerstchen/text_to_image/train_text_to_image_prior.py). If you're interested in learning more, feel free to read through it and let us know if you have any questions or concerns.
-
-</Tip>
+> [!TIP]
+> The following sections highlight parts of the training script that are important for understanding how to modify it, but they don't cover every detail of the [script](https://github.com/huggingface/diffusers/blob/main/examples/wuerstchen/text_to_image/train_text_to_image_prior.py). If you're interested in learning more, feel free to read through it and let us know if you have any questions or concerns.

 ## Script parameters
@@ -134,11 +128,8 @@ pred_noise = prior(noisy_latents, timesteps, prompt_embeds)
 Set the `DATASET_NAME` environment variable to the dataset name on the Hub. This guide uses the [Naruto BLIP captions](https://huggingface.co/datasets/lambdalabs/naruto-blip-captions) dataset, but you can also create and train on your own dataset (see the [Create a dataset for training](create_dataset) guide).

-<Tip>
-
-To monitor training progress with Weights & Biases, add the `--report_to=wandb` parameter to the training command. You'll also need to add `--validation_prompt` to the training command to keep track of results. This is really useful for debugging the model and viewing intermediate results.
-
-</Tip>
+> [!TIP]
+> To monitor training progress with Weights & Biases, add the `--report_to=wandb` parameter to the training command. You'll also need to add `--validation_prompt` to the training command to keep track of results. This is really useful for debugging the model and viewing intermediate results.

 ```bash
 export DATASET_NAME="lambdalabs/naruto-blip-captions"
......
@@ -1475,11 +1475,8 @@ class MatryoshkaFusedAttnProcessor2_0:
 fused projection layers. For self-attention modules, all projection matrices (i.e., query, key, value) are fused.
 For cross-attention modules, key and value projection matrices are fused.

-<Tip warning={true}>
-
-This API is currently 🧪 experimental in nature and can change in future.
-
-</Tip>
+> [!WARNING]
+> This API is currently 🧪 experimental in nature and can change in future.
 """

 def __init__(self):
@@ -2696,11 +2693,8 @@ class MatryoshkaUNet2DConditionModel(
 Enables fused QKV projections. For self-attention modules, all projection matrices (i.e., query, key, value)
 are fused. For cross-attention modules, key and value projection matrices are fused.

-<Tip warning={true}>
-
-This API is 🧪 experimental.
-
-</Tip>
+> [!WARNING]
+> This API is 🧪 experimental.
 """

 self.original_attn_processors = None
@@ -2719,11 +2713,8 @@ class MatryoshkaUNet2DConditionModel(
 def unfuse_qkv_projections(self):
 """Disables the fused QKV projection if enabled.

-<Tip warning={true}>
-
-This API is 🧪 experimental.
-
-</Tip>
+> [!WARNING]
+> This API is 🧪 experimental.
 """

 if self.original_attn_processors is not None:
......
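The fused QKV projection described in the docstrings above means concatenating the separate query/key/value weight matrices into a single projection so that one matmul replaces three. A minimal NumPy sketch of the idea (illustrative only, not the diffusers implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8

# Separate projection weights, as in an unfused attention module.
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) for _ in range(3))
x = rng.standard_normal((4, dim))  # a batch of 4 token embeddings

# Unfused: three independent matmuls.
q, k, v = x @ w_q.T, x @ w_k.T, x @ w_v.T

# Fused: stack the weights and do a single larger matmul, then split the result.
w_qkv = np.concatenate([w_q, w_k, w_v], axis=0)   # shape (3*dim, dim)
q2, k2, v2 = np.split(x @ w_qkv.T, 3, axis=-1)

# The fused path reproduces the unfused projections exactly.
assert np.allclose(q, q2) and np.allclose(k, k2) and np.allclose(v, v2)
```

The benefit is fewer, larger kernel launches; for cross-attention only key and value share an input, which is why only those two are fused there.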
@@ -948,11 +948,8 @@ class StableDiffusionBoxDiffPipeline(
 Enables fused QKV projections. For self-attention modules, all projection matrices (i.e., query,
 key, value) are fused. For cross-attention modules, key and value projection matrices are fused.

-<Tip warning={true}>
-
-This API is 🧪 experimental.
-
-</Tip>
+> [!WARNING]
+> This API is 🧪 experimental.

 Args:
 unet (`bool`, defaults to `True`): To apply fusion on the UNet.
@@ -978,11 +975,8 @@ class StableDiffusionBoxDiffPipeline(
 def unfuse_qkv_projections(self, unet: bool = True, vae: bool = True):
 """Disable QKV projection fusion if enabled.

-<Tip warning={true}>
-
-This API is 🧪 experimental.
-
-</Tip>
+> [!WARNING]
+> This API is 🧪 experimental.

 Args:
 unet (`bool`, defaults to `True`): To apply fusion on the UNet.
......
@@ -940,9 +940,8 @@ class StableDiffusionPAGPipeline(
 """
 Enables fused QKV projections. For self-attention modules, all projection matrices (i.e., query,
 key, value) are fused. For cross-attention modules, key and value projection matrices are fused.

-<Tip warning={true}>
-This API is 🧪 experimental.
-</Tip>
+> [!WARNING]
+> This API is 🧪 experimental.

 Args:
 unet (`bool`, defaults to `True`): To apply fusion on the UNet.
 vae (`bool`, defaults to `True`): To apply fusion on the VAE.
@@ -966,9 +965,8 @@ class StableDiffusionPAGPipeline(
 # Copied from diffusers.pipelines.stable_diffusion_xl.pipeline_stable_diffusion_xl.StableDiffusionXLPipeline.unfuse_qkv_projections
 def unfuse_qkv_projections(self, unet: bool = True, vae: bool = True):
 """Disable QKV projection fusion if enabled.

-<Tip warning={true}>
-This API is 🧪 experimental.
-</Tip>
+> [!WARNING]
+> This API is 🧪 experimental.

 Args:
 unet (`bool`, defaults to `True`): To apply fusion on the UNet.
 vae (`bool`, defaults to `True`): To apply fusion on the VAE.
......
@@ -1246,12 +1246,9 @@ class EasyPipelineForText2Image(AutoPipelineForText2Image):
 Load weights from a specified variant filename such as `"fp16"` or `"ema"`. This is ignored when
 loading `from_flax`.

-<Tip>
-
-To use private or [gated](https://huggingface.co/docs/hub/models-gated#gated-models) models, log-in with
-`hf auth login`.
-
-</Tip>
+> [!TIP]
+> To use private or [gated](https://huggingface.co/docs/hub/models-gated#gated-models) models, log-in with
+> `hf auth login`.

 Examples:
@@ -1355,12 +1352,9 @@ class EasyPipelineForText2Image(AutoPipelineForText2Image):
 class). The overwritten components are passed directly to the pipelines `__init__` method. See example
 below for more information.

-<Tip>
-
-To use private or [gated](https://huggingface.co/docs/hub/models-gated#gated-models) models, log-in with
-`hf auth login`.
-
-</Tip>
+> [!TIP]
+> To use private or [gated](https://huggingface.co/docs/hub/models-gated#gated-models) models, log-in with
+> `hf auth login`.

 Examples:
@@ -1504,12 +1498,9 @@ class EasyPipelineForImage2Image(AutoPipelineForImage2Image):
 Load weights from a specified variant filename such as `"fp16"` or `"ema"`. This is ignored when
 loading `from_flax`.

-<Tip>
-
-To use private or [gated](https://huggingface.co/docs/hub/models-gated#gated-models) models, log-in with
-`hf auth login`.
-
-</Tip>
+> [!TIP]
+> To use private or [gated](https://huggingface.co/docs/hub/models-gated#gated-models) models, log-in with
+> `hf auth login`.

 Examples:
@@ -1614,12 +1605,9 @@ class EasyPipelineForImage2Image(AutoPipelineForImage2Image):
 class). The overwritten components are passed directly to the pipelines `__init__` method. See example
 below for more information.

-<Tip>
-
-To use private or [gated](https://huggingface.co/docs/hub/models-gated#gated-models) models, log-in with
-`hf auth login`.
-
-</Tip>
+> [!TIP]
+> To use private or [gated](https://huggingface.co/docs/hub/models-gated#gated-models) models, log-in with
+> `hf auth login`.

 Examples:
@@ -1763,12 +1751,9 @@ class EasyPipelineForInpainting(AutoPipelineForInpainting):
 Load weights from a specified variant filename such as `"fp16"` or `"ema"`. This is ignored when
 loading `from_flax`.

-<Tip>
-
-To use private or [gated](https://huggingface.co/docs/hub/models-gated#gated-models) models, log-in with
-`hf auth login`.
-
-</Tip>
+> [!TIP]
+> To use private or [gated](https://huggingface.co/docs/hub/models-gated#gated-models) models, log-in with
+> `hf auth login`.

 Examples:
@@ -1872,12 +1857,9 @@ class EasyPipelineForInpainting(AutoPipelineForInpainting):
 class). The overwritten components are passed directly to the pipelines `__init__` method. See example
 below for more information.

-<Tip>
-
-To use private or [gated](https://huggingface.co/docs/hub/models-gated#gated-models) models, log-in with
-`hf auth login`.
-
-</Tip>
+> [!TIP]
+> To use private or [gated](https://huggingface.co/docs/hub/models-gated#gated-models) models, log-in with
+> `hf auth login`.

 Examples:
......
@@ -247,15 +247,11 @@ class BaseGuidance(ConfigMixin, PushToHubMixin):
 The specific model version to use. It can be a branch name, a tag name, a commit id, or any identifier
 allowed by Git.

-<Tip>
-
-To use private or [gated models](https://huggingface.co/docs/hub/models-gated#gated-models), log-in with `hf
-auth login`. You can also activate the special
-["offline-mode"](https://huggingface.co/diffusers/installation.html#offline-mode) to use this method in a
-firewalled environment.
-
-</Tip>
+> [!TIP]
+> To use private or [gated models](https://huggingface.co/docs/hub/models-gated#gated-models), log-in with `hf
+> auth login`. You can also activate the special
+> ["offline-mode"](https://huggingface.co/diffusers/installation.html#offline-mode) to use this method in a
+> firewalled environment.
 """

 config, kwargs, commit_hash = cls.load_config(
 pretrained_model_name_or_path=pretrained_model_name_or_path,
......
@@ -544,11 +544,7 @@ class LoraBaseMixin:
 r"""
 Fuses the LoRA parameters into the original parameters of the corresponding blocks.

-<Tip warning={true}>
-
-This is an experimental API.
-
-</Tip>
+> [!WARNING]
+> This is an experimental API.

 Args:
 components: (`List[str]`): List of LoRA-injectable components to fuse the LoRAs into.
@@ -628,11 +624,7 @@ class LoraBaseMixin:
 Reverses the effect of
 [`pipe.fuse_lora()`](https://huggingface.co/docs/diffusers/main/en/api/loaders#diffusers.loaders.LoraBaseMixin.fuse_lora).

-<Tip warning={true}>
-
-This is an experimental API.
-
-</Tip>
+> [!WARNING]
+> This is an experimental API.

 Args:
 components (`List[str]`): List of LoRA-injectable components to unfuse LoRA from.
......
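The `fuse_lora()` / `unfuse_lora()` pair documented above folds the low-rank update into the base weight (W ← W + scale·BA) and later subtracts it back out. A small NumPy sketch of that round trip (illustrative, not the actual PEFT/diffusers code):

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 16, 4          # hidden size and LoRA rank
scale = 0.5           # lora_scale applied at fuse time

W = rng.standard_normal((d, d))   # frozen base weight
B = rng.standard_normal((d, r))   # LoRA "up" matrix
A = rng.standard_normal((r, d))   # LoRA "down" matrix
W_orig = W.copy()

# fuse: merge the low-rank delta into the base weight.
W = W + scale * (B @ A)

x = rng.standard_normal(d)
# The fused weight matches base output + the separate LoRA branch.
assert np.allclose(W @ x, W_orig @ x + scale * (B @ (A @ x)))

# unfuse: subtract the same delta to recover the original weight.
W = W - scale * (B @ A)
assert np.allclose(W, W_orig)
```

Fusing removes the extra LoRA matmuls at inference time; unfusing is exact in floating point only up to rounding, which is why the real implementation keeps the original weights around.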
@@ -246,13 +246,8 @@ class StableDiffusionLoraLoaderMixin(LoraBaseMixin):
 r"""
 Return state dict for lora weights and the network alphas.

-<Tip warning={true}>
-
-We support loading A1111 formatted LoRA checkpoints in a limited capacity.
-
-This function is experimental and might change in the future.
-
-</Tip>
+> [!WARNING]
+> We support loading A1111 formatted LoRA checkpoints in a limited capacity.
+>
+> This function is experimental and might change in the future.

 Parameters:
 pretrained_model_name_or_path_or_dict (`str` or `os.PathLike` or `dict`):
@@ -545,11 +540,7 @@ class StableDiffusionLoraLoaderMixin(LoraBaseMixin):
 r"""
 Fuses the LoRA parameters into the original parameters of the corresponding blocks.

-<Tip warning={true}>
-
-This is an experimental API.
-
-</Tip>
+> [!WARNING]
+> This is an experimental API.

 Args:
 components: (`List[str]`): List of LoRA-injectable components to fuse the LoRAs into.
@@ -586,11 +577,7 @@ class StableDiffusionLoraLoaderMixin(LoraBaseMixin):
 Reverses the effect of
 [`pipe.fuse_lora()`](https://huggingface.co/docs/diffusers/main/en/api/loaders#diffusers.loaders.LoraBaseMixin.fuse_lora).

-<Tip warning={true}>
-
-This is an experimental API.
-
-</Tip>
+> [!WARNING]
+> This is an experimental API.

 Args:
 components (`List[str]`): List of LoRA-injectable components to unfuse LoRA from.
@@ -698,13 +685,8 @@ class StableDiffusionXLLoraLoaderMixin(LoraBaseMixin):
 r"""
 Return state dict for lora weights and the network alphas.

-<Tip warning={true}>
-
-We support loading A1111 formatted LoRA checkpoints in a limited capacity.
-
-This function is experimental and might change in the future.
-
-</Tip>
+> [!WARNING]
+> We support loading A1111 formatted LoRA checkpoints in a limited capacity.
+>
+> This function is experimental and might change in the future.

 Parameters:
 pretrained_model_name_or_path_or_dict (`str` or `os.PathLike` or `dict`):
@@ -2007,11 +1989,7 @@ class FluxLoraLoaderMixin(LoraBaseMixin):
 Reverses the effect of
 [`pipe.fuse_lora()`](https://huggingface.co/docs/diffusers/main/en/api/loaders#diffusers.loaders.LoraBaseMixin.fuse_lora).

-<Tip warning={true}>
-
-This is an experimental API.
-
-</Tip>
+> [!WARNING]
+> This is an experimental API.

 Args:
 components (`List[str]`): List of LoRA-injectable components to unfuse LoRA from.
......
@@ -111,11 +111,7 @@ class AttentionMixin:
 def unfuse_qkv_projections(self):
 """Disables the fused QKV projection if enabled.

-<Tip warning={true}>
-
-This API is 🧪 experimental.
-
-</Tip>
+> [!WARNING]
+> This API is 🧪 experimental.
 """

 for module in self.modules():
 if isinstance(module, AttentionModuleMixin):
......
@@ -3669,11 +3669,7 @@ class FusedAttnProcessor2_0:
 fused projection layers. For self-attention modules, all projection matrices (i.e., query, key, value) are fused.
 For cross-attention modules, key and value projection matrices are fused.

-<Tip warning={true}>
-
-This API is currently 🧪 experimental in nature and can change in future.
-
-</Tip>
+> [!WARNING]
+> This API is currently 🧪 experimental in nature and can change in future.
 """

 def __init__(self):
......
@@ -118,15 +118,11 @@ class AutoModel(ConfigMixin):
 trust_remote_cocde (`bool`, *optional*, defaults to `False`):
 Whether to trust remote code

-<Tip>
-
-To use private or [gated models](https://huggingface.co/docs/hub/models-gated#gated-models), log-in with `hf
-auth login`. You can also activate the special
-["offline-mode"](https://huggingface.co/diffusers/installation.html#offline-mode) to use this method in a
-firewalled environment.
-
-</Tip>
+> [!TIP]
+> To use private or [gated models](https://huggingface.co/docs/hub/models-gated#gated-models), log-in with `hf
+> auth login`. You can also activate the special
+> ["offline-mode"](https://huggingface.co/diffusers/installation.html#offline-mode) to use this method in a
+> firewalled environment.

 Example:
 ```py
......
@@ -532,11 +532,7 @@ class AutoencoderKL(ModelMixin, ConfigMixin, FromOriginalModelMixin, PeftAdapter
 Enables fused QKV projections. For self-attention modules, all projection matrices (i.e., query, key, value)
 are fused. For cross-attention modules, key and value projection matrices are fused.

-<Tip warning={true}>
-
-This API is 🧪 experimental.
-
-</Tip>
+> [!WARNING]
+> This API is 🧪 experimental.
 """

 self.original_attn_processors = None
@@ -556,11 +552,7 @@ class AutoencoderKL(ModelMixin, ConfigMixin, FromOriginalModelMixin, PeftAdapter
 def unfuse_qkv_projections(self):
 """Disables the fused QKV projection if enabled.

-<Tip warning={true}>
-
-This API is 🧪 experimental.
-
-</Tip>
+> [!WARNING]
+> This API is 🧪 experimental.
 """

 if self.original_attn_processors is not None:
......
@@ -270,11 +270,7 @@ class SD3ControlNetModel(ModelMixin, ConfigMixin, PeftAdapterMixin, FromOriginal
 Enables fused QKV projections. For self-attention modules, all projection matrices (i.e., query, key, value)
 are fused. For cross-attention modules, key and value projection matrices are fused.

-<Tip warning={true}>
-
-This API is 🧪 experimental.
-
-</Tip>
+> [!WARNING]
+> This API is 🧪 experimental.
 """

 self.original_attn_processors = None
@@ -294,11 +290,7 @@ class SD3ControlNetModel(ModelMixin, ConfigMixin, PeftAdapterMixin, FromOriginal
 def unfuse_qkv_projections(self):
 """Disables the fused QKV projection if enabled.

-<Tip warning={true}>
-
-This API is 🧪 experimental.
-
-</Tip>
+> [!WARNING]
+> This API is 🧪 experimental.
 """

 if self.original_attn_processors is not None:
......
@@ -980,11 +980,7 @@ class UNetControlNetXSModel(ModelMixin, ConfigMixin):
 Enables fused QKV projections. For self-attention modules, all projection matrices (i.e., query, key, value)
 are fused. For cross-attention modules, key and value projection matrices are fused.

-<Tip warning={true}>
-
-This API is 🧪 experimental.
-
-</Tip>
+> [!WARNING]
+> This API is 🧪 experimental.
 """

 self.original_attn_processors = None
@@ -1004,11 +1000,7 @@ class UNetControlNetXSModel(ModelMixin, ConfigMixin):
 def unfuse_qkv_projections(self):
 """Disables the fused QKV projection if enabled.

-<Tip warning={true}>
-
-This API is 🧪 experimental.
-
-</Tip>
+> [!WARNING]
+> This API is 🧪 experimental.
 """

 if self.original_attn_processors is not None:
......
@@ -227,15 +227,9 @@ class FlaxModelMixin(PushToHubMixin):
 This can be used to enable mixed-precision training or half-precision inference on GPUs or TPUs. If
 specified, all the computation will be performed with the given `dtype`.

-<Tip>
-
-This only specifies the dtype of the *computation* and does not influence the dtype of model
-parameters.
-
-If you wish to change the dtype of the model parameters, see [`~FlaxModelMixin.to_fp16`] and
-[`~FlaxModelMixin.to_bf16`].
-
-</Tip>
+> [!TIP]
+> This only specifies the dtype of the *computation* and does not influence the dtype of model
+> parameters.
+>
+> If you wish to change the dtype of the model parameters, see [`~FlaxModelMixin.to_fp16`] and
+> [`~FlaxModelMixin.to_bf16`].

 model_args (sequence of positional arguments, *optional*):
 All remaining positional arguments are passed to the underlying model's `__init__` method.
......
@@ -403,12 +403,8 @@ class ModelMixin(torch.nn.Module, PushToHubMixin):
 When this option is enabled, you should observe lower GPU memory usage and a potential speed up during
 inference. Speed up during training is not guaranteed.

-<Tip warning={true}>
-
-⚠️ When memory efficient attention and sliced attention are both enabled, memory efficient attention takes
-precedent.
-
-</Tip>
+> [!WARNING]
+> ⚠️ When memory efficient attention and sliced attention are both enabled, memory efficient attention takes
+> precedent.

 Parameters:
 attention_op (`Callable`, *optional*):
@@ -917,15 +913,11 @@ class ModelMixin(torch.nn.Module, PushToHubMixin):
 Whether to disable mmap when loading a Safetensors model. This option can perform better when the model
 is on a network mount or hard drive, which may not handle the seeky-ness of mmap very well.

-<Tip>
-
-To use private or [gated models](https://huggingface.co/docs/hub/models-gated#gated-models), log-in with `hf
-auth login`. You can also activate the special
-["offline-mode"](https://huggingface.co/diffusers/installation.html#offline-mode) to use this method in a
-firewalled environment.
-
-</Tip>
+> [!TIP]
+> To use private or [gated models](https://huggingface.co/docs/hub/models-gated#gated-models), log-in with `hf
+> auth login`. You can also activate the special
+> ["offline-mode"](https://huggingface.co/diffusers/installation.html#offline-mode) to use this method in a
+> firewalled environment.

 Example:
 ```py
......
@@ -431,11 +431,7 @@ class AuraFlowTransformer2DModel(ModelMixin, ConfigMixin, PeftAdapterMixin, From
 Enables fused QKV projections. For self-attention modules, all projection matrices (i.e., query, key, value)
 are fused. For cross-attention modules, key and value projection matrices are fused.

-<Tip warning={true}>
-
-This API is 🧪 experimental.
-
-</Tip>
+> [!WARNING]
+> This API is 🧪 experimental.
 """

 self.original_attn_processors = None
@@ -455,11 +451,7 @@ class AuraFlowTransformer2DModel(ModelMixin, ConfigMixin, PeftAdapterMixin, From
 def unfuse_qkv_projections(self):
 """Disables the fused QKV projection if enabled.

-<Tip warning={true}>
-
-This API is 🧪 experimental.
-
-</Tip>
+> [!WARNING]
+> This API is 🧪 experimental.
 """

 if self.original_attn_processors is not None:
......
@@ -397,11 +397,7 @@ class CogVideoXTransformer3DModel(ModelMixin, ConfigMixin, PeftAdapterMixin, Cac
 Enables fused QKV projections. For self-attention modules, all projection matrices (i.e., query, key, value)
 are fused. For cross-attention modules, key and value projection matrices are fused.

-<Tip warning={true}>
-
-This API is 🧪 experimental.
-
-</Tip>
+> [!WARNING]
+> This API is 🧪 experimental.
 """

 self.original_attn_processors = None
@@ -421,11 +417,7 @@ class CogVideoXTransformer3DModel(ModelMixin, ConfigMixin, PeftAdapterMixin, Cac
 def unfuse_qkv_projections(self):
 """Disables the fused QKV projection if enabled.

-<Tip warning={true}>
-
-This API is 🧪 experimental.
-
-</Tip>
+> [!WARNING]
+> This API is 🧪 experimental.
 """

 if self.original_attn_processors is not None:
......