Unverified Commit cb0f3b49 authored by Dhruv Nair, committed by GitHub

[Refactor] Better align `from_single_file` logic with `from_pretrained` (#7496)



* refactor unet single file loading a bit.

* retrieve the unet from create_diffusers_unet_model_from_ldm

* tests

* Update docs/source/en/api/single_file.md
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update docs/source/en/api/single_file.md
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update docs/source/en/api/loaders/single_file.md
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update src/diffusers/loaders/single_file.py
Co-authored-by: YiYi Xu <yixu310@gmail.com>

* Update docs/source/en/api/loaders/single_file.md
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update docs/source/en/api/loaders/single_file.md
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update docs/source/en/api/loaders/single_file.md
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

* Update docs/source/en/api/loaders/single_file.md
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>

---------
Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>
parent caf9e985
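For context, the refactor aligns how `from_single_file` resolves pipeline and model configs with `from_pretrained`, and the new tests in this diff compare the two loading paths directly. A minimal sketch of that pattern, assuming the same checkpoint URL and repo id used by the tests below:

import torch
from diffusers import StableDiffusionPipeline

ckpt_path = "https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/v1-5-pruned-emaonly.safetensors"
repo_id = "runwayml/stable-diffusion-v1-5"

# Multi-folder (Hub) loading.
pipe = StableDiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.float16)

# Single-file loading; passing `config=repo_id` makes the component configs
# line up with the pretrained pipeline, which is what the tester mixins assert.
pipe_single_file = StableDiffusionPipeline.from_single_file(
    ckpt_path, config=repo_id, torch_dtype=torch.float16
)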
@@ -37,7 +37,6 @@ from diffusers.utils.import_utils import is_xformers_available
from diffusers.utils.testing_utils import (
enable_full_determinism,
load_image,
numpy_cosine_similarity_distance,
require_torch_gpu,
slow,
torch_device,
@@ -949,89 +948,6 @@ class ControlNetSDXLPipelineSlowTests(unittest.TestCase):
expected_image = np.array([0.4399, 0.5112, 0.5478, 0.4314, 0.472, 0.4823, 0.4647, 0.4957, 0.4853])
assert np.allclose(original_image, expected_image, atol=1e-04)
def test_download_ckpt_diff_format_is_same(self):
controlnet = ControlNetModel.from_pretrained("diffusers/controlnet-depth-sdxl-1.0", torch_dtype=torch.float16)
single_file_url = (
"https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/sd_xl_base_1.0.safetensors"
)
pipe_single_file = StableDiffusionXLControlNetPipeline.from_single_file(
single_file_url, controlnet=controlnet, torch_dtype=torch.float16
)
pipe_single_file.unet.set_default_attn_processor()
pipe_single_file.enable_model_cpu_offload()
pipe_single_file.set_progress_bar_config(disable=None)
generator = torch.Generator(device="cpu").manual_seed(0)
prompt = "Stormtrooper's lecture"
image = load_image(
"https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/stormtrooper_depth.png"
)
single_file_images = pipe_single_file(
prompt, image=image, generator=generator, output_type="np", num_inference_steps=2
).images
generator = torch.Generator(device="cpu").manual_seed(0)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.unet.set_default_attn_processor()
pipe.enable_model_cpu_offload()
images = pipe(prompt, image=image, generator=generator, output_type="np", num_inference_steps=2).images
assert images[0].shape == (512, 512, 3)
assert single_file_images[0].shape == (512, 512, 3)
max_diff = numpy_cosine_similarity_distance(images[0].flatten(), single_file_images[0].flatten())
assert max_diff < 5e-2
def test_single_file_component_configs(self):
controlnet = ControlNetModel.from_pretrained(
"diffusers/controlnet-depth-sdxl-1.0", torch_dtype=torch.float16, variant="fp16"
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0",
variant="fp16",
controlnet=controlnet,
torch_dtype=torch.float16,
)
single_file_url = (
"https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/sd_xl_base_1.0.safetensors"
)
single_file_pipe = StableDiffusionXLControlNetPipeline.from_single_file(
single_file_url, controlnet=controlnet, torch_dtype=torch.float16
)
for param_name, param_value in single_file_pipe.text_encoder.config.to_dict().items():
if param_name in ["torch_dtype", "architectures", "_name_or_path"]:
continue
assert pipe.text_encoder.config.to_dict()[param_name] == param_value
for param_name, param_value in single_file_pipe.text_encoder_2.config.to_dict().items():
if param_name in ["torch_dtype", "architectures", "_name_or_path"]:
continue
assert pipe.text_encoder_2.config.to_dict()[param_name] == param_value
PARAMS_TO_IGNORE = ["torch_dtype", "_name_or_path", "architectures", "_use_default_values"]
for param_name, param_value in single_file_pipe.unet.config.items():
if param_name in PARAMS_TO_IGNORE:
continue
# Upcast attention might be set to None in a config file, which is incorrect. It should default to False in the model
if param_name == "upcast_attention" and pipe.unet.config[param_name] is None:
pipe.unet.config[param_name] = False
assert (
pipe.unet.config[param_name] == param_value
), f"{param_name} differs between single file loading and pretrained loading"
for param_name, param_value in single_file_pipe.vae.config.items():
if param_name in PARAMS_TO_IGNORE:
continue
assert (
pipe.vae.config[param_name] == param_value
), f"{param_name} differs between single file loading and pretrained loading"
class StableDiffusionSSD1BControlNetPipelineFastTests(StableDiffusionXLControlNetPipelineFastTests):
def test_controlnet_sdxl_guess(self):
...
@@ -42,7 +42,6 @@ from diffusers import (
UNet2DConditionModel,
logging,
)
from diffusers.models.attention_processor import AttnProcessor
from diffusers.utils.testing_utils import (
CaptureLogger,
enable_full_determinism,
@@ -1284,62 +1283,6 @@ class StableDiffusionPipelineCkptTests(unittest.TestCase):
assert image_out.shape == (512, 512, 3)
def test_download_ckpt_diff_format_is_same(self):
ckpt_path = "https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/v1-5-pruned-emaonly.safetensors"
sf_pipe = StableDiffusionPipeline.from_single_file(ckpt_path)
sf_pipe.scheduler = DDIMScheduler.from_config(sf_pipe.scheduler.config)
sf_pipe.unet.set_attn_processor(AttnProcessor())
sf_pipe.to("cuda")
generator = torch.Generator(device="cpu").manual_seed(0)
image_single_file = sf_pipe("a turtle", num_inference_steps=2, generator=generator, output_type="np").images[0]
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe.unet.set_attn_processor(AttnProcessor())
pipe.to("cuda")
generator = torch.Generator(device="cpu").manual_seed(0)
image = pipe("a turtle", num_inference_steps=2, generator=generator, output_type="np").images[0]
max_diff = numpy_cosine_similarity_distance(image.flatten(), image_single_file.flatten())
assert max_diff < 1e-3
def test_single_file_component_configs(self):
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
ckpt_path = "https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/v1-5-pruned-emaonly.safetensors"
single_file_pipe = StableDiffusionPipeline.from_single_file(ckpt_path, load_safety_checker=True)
for param_name, param_value in single_file_pipe.text_encoder.config.to_dict().items():
if param_name in ["torch_dtype", "architectures", "_name_or_path"]:
continue
assert pipe.text_encoder.config.to_dict()[param_name] == param_value
PARAMS_TO_IGNORE = ["torch_dtype", "_name_or_path", "architectures", "_use_default_values"]
for param_name, param_value in single_file_pipe.unet.config.items():
if param_name in PARAMS_TO_IGNORE:
continue
assert (
pipe.unet.config[param_name] == param_value
), f"{param_name} differs between single file loading and pretrained loading"
for param_name, param_value in single_file_pipe.vae.config.items():
if param_name in PARAMS_TO_IGNORE:
continue
assert (
pipe.vae.config[param_name] == param_value
), f"{param_name} differs between single file loading and pretrained loading"
for param_name, param_value in single_file_pipe.safety_checker.config.to_dict().items():
if param_name in PARAMS_TO_IGNORE:
continue
assert (
pipe.safety_checker.config.to_dict()[param_name] == param_value
), f"{param_name} differs between single file loading and pretrained loading"
@nightly
@require_torch_gpu
...
@@ -36,7 +36,6 @@ from diffusers import (
StableDiffusionInpaintPipeline,
UNet2DConditionModel,
)
from diffusers.models.attention_processor import AttnProcessor
from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_inpaint import prepare_mask_and_masked_image
from diffusers.utils.testing_utils import (
enable_full_determinism,
@@ -44,7 +43,6 @@ from diffusers.utils.testing_utils import (
load_image,
load_numpy,
nightly,
numpy_cosine_similarity_distance,
require_python39_or_higher,
require_torch_2,
require_torch_gpu,
@@ -786,77 +784,6 @@ class StableDiffusionInpaintPipelineSlowTests(unittest.TestCase):
expected_slice = np.array([0.3757, 0.3875, 0.4445, 0.4353, 0.3780, 0.4513, 0.3965, 0.3984, 0.4362])
assert np.abs(expected_slice - image_slice).max() < 1e-3
def test_download_local(self):
filename = hf_hub_download("runwayml/stable-diffusion-inpainting", filename="sd-v1-5-inpainting.ckpt")
pipe = StableDiffusionInpaintPipeline.from_single_file(filename, torch_dtype=torch.float16)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")
inputs = self.get_inputs(torch_device)
inputs["num_inference_steps"] = 1
image_out = pipe(**inputs).images[0]
assert image_out.shape == (512, 512, 3)
def test_download_ckpt_diff_format_is_same(self):
ckpt_path = "https://huggingface.co/runwayml/stable-diffusion-inpainting/blob/main/sd-v1-5-inpainting.ckpt"
pipe = StableDiffusionInpaintPipeline.from_single_file(ckpt_path)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe.unet.set_attn_processor(AttnProcessor())
pipe.to("cuda")
inputs = self.get_inputs(torch_device)
inputs["num_inference_steps"] = 5
image_ckpt = pipe(**inputs).images[0]
pipe = StableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-inpainting")
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe.unet.set_attn_processor(AttnProcessor())
pipe.to("cuda")
inputs = self.get_inputs(torch_device)
inputs["num_inference_steps"] = 5
image = pipe(**inputs).images[0]
max_diff = numpy_cosine_similarity_distance(image.flatten(), image_ckpt.flatten())
assert max_diff < 1e-4
def test_single_file_component_configs(self):
pipe = StableDiffusionInpaintPipeline.from_pretrained("runwayml/stable-diffusion-inpainting", variant="fp16")
ckpt_path = "https://huggingface.co/runwayml/stable-diffusion-inpainting/blob/main/sd-v1-5-inpainting.ckpt"
single_file_pipe = StableDiffusionInpaintPipeline.from_single_file(ckpt_path, load_safety_checker=True)
for param_name, param_value in single_file_pipe.text_encoder.config.to_dict().items():
if param_name in ["torch_dtype", "architectures", "_name_or_path"]:
continue
assert pipe.text_encoder.config.to_dict()[param_name] == param_value
PARAMS_TO_IGNORE = ["torch_dtype", "_name_or_path", "architectures", "_use_default_values"]
for param_name, param_value in single_file_pipe.unet.config.items():
if param_name in PARAMS_TO_IGNORE:
continue
assert (
pipe.unet.config[param_name] == param_value
), f"{param_name} is differs between single file loading and pretrained loading"
for param_name, param_value in single_file_pipe.vae.config.items():
if param_name in PARAMS_TO_IGNORE:
continue
assert (
pipe.vae.config[param_name] == param_value
), f"{param_name} is differs between single file loading and pretrained loading"
for param_name, param_value in single_file_pipe.safety_checker.config.to_dict().items():
if param_name in PARAMS_TO_IGNORE:
continue
assert (
pipe.safety_checker.config.to_dict()[param_name] == param_value
), f"{param_name} is differs between single file loading and pretrained loading"
@slow
@require_torch_gpu
@@ -1082,9 +1009,6 @@ class StableDiffusionInpaintPipelineAsymmetricAutoencoderKLSlowTests(unittest.Te
assert image_out.shape == (512, 512, 3)
def test_download_ckpt_diff_format_is_same(self):
pass
@nightly
@require_torch_gpu
...
@@ -29,7 +29,6 @@ from diffusers.utils.testing_utils import (
floats_tensor,
load_image,
load_numpy,
numpy_cosine_similarity_distance,
require_torch_gpu,
slow,
torch_device,
@@ -492,73 +491,3 @@ class StableDiffusionUpscalePipelineIntegrationTests(unittest.TestCase):
mem_bytes = torch.cuda.max_memory_allocated()
# make sure that less than 2.9 GB is allocated
assert mem_bytes < 2.9 * 10**9
def test_download_ckpt_diff_format_is_same(self):
image = load_image(
"https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main"
"/sd2-upscale/low_res_cat.png"
)
prompt = "a cat sitting on a park bench"
model_id = "stabilityai/stable-diffusion-x4-upscaler"
pipe = StableDiffusionUpscalePipeline.from_pretrained(model_id)
pipe.enable_model_cpu_offload()
generator = torch.Generator("cpu").manual_seed(0)
output = pipe(prompt=prompt, image=image, generator=generator, output_type="np", num_inference_steps=3)
image_from_pretrained = output.images[0]
single_file_path = (
"https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler/blob/main/x4-upscaler-ema.safetensors"
)
pipe_from_single_file = StableDiffusionUpscalePipeline.from_single_file(single_file_path)
pipe_from_single_file.enable_model_cpu_offload()
generator = torch.Generator("cpu").manual_seed(0)
output_from_single_file = pipe_from_single_file(
prompt=prompt, image=image, generator=generator, output_type="np", num_inference_steps=3
)
image_from_single_file = output_from_single_file.images[0]
assert image_from_pretrained.shape == (512, 512, 3)
assert image_from_single_file.shape == (512, 512, 3)
assert (
numpy_cosine_similarity_distance(image_from_pretrained.flatten(), image_from_single_file.flatten()) < 1e-3
)
def test_single_file_component_configs(self):
pipe = StableDiffusionUpscalePipeline.from_pretrained(
"stabilityai/stable-diffusion-x4-upscaler", variant="fp16"
)
ckpt_path = (
"https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler/blob/main/x4-upscaler-ema.safetensors"
)
single_file_pipe = StableDiffusionUpscalePipeline.from_single_file(ckpt_path, load_safety_checker=True)
for param_name, param_value in single_file_pipe.text_encoder.config.to_dict().items():
if param_name in ["torch_dtype", "architectures", "_name_or_path"]:
continue
assert pipe.text_encoder.config.to_dict()[param_name] == param_value
PARAMS_TO_IGNORE = ["torch_dtype", "_name_or_path", "architectures", "_use_default_values"]
for param_name, param_value in single_file_pipe.unet.config.items():
if param_name in PARAMS_TO_IGNORE:
continue
assert (
pipe.unet.config[param_name] == param_value
), f"{param_name} differs between single file loading and pretrained loading"
for param_name, param_value in single_file_pipe.vae.config.items():
if param_name in PARAMS_TO_IGNORE:
continue
assert (
pipe.vae.config[param_name] == param_value
), f"{param_name} differs between single file loading and pretrained loading"
for param_name, param_value in single_file_pipe.safety_checker.config.to_dict().items():
if param_name in PARAMS_TO_IGNORE:
continue
assert (
pipe.safety_checker.config.to_dict()[param_name] == param_value
), f"{param_name} differs between single file loading and pretrained loading"
@@ -30,7 +30,6 @@ from diffusers import (
StableDiffusionPipeline,
UNet2DConditionModel,
)
from diffusers.models.attention_processor import AttnProcessor
from diffusers.utils.testing_utils import (
enable_full_determinism,
load_numpy,
@@ -473,30 +472,6 @@ class StableDiffusion2VPredictionPipelineIntegrationTests(unittest.TestCase):
assert image_out.shape == (768, 768, 3)
def test_download_ckpt_diff_format_is_same(self):
single_file_path = (
"https://huggingface.co/stabilityai/stable-diffusion-2-1/blob/main/v2-1_768-ema-pruned.safetensors"
)
pipe_single = StableDiffusionPipeline.from_single_file(single_file_path)
pipe_single.scheduler = DDIMScheduler.from_config(pipe_single.scheduler.config)
pipe_single.unet.set_attn_processor(AttnProcessor())
pipe_single.enable_model_cpu_offload()
generator = torch.Generator(device="cpu").manual_seed(0)
image_ckpt = pipe_single("a turtle", num_inference_steps=2, generator=generator, output_type="np").images[0]
pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe.unet.set_attn_processor(AttnProcessor())
pipe.enable_model_cpu_offload()
generator = torch.Generator(device="cpu").manual_seed(0)
image = pipe("a turtle", num_inference_steps=2, generator=generator, output_type="np").images[0]
max_diff = numpy_cosine_similarity_distance(image.flatten(), image_ckpt.flatten())
assert max_diff < 1e-3
def test_stable_diffusion_text2img_intermediate_state_v_pred(self):
number_of_steps = 0
...
@@ -1046,68 +1046,3 @@ class StableDiffusionXLPipelineIntegrationTests(unittest.TestCase):
max_diff = numpy_cosine_similarity_distance(image.flatten(), expected_image.flatten())
assert max_diff < 1e-2
def test_download_ckpt_diff_format_is_same(self):
ckpt_path = (
"https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/sd_xl_base_1.0.safetensors"
)
pipe = StableDiffusionXLPipeline.from_single_file(ckpt_path, torch_dtype=torch.float16)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe.unet.set_default_attn_processor()
pipe.enable_model_cpu_offload()
generator = torch.Generator(device="cpu").manual_seed(0)
image_ckpt = pipe("a turtle", num_inference_steps=2, generator=generator, output_type="np").images[0]
pipe = StableDiffusionXLPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe.unet.set_default_attn_processor()
pipe.enable_model_cpu_offload()
generator = torch.Generator(device="cpu").manual_seed(0)
image = pipe("a turtle", num_inference_steps=2, generator=generator, output_type="np").images[0]
max_diff = numpy_cosine_similarity_distance(image.flatten(), image_ckpt.flatten())
assert max_diff < 6e-3
def test_single_file_component_configs(self):
pipe = StableDiffusionXLPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)
ckpt_path = (
"https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/sd_xl_base_1.0.safetensors"
)
single_file_pipe = StableDiffusionXLPipeline.from_single_file(
ckpt_path, variant="fp16", torch_dtype=torch.float16
)
for param_name, param_value in single_file_pipe.text_encoder.config.to_dict().items():
if param_name in ["torch_dtype", "architectures", "_name_or_path"]:
continue
assert pipe.text_encoder.config.to_dict()[param_name] == param_value
for param_name, param_value in single_file_pipe.text_encoder_2.config.to_dict().items():
if param_name in ["torch_dtype", "architectures", "_name_or_path"]:
continue
assert pipe.text_encoder_2.config.to_dict()[param_name] == param_value
PARAMS_TO_IGNORE = ["torch_dtype", "_name_or_path", "architectures", "_use_default_values"]
for param_name, param_value in single_file_pipe.unet.config.items():
if param_name in PARAMS_TO_IGNORE:
continue
if param_name == "upcast_attention" and pipe.unet.config[param_name] is None:
pipe.unet.config[param_name] = False
assert (
pipe.unet.config[param_name] == param_value
), f"{param_name} is differs between single file loading and pretrained loading"
for param_name, param_value in single_file_pipe.vae.config.items():
if param_name in PARAMS_TO_IGNORE:
continue
assert (
pipe.vae.config[param_name] == param_value
), f"{param_name} is differs between single file loading and pretrained loading"
@@ -13,7 +13,6 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import gc
import random
import unittest
@@ -32,13 +31,10 @@ from diffusers import (
T2IAdapter,
UNet2DConditionModel,
)
from diffusers.utils import load_image, logging
from diffusers.utils import logging
from diffusers.utils.testing_utils import (
enable_full_determinism,
floats_tensor,
numpy_cosine_similarity_distance,
require_torch_gpu,
slow,
torch_device,
)
@@ -678,54 +674,3 @@ class StableDiffusionXLMultiAdapterPipelineFastTests(
print(",".join(debug))
assert np.abs(image_slice.flatten() - expected_slice).max() < 1e-2
@slow
@require_torch_gpu
class AdapterSDXLPipelineSlowTests(unittest.TestCase):
def setUp(self):
super().setUp()
gc.collect()
torch.cuda.empty_cache()
def tearDown(self):
super().tearDown()
gc.collect()
torch.cuda.empty_cache()
def test_download_ckpt_diff_format_is_same(self):
ckpt_path = (
"https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/sd_xl_base_1.0.safetensors"
)
adapter = T2IAdapter.from_pretrained("TencentARC/t2i-adapter-lineart-sdxl-1.0", torch_dtype=torch.float16)
prompt = "toy"
image = load_image(
"https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/t2i_adapter/toy_canny.png"
)
pipe_single_file = StableDiffusionXLAdapterPipeline.from_single_file(
ckpt_path,
adapter=adapter,
torch_dtype=torch.float16,
)
pipe_single_file.enable_model_cpu_offload()
pipe_single_file.set_progress_bar_config(disable=None)
generator = torch.Generator(device="cpu").manual_seed(0)
images_single_file = pipe_single_file(
prompt, image=image, generator=generator, output_type="np", num_inference_steps=3
).images
generator = torch.Generator(device="cpu").manual_seed(0)
pipe = StableDiffusionXLAdapterPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0",
adapter=adapter,
torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()
images = pipe(prompt, image=image, generator=generator, output_type="np", num_inference_steps=3).images
assert images_single_file[0].shape == (768, 512, 3)
assert images[0].shape == (768, 512, 3)
max_diff = numpy_cosine_similarity_distance(images[0].flatten(), images_single_file[0].flatten())
assert max_diff < 5e-3
@@ -13,7 +13,6 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import gc
import random
import unittest
@@ -32,19 +31,15 @@ from transformers import (
from diffusers import (
AutoencoderKL,
AutoencoderTiny,
DDIMScheduler,
EulerDiscreteScheduler,
LCMScheduler,
StableDiffusionXLImg2ImgPipeline,
UNet2DConditionModel,
)
from diffusers.utils import load_image
from diffusers.utils.testing_utils import (
enable_full_determinism,
floats_tensor,
numpy_cosine_similarity_distance,
require_torch_gpu,
slow,
torch_device,
)
@@ -781,85 +776,3 @@ class StableDiffusionXLImg2ImgRefinerOnlyPipelineFastTests(
def test_save_load_optional_components(self):
self._test_save_load_optional_components()
@slow
class StableDiffusionXLImg2ImgIntegrationTests(unittest.TestCase):
def setUp(self):
# clean up the VRAM before each test
super().setUp()
gc.collect()
torch.cuda.empty_cache()
def tearDown(self):
# clean up the VRAM after each test
super().tearDown()
gc.collect()
torch.cuda.empty_cache()
def test_download_ckpt_diff_format_is_same(self):
ckpt_path = "https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/blob/main/sd_xl_refiner_1.0.safetensors"
init_image = load_image(
"https://huggingface.co/datasets/diffusers/test-arrays/resolve/main"
"/stable_diffusion_img2img/sketch-mountains-input.png"
)
pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16
)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe.unet.set_default_attn_processor()
pipe.enable_model_cpu_offload()
generator = torch.Generator(device="cpu").manual_seed(0)
image = pipe(
prompt="mountains", image=init_image, num_inference_steps=5, generator=generator, output_type="np"
).images[0]
pipe_single_file = StableDiffusionXLImg2ImgPipeline.from_single_file(ckpt_path, torch_dtype=torch.float16)
pipe_single_file.scheduler = DDIMScheduler.from_config(pipe_single_file.scheduler.config)
pipe_single_file.unet.set_default_attn_processor()
pipe_single_file.enable_model_cpu_offload()
generator = torch.Generator(device="cpu").manual_seed(0)
image_single_file = pipe_single_file(
prompt="mountains", image=init_image, num_inference_steps=5, generator=generator, output_type="np"
).images[0]
max_diff = numpy_cosine_similarity_distance(image.flatten(), image_single_file.flatten())
assert max_diff < 5e-2
def test_single_file_component_configs(self):
pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-refiner-1.0",
torch_dtype=torch.float16,
variant="fp16",
)
ckpt_path = "https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/blob/main/sd_xl_refiner_1.0.safetensors"
single_file_pipe = StableDiffusionXLImg2ImgPipeline.from_single_file(ckpt_path, torch_dtype=torch.float16)
assert pipe.text_encoder is None
assert single_file_pipe.text_encoder is None
for param_name, param_value in single_file_pipe.text_encoder_2.config.to_dict().items():
if param_name in ["torch_dtype", "architectures", "_name_or_path"]:
continue
assert pipe.text_encoder_2.config.to_dict()[param_name] == param_value
PARAMS_TO_IGNORE = ["torch_dtype", "_name_or_path", "architectures", "_use_default_values"]
for param_name, param_value in single_file_pipe.unet.config.items():
if param_name in PARAMS_TO_IGNORE:
continue
if param_name == "upcast_attention" and pipe.unet.config[param_name] is None:
pipe.unet.config[param_name] = False
assert (
pipe.unet.config[param_name] == param_value
), f"{param_name} is differs between single file loading and pretrained loading"
for param_name, param_value in single_file_pipe.vae.config.items():
if param_name in PARAMS_TO_IGNORE:
continue
assert (
pipe.vae.config[param_name] == param_value
), f"{param_name} differs between single file loading and pretrained loading"
import tempfile
from io import BytesIO
import requests
import torch
from huggingface_hub import hf_hub_download, snapshot_download
from diffusers.models.attention_processor import AttnProcessor
from diffusers.utils.testing_utils import (
numpy_cosine_similarity_distance,
torch_device,
)
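# Helper functions shared by the single-file tests below: they fetch a checkpoint,
# an original (YAML) config, or a diffusers-format config into a temporary directory
# so the `local_files_only=True` code paths can be exercised.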
def download_single_file_checkpoint(repo_id, filename, tmpdir):
path = hf_hub_download(repo_id, filename=filename, local_dir=tmpdir)
return path
def download_original_config(config_url, tmpdir):
original_config_file = BytesIO(requests.get(config_url).content)
path = f"{tmpdir}/config.yaml"
with open(path, "wb") as f:
f.write(original_config_file.read())
return path
def download_diffusers_config(repo_id, tmpdir):
path = snapshot_download(
repo_id,
ignore_patterns=[
"**/*.ckpt",
"*.ckpt",
"**/*.bin",
"*.bin",
"**/*.pt",
"*.pt",
"**/*.safetensors",
"*.safetensors",
],
allow_patterns=["**/*.json", "*.json", "*.txt", "**/*.txt"],
local_dir=tmpdir,
)
return path
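# Shared single-file loading checks for Stable Diffusion pipelines. Concrete test
# classes are expected to define `pipeline_class`, `ckpt_path`, `original_config`,
# `repo_id` and, for the inference test, a `get_inputs` helper.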
class SDSingleFileTesterMixin:
def _compare_component_configs(self, pipe, single_file_pipe):
for param_name, param_value in single_file_pipe.text_encoder.config.to_dict().items():
if param_name in ["torch_dtype", "architectures", "_name_or_path"]:
continue
assert pipe.text_encoder.config.to_dict()[param_name] == param_value
PARAMS_TO_IGNORE = [
"torch_dtype",
"_name_or_path",
"architectures",
"_use_default_values",
"_diffusers_version",
]
for component_name, component in single_file_pipe.components.items():
if component_name in single_file_pipe._optional_components:
continue
# skip testing transformer based components here
# skip text encoders / safety checkers since they have already been tested
if component_name in ["text_encoder", "tokenizer", "safety_checker", "feature_extractor"]:
continue
assert component_name in pipe.components, f"single file {component_name} not found in pretrained pipeline"
assert isinstance(
component, pipe.components[component_name].__class__
), f"single file {component.__class__.__name__} and pretrained {pipe.components[component_name].__class__.__name__} are not the same"
for param_name, param_value in component.config.items():
if param_name in PARAMS_TO_IGNORE:
continue
# Some pretrained configs will set upcast attention to None
# In single file loading it defaults to the value in the class __init__ which is False
if param_name == "upcast_attention" and pipe.components[component_name].config[param_name] is None:
pipe.components[component_name].config[param_name] = param_value
assert (
pipe.components[component_name].config[param_name] == param_value
), f"single file {param_name}: {param_value} differs from pretrained {pipe.components[component_name].config[param_name]}"
def test_single_file_components(self, pipe=None, single_file_pipe=None):
single_file_pipe = single_file_pipe or self.pipeline_class.from_single_file(
self.ckpt_path, safety_checker=None
)
pipe = pipe or self.pipeline_class.from_pretrained(self.repo_id, safety_checker=None)
self._compare_component_configs(pipe, single_file_pipe)
def test_single_file_components_local_files_only(self, pipe=None, single_file_pipe=None):
pipe = pipe or self.pipeline_class.from_pretrained(self.repo_id, safety_checker=None)
with tempfile.TemporaryDirectory() as tmpdir:
ckpt_filename = self.ckpt_path.split("/")[-1]
local_ckpt_path = download_single_file_checkpoint(self.repo_id, ckpt_filename, tmpdir)
single_file_pipe = single_file_pipe or self.pipeline_class.from_single_file(
local_ckpt_path, safety_checker=None, local_files_only=True
)
self._compare_component_configs(pipe, single_file_pipe)
def test_single_file_components_with_original_config(
self,
pipe=None,
single_file_pipe=None,
):
pipe = pipe or self.pipeline_class.from_pretrained(self.repo_id, safety_checker=None)
# Not possible to infer this value when original config is provided
# we just pass it in here otherwise this test will fail
upcast_attention = pipe.unet.config.upcast_attention
single_file_pipe = single_file_pipe or self.pipeline_class.from_single_file(
self.ckpt_path,
original_config=self.original_config,
safety_checker=None,
upcast_attention=upcast_attention,
)
self._compare_component_configs(pipe, single_file_pipe)
def test_single_file_components_with_original_config_local_files_only(
self,
pipe=None,
single_file_pipe=None,
):
pipe = pipe or self.pipeline_class.from_pretrained(self.repo_id, safety_checker=None)
# Not possible to infer this value when original config is provided
# we just pass it in here otherwise this test will fail
upcast_attention = pipe.unet.config.upcast_attention
with tempfile.TemporaryDirectory() as tmpdir:
ckpt_filename = self.ckpt_path.split("/")[-1]
local_ckpt_path = download_single_file_checkpoint(self.repo_id, ckpt_filename, tmpdir)
local_original_config = download_original_config(self.original_config, tmpdir)
single_file_pipe = single_file_pipe or self.pipeline_class.from_single_file(
local_ckpt_path,
original_config=local_original_config,
safety_checker=None,
upcast_attention=upcast_attention,
local_files_only=True,
)
self._compare_component_configs(pipe, single_file_pipe)
def test_single_file_format_inference_is_same_as_pretrained(self, expected_max_diff=1e-4):
sf_pipe = self.pipeline_class.from_single_file(self.ckpt_path, safety_checker=None)
sf_pipe.unet.set_attn_processor(AttnProcessor())
sf_pipe.enable_model_cpu_offload()
inputs = self.get_inputs(torch_device)
image_single_file = sf_pipe(**inputs).images[0]
pipe = self.pipeline_class.from_pretrained(self.repo_id, safety_checker=None)
pipe.unet.set_attn_processor(AttnProcessor())
pipe.enable_model_cpu_offload()
inputs = self.get_inputs(torch_device)
image = pipe(**inputs).images[0]
max_diff = numpy_cosine_similarity_distance(image.flatten(), image_single_file.flatten())
assert max_diff < expected_max_diff
def test_single_file_components_with_diffusers_config(
self,
pipe=None,
single_file_pipe=None,
):
single_file_pipe = single_file_pipe or self.pipeline_class.from_single_file(
self.ckpt_path, config=self.repo_id, safety_checker=None
)
pipe = pipe or self.pipeline_class.from_pretrained(self.repo_id, safety_checker=None)
self._compare_component_configs(pipe, single_file_pipe)
def test_single_file_components_with_diffusers_config_local_files_only(
self,
pipe=None,
single_file_pipe=None,
):
pipe = pipe or self.pipeline_class.from_pretrained(self.repo_id, safety_checker=None)
with tempfile.TemporaryDirectory() as tmpdir:
ckpt_filename = self.ckpt_path.split("/")[-1]
local_ckpt_path = download_single_file_checkpoint(self.repo_id, ckpt_filename, tmpdir)
local_diffusers_config = download_diffusers_config(self.repo_id, tmpdir)
single_file_pipe = single_file_pipe or self.pipeline_class.from_single_file(
local_ckpt_path, config=local_diffusers_config, safety_checker=None, local_files_only=True
)
self._compare_component_configs(pipe, single_file_pipe)
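# SDXL counterpart of the mixin above: it additionally compares `text_encoder_2`
# configs and skips the `text_encoder` comparison for refiner pipelines, which have none.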
class SDXLSingleFileTesterMixin:
def _compare_component_configs(self, pipe, single_file_pipe):
# Skip testing the text_encoder for Refiner Pipelines
if pipe.text_encoder:
for param_name, param_value in single_file_pipe.text_encoder.config.to_dict().items():
if param_name in ["torch_dtype", "architectures", "_name_or_path"]:
continue
assert pipe.text_encoder.config.to_dict()[param_name] == param_value
for param_name, param_value in single_file_pipe.text_encoder_2.config.to_dict().items():
if param_name in ["torch_dtype", "architectures", "_name_or_path"]:
continue
assert pipe.text_encoder_2.config.to_dict()[param_name] == param_value
PARAMS_TO_IGNORE = [
"torch_dtype",
"_name_or_path",
"architectures",
"_use_default_values",
"_diffusers_version",
]
for component_name, component in single_file_pipe.components.items():
if component_name in single_file_pipe._optional_components:
continue
# skip text encoders since they have already been tested
if component_name in ["text_encoder", "text_encoder_2", "tokenizer", "tokenizer_2"]:
continue
# skip safety checker if it is not present in the pipeline
if component_name in ["safety_checker", "feature_extractor"]:
continue
assert component_name in pipe.components, f"single file {component_name} not found in pretrained pipeline"
assert isinstance(
component, pipe.components[component_name].__class__
), f"single file {component.__class__.__name__} and pretrained {pipe.components[component_name].__class__.__name__} are not the same"
for param_name, param_value in component.config.items():
if param_name in PARAMS_TO_IGNORE:
continue
# Some pretrained configs will set upcast attention to None
# In single file loading it defaults to the value in the class __init__ which is False
if param_name == "upcast_attention" and pipe.components[component_name].config[param_name] is None:
pipe.components[component_name].config[param_name] = param_value
assert (
pipe.components[component_name].config[param_name] == param_value
), f"single file {param_name}: {param_value} differs from pretrained {pipe.components[component_name].config[param_name]}"
def test_single_file_components(self, pipe=None, single_file_pipe=None):
single_file_pipe = single_file_pipe or self.pipeline_class.from_single_file(
self.ckpt_path, safety_checker=None
)
pipe = pipe or self.pipeline_class.from_pretrained(self.repo_id, safety_checker=None)
self._compare_component_configs(
pipe,
single_file_pipe,
)
def test_single_file_components_local_files_only(
self,
pipe=None,
single_file_pipe=None,
):
pipe = pipe or self.pipeline_class.from_pretrained(self.repo_id, safety_checker=None)
with tempfile.TemporaryDirectory() as tmpdir:
ckpt_filename = self.ckpt_path.split("/")[-1]
local_ckpt_path = download_single_file_checkpoint(self.repo_id, ckpt_filename, tmpdir)
single_file_pipe = single_file_pipe or self.pipeline_class.from_single_file(
local_ckpt_path, safety_checker=None, local_files_only=True
)
self._compare_component_configs(pipe, single_file_pipe)
def test_single_file_components_with_original_config(
self,
pipe=None,
single_file_pipe=None,
):
pipe = pipe or self.pipeline_class.from_pretrained(self.repo_id, safety_checker=None)
# Not possible to infer this value when original config is provided
# we just pass it in here otherwise this test will fail
upcast_attention = pipe.unet.config.upcast_attention
single_file_pipe = single_file_pipe or self.pipeline_class.from_single_file(
self.ckpt_path,
original_config=self.original_config,
safety_checker=None,
upcast_attention=upcast_attention,
)
self._compare_component_configs(
pipe,
single_file_pipe,
)
def test_single_file_components_with_original_config_local_files_only(
self,
pipe=None,
single_file_pipe=None,
):
pipe = pipe or self.pipeline_class.from_pretrained(self.repo_id, safety_checker=None)
# Not possible to infer this value when original config is provided
# we just pass it in here otherwise this test will fail
upcast_attention = pipe.unet.config.upcast_attention
with tempfile.TemporaryDirectory() as tmpdir:
ckpt_filename = self.ckpt_path.split("/")[-1]
local_ckpt_path = download_single_file_checkpoint(self.repo_id, ckpt_filename, tmpdir)
local_original_config = download_original_config(self.original_config, tmpdir)
single_file_pipe = single_file_pipe or self.pipeline_class.from_single_file(
local_ckpt_path,
original_config=local_original_config,
upcast_attention=upcast_attention,
safety_checker=None,
local_files_only=True,
)
self._compare_component_configs(
pipe,
single_file_pipe,
)
def test_single_file_components_with_diffusers_config(
self,
pipe=None,
single_file_pipe=None,
):
single_file_pipe = single_file_pipe or self.pipeline_class.from_single_file(
self.ckpt_path, config=self.repo_id, safety_checker=None
)
pipe = pipe or self.pipeline_class.from_pretrained(self.repo_id, safety_checker=None)
self._compare_component_configs(pipe, single_file_pipe)
def test_single_file_components_with_diffusers_config_local_files_only(
self,
pipe=None,
single_file_pipe=None,
):
pipe = pipe or self.pipeline_class.from_pretrained(self.repo_id, safety_checker=None)
with tempfile.TemporaryDirectory() as tmpdir:
ckpt_filename = self.ckpt_path.split("/")[-1]
local_ckpt_path = download_single_file_checkpoint(self.repo_id, ckpt_filename, tmpdir)
local_diffusers_config = download_diffusers_config(self.repo_id, tmpdir)
single_file_pipe = single_file_pipe or self.pipeline_class.from_single_file(
local_ckpt_path, config=local_diffusers_config, safety_checker=None, local_files_only=True
)
self._compare_component_configs(pipe, single_file_pipe)
def test_single_file_format_inference_is_same_as_pretrained(self, expected_max_diff=1e-4):
sf_pipe = self.pipeline_class.from_single_file(self.ckpt_path, torch_dtype=torch.float16, safety_checker=None)
sf_pipe.unet.set_default_attn_processor()
sf_pipe.enable_model_cpu_offload()
inputs = self.get_inputs(torch_device)
image_single_file = sf_pipe(**inputs).images[0]
pipe = self.pipeline_class.from_pretrained(self.repo_id, torch_dtype=torch.float16, safety_checker=None)
pipe.unet.set_default_attn_processor()
pipe.enable_model_cpu_offload()
inputs = self.get_inputs(torch_device)
image = pipe(**inputs).images[0]
max_diff = numpy_cosine_similarity_distance(image.flatten(), image_single_file.flatten())
assert max_diff < expected_max_diff
# coding=utf-8
# Copyright 2024 HuggingFace Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import gc
import unittest
import torch
from diffusers import (
ControlNetModel,
)
from diffusers.utils.testing_utils import (
enable_full_determinism,
require_torch_gpu,
slow,
)
enable_full_determinism()
@slow
@require_torch_gpu
class ControlNetModelSingleFileTests(unittest.TestCase):
model_class = ControlNetModel
ckpt_path = "https://huggingface.co/lllyasviel/ControlNet-v1-1/blob/main/control_v11p_sd15_canny.pth"
repo_id = "lllyasviel/control_v11p_sd15_canny"
def setUp(self):
super().setUp()
gc.collect()
torch.cuda.empty_cache()
def tearDown(self):
super().tearDown()
gc.collect()
torch.cuda.empty_cache()
def test_single_file_components(self):
model = self.model_class.from_pretrained(self.repo_id)
model_single_file = self.model_class.from_single_file(self.ckpt_path)
PARAMS_TO_IGNORE = ["torch_dtype", "_name_or_path", "_use_default_values", "_diffusers_version"]
for param_name, param_value in model_single_file.config.items():
if param_name in PARAMS_TO_IGNORE:
continue
assert (
model.config[param_name] == param_value
), f"{param_name} differs between single file loading and pretrained loading"
def test_single_file_arguments(self):
model_default = self.model_class.from_single_file(self.ckpt_path)
assert model_default.config.upcast_attention is False
assert model_default.dtype == torch.float32
torch_dtype = torch.float16
upcast_attention = True
model = self.model_class.from_single_file(
self.ckpt_path,
upcast_attention=upcast_attention,
torch_dtype=torch_dtype,
)
assert model.config.upcast_attention == upcast_attention
assert model.dtype == torch_dtype
# coding=utf-8
# Copyright 2024 HuggingFace Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import gc
import unittest
import torch
from diffusers import StableCascadeUNet
from diffusers.utils import logging
from diffusers.utils.testing_utils import (
enable_full_determinism,
require_torch_gpu,
slow,
)
logger = logging.get_logger(__name__)
enable_full_determinism()
@slow
@require_torch_gpu
class StableCascadeUNetSingleFileTest(unittest.TestCase):
def setUp(self):
super().setUp()
gc.collect()
torch.cuda.empty_cache()
def tearDown(self):
super().tearDown()
gc.collect()
torch.cuda.empty_cache()
def test_single_file_components_stage_b(self):
model_single_file = StableCascadeUNet.from_single_file(
"https://huggingface.co/stabilityai/stable-cascade/blob/main/stage_b_bf16.safetensors",
torch_dtype=torch.bfloat16,
)
model = StableCascadeUNet.from_pretrained(
"stabilityai/stable-cascade", variant="bf16", subfolder="decoder", use_safetensors=True
)
PARAMS_TO_IGNORE = ["torch_dtype", "_name_or_path", "_use_default_values", "_diffusers_version"]
for param_name, param_value in model_single_file.config.items():
if param_name in PARAMS_TO_IGNORE:
continue
assert (
model.config[param_name] == param_value
), f"{param_name} differs between single file loading and pretrained loading"
def test_single_file_components_stage_b_lite(self):
model_single_file = StableCascadeUNet.from_single_file(
"https://huggingface.co/stabilityai/stable-cascade/blob/main/stage_b_lite_bf16.safetensors",
torch_dtype=torch.bfloat16,
)
model = StableCascadeUNet.from_pretrained(
"stabilityai/stable-cascade", variant="bf16", subfolder="decoder_lite"
)
PARAMS_TO_IGNORE = ["torch_dtype", "_name_or_path", "_use_default_values", "_diffusers_version"]
for param_name, param_value in model_single_file.config.items():
if param_name in PARAMS_TO_IGNORE:
continue
assert (
model.config[param_name] == param_value
), f"{param_name} differs between single file loading and pretrained loading"
def test_single_file_components_stage_c(self):
model_single_file = StableCascadeUNet.from_single_file(
"https://huggingface.co/stabilityai/stable-cascade/blob/main/stage_c_bf16.safetensors",
torch_dtype=torch.bfloat16,
)
model = StableCascadeUNet.from_pretrained(
"stabilityai/stable-cascade-prior", variant="bf16", subfolder="prior"
)
PARAMS_TO_IGNORE = ["torch_dtype", "_name_or_path", "_use_default_values", "_diffusers_version"]
for param_name, param_value in model_single_file.config.items():
if param_name in PARAMS_TO_IGNORE:
continue
assert (
model.config[param_name] == param_value
), f"{param_name} differs between single file loading and pretrained loading"
def test_single_file_components_stage_c_lite(self):
model_single_file = StableCascadeUNet.from_single_file(
"https://huggingface.co/stabilityai/stable-cascade/blob/main/stage_c_lite_bf16.safetensors",
torch_dtype=torch.bfloat16,
)
model = StableCascadeUNet.from_pretrained(
"stabilityai/stable-cascade-prior", variant="bf16", subfolder="prior_lite"
)
PARAMS_TO_IGNORE = ["torch_dtype", "_name_or_path", "_use_default_values", "_diffusers_version"]
for param_name, param_value in model_single_file.config.items():
if param_name in PARAMS_TO_IGNORE:
continue
assert (
model.config[param_name] == param_value
), f"{param_name} differs between single file loading and pretrained loading"
# coding=utf-8
# Copyright 2024 HuggingFace Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import gc
import unittest
import torch
from diffusers import (
AutoencoderKL,
)
from diffusers.utils.testing_utils import (
enable_full_determinism,
load_hf_numpy,
numpy_cosine_similarity_distance,
require_torch_gpu,
slow,
torch_device,
)
enable_full_determinism()
@slow
@require_torch_gpu
class AutoencoderKLSingleFileTests(unittest.TestCase):
model_class = AutoencoderKL
ckpt_path = (
"https://huggingface.co/stabilityai/sd-vae-ft-mse-original/blob/main/vae-ft-mse-840000-ema-pruned.safetensors"
)
repo_id = "stabilityai/sd-vae-ft-mse"
main_input_name = "sample"
base_precision = 1e-2
def setUp(self):
super().setUp()
gc.collect()
torch.cuda.empty_cache()
def tearDown(self):
super().tearDown()
gc.collect()
torch.cuda.empty_cache()
def get_file_format(self, seed, shape):
return f"gaussian_noise_s={seed}_shape={'_'.join([str(s) for s in shape])}.npy"
def get_sd_image(self, seed=0, shape=(4, 3, 512, 512), fp16=False):
dtype = torch.float16 if fp16 else torch.float32
image = torch.from_numpy(load_hf_numpy(self.get_file_format(seed, shape))).to(torch_device).to(dtype)
return image
def test_single_file_inference_same_as_pretrained(self):
model_1 = self.model_class.from_pretrained(self.repo_id).to(torch_device)
model_2 = self.model_class.from_single_file(self.ckpt_path, config=self.repo_id).to(torch_device)
image = self.get_sd_image(33)
generator = torch.Generator(torch_device)
with torch.no_grad():
sample_1 = model_1(image, generator=generator.manual_seed(0)).sample
sample_2 = model_2(image, generator=generator.manual_seed(0)).sample
assert sample_1.shape == sample_2.shape
output_slice_1 = sample_1.flatten().float().cpu()
output_slice_2 = sample_2.flatten().float().cpu()
assert numpy_cosine_similarity_distance(output_slice_1, output_slice_2) < 1e-4
def test_single_file_components(self):
model = self.model_class.from_pretrained(self.repo_id)
model_single_file = self.model_class.from_single_file(self.ckpt_path, config=self.repo_id)
PARAMS_TO_IGNORE = ["torch_dtype", "_name_or_path", "_use_default_values", "_diffusers_version"]
for param_name, param_value in model_single_file.config.items():
if param_name in PARAMS_TO_IGNORE:
continue
assert (
model.config[param_name] == param_value
), f"{param_name} differs between pretrained loading and single file loading"
def test_single_file_arguments(self):
model_default = self.model_class.from_single_file(self.ckpt_path, config=self.repo_id)
assert model_default.config.scaling_factor == 0.18215
assert model_default.config.sample_size == 256
assert model_default.dtype == torch.float32
scaling_factor = 2.0
sample_size = 512
torch_dtype = torch.float16
model = self.model_class.from_single_file(
self.ckpt_path,
config=self.repo_id,
sample_size=sample_size,
scaling_factor=scaling_factor,
torch_dtype=torch_dtype,
)
assert model.config.scaling_factor == scaling_factor
assert model.config.sample_size == sample_size
assert model.dtype == torch_dtype
import gc
import tempfile
import unittest
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image
from diffusers.utils.testing_utils import (
enable_full_determinism,
numpy_cosine_similarity_distance,
require_torch_gpu,
slow,
torch_device,
)
from .single_file_testing_utils import (
SDSingleFileTesterMixin,
download_diffusers_config,
download_original_config,
download_single_file_checkpoint,
)
enable_full_determinism()
@slow
@require_torch_gpu
class StableDiffusionControlNetPipelineSingleFileSlowTests(unittest.TestCase, SDSingleFileTesterMixin):
pipeline_class = StableDiffusionControlNetPipeline
ckpt_path = "https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/v1-5-pruned-emaonly.safetensors"
original_config = (
"https://raw.githubusercontent.com/CompVis/stable-diffusion/main/configs/stable-diffusion/v1-inference.yaml"
)
repo_id = "runwayml/stable-diffusion-v1-5"
def setUp(self):
super().setUp()
gc.collect()
torch.cuda.empty_cache()
def tearDown(self):
super().tearDown()
gc.collect()
torch.cuda.empty_cache()
def get_inputs(self, device, generator_device="cpu", dtype=torch.float32, seed=0):
generator = torch.Generator(device=generator_device).manual_seed(seed)
init_image = load_image(
"https://huggingface.co/datasets/diffusers/test-arrays/resolve/main"
"/stable_diffusion_img2img/sketch-mountains-input.png"
)
control_image = load_image(
"https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/bird_canny.png"
).resize((512, 512))
prompt = "bird"
inputs = {
"prompt": prompt,
"image": init_image,
"control_image": control_image,
"generator": generator,
"num_inference_steps": 3,
"strength": 0.75,
"guidance_scale": 7.5,
"output_type": "np",
}
return inputs
def test_single_file_format_inference_is_same_as_pretrained(self):
controlnet = ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_canny")
pipe = self.pipeline_class.from_pretrained(self.repo_id, controlnet=controlnet)
pipe.unet.set_default_attn_processor()
pipe.enable_model_cpu_offload()
pipe_sf = self.pipeline_class.from_single_file(
self.ckpt_path,
controlnet=controlnet,
)
pipe_sf.unet.set_default_attn_processor()
pipe_sf.enable_model_cpu_offload()
inputs = self.get_inputs(torch_device)
output = pipe(**inputs).images[0]
inputs = self.get_inputs(torch_device)
output_sf = pipe_sf(**inputs).images[0]
max_diff = numpy_cosine_similarity_distance(output_sf.flatten(), output.flatten())
assert max_diff < 1e-3
def test_single_file_components(self):
controlnet = ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_canny")
pipe = self.pipeline_class.from_pretrained(
self.repo_id, variant="fp16", safety_checker=None, controlnet=controlnet
)
pipe_single_file = self.pipeline_class.from_single_file(
self.ckpt_path,
safety_checker=None,
controlnet=controlnet,
)
super()._compare_component_configs(pipe, pipe_single_file)
def test_single_file_components_local_files_only(self):
controlnet = ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_canny")
pipe = self.pipeline_class.from_pretrained(self.repo_id, controlnet=controlnet)
with tempfile.TemporaryDirectory() as tmpdir:
ckpt_filename = self.ckpt_path.split("/")[-1]
local_ckpt_path = download_single_file_checkpoint(self.repo_id, ckpt_filename, tmpdir)
pipe_single_file = self.pipeline_class.from_single_file(
local_ckpt_path, controlnet=controlnet, safety_checker=None, local_files_only=True
)
super()._compare_component_configs(pipe, pipe_single_file)
def test_single_file_components_with_original_config(self):
controlnet = ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_canny", variant="fp16")
pipe = self.pipeline_class.from_pretrained(self.repo_id, controlnet=controlnet)
pipe_single_file = self.pipeline_class.from_single_file(
self.ckpt_path, controlnet=controlnet, safety_checker=None, original_config=self.original_config
)
super()._compare_component_configs(pipe, pipe_single_file)
def test_single_file_components_with_original_config_local_files_only(self):
controlnet = ControlNetModel.from_pretrained(
"lllyasviel/control_v11p_sd15_canny", torch_dtype=torch.float16, variant="fp16"
)
pipe = self.pipeline_class.from_pretrained(
self.repo_id,
controlnet=controlnet,
)
with tempfile.TemporaryDirectory() as tmpdir:
ckpt_filename = self.ckpt_path.split("/")[-1]
local_ckpt_path = download_single_file_checkpoint(self.repo_id, ckpt_filename, tmpdir)
local_original_config = download_original_config(self.original_config, tmpdir)
pipe_single_file = self.pipeline_class.from_single_file(
local_ckpt_path,
original_config=local_original_config,
controlnet=controlnet,
safety_checker=None,
local_files_only=True,
)
super()._compare_component_configs(pipe, pipe_single_file)
def test_single_file_components_with_diffusers_config(self):
controlnet = ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_canny", variant="fp16")
pipe = self.pipeline_class.from_pretrained(self.repo_id, controlnet=controlnet)
pipe_single_file = self.pipeline_class.from_single_file(
self.ckpt_path, controlnet=controlnet, safety_checker=None, original_config=self.original_config
)
super()._compare_component_configs(pipe, pipe_single_file)
def test_single_file_components_with_diffusers_config_local_files_only(self):
controlnet = ControlNetModel.from_pretrained(
"lllyasviel/control_v11p_sd15_canny", torch_dtype=torch.float16, variant="fp16"
)
pipe = self.pipeline_class.from_pretrained(
self.repo_id,
controlnet=controlnet,
)
with tempfile.TemporaryDirectory() as tmpdir:
ckpt_filename = self.ckpt_path.split("/")[-1]
local_ckpt_path = download_single_file_checkpoint(self.repo_id, ckpt_filename, tmpdir)
local_diffusers_config = download_diffusers_config(self.repo_id, tmpdir)
pipe_single_file = self.pipeline_class.from_single_file(
local_ckpt_path,
config=local_diffusers_config,
safety_checker=None,
controlnet=controlnet,
local_files_only=True,
)
super()._compare_component_configs(pipe, pipe_single_file)
import gc
import unittest
import torch
from diffusers import (
StableDiffusionUpscalePipeline,
)
from diffusers.utils import load_image
from diffusers.utils.testing_utils import (
enable_full_determinism,
numpy_cosine_similarity_distance,
require_torch_gpu,
slow,
)
from .single_file_testing_utils import SDSingleFileTesterMixin
enable_full_determinism()
@slow
@require_torch_gpu
class StableDiffusionUpscalePipelineSingleFileSlowTests(unittest.TestCase, SDSingleFileTesterMixin):
pipeline_class = StableDiffusionUpscalePipeline
ckpt_path = "https://huggingface.co/stabilityai/stable-diffusion-x4-upscaler/blob/main/x4-upscaler-ema.safetensors"
original_config = "https://raw.githubusercontent.com/Stability-AI/stablediffusion/main/configs/stable-diffusion/x4-upscaling.yaml"
repo_id = "stabilityai/stable-diffusion-x4-upscaler"
def setUp(self):
super().setUp()
gc.collect()
torch.cuda.empty_cache()
def tearDown(self):
super().tearDown()
gc.collect()
torch.cuda.empty_cache()
def test_single_file_format_inference_is_same_as_pretrained(self):
image = load_image(
"https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main"
"/sd2-upscale/low_res_cat.png"
)
prompt = "a cat sitting on a park bench"
pipe = StableDiffusionUpscalePipeline.from_pretrained(self.repo_id)
pipe.enable_model_cpu_offload()
generator = torch.Generator("cpu").manual_seed(0)
output = pipe(prompt=prompt, image=image, generator=generator, output_type="np", num_inference_steps=3)
image_from_pretrained = output.images[0]
pipe_from_single_file = StableDiffusionUpscalePipeline.from_single_file(self.ckpt_path)
pipe_from_single_file.enable_model_cpu_offload()
generator = torch.Generator("cpu").manual_seed(0)
output_from_single_file = pipe_from_single_file(
prompt=prompt, image=image, generator=generator, output_type="np", num_inference_steps=3
)
image_from_single_file = output_from_single_file.images[0]
assert image_from_pretrained.shape == (512, 512, 3)
assert image_from_single_file.shape == (512, 512, 3)
assert (
numpy_cosine_similarity_distance(image_from_pretrained.flatten(), image_from_single_file.flatten()) < 1e-3
)