<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

> [!WARNING]
> This pipeline is deprecated but it can still be used. However, we won't test the pipeline anymore and won't accept any changes to it. If you run into any issues, reinstall the last Diffusers version that supported this model.

# aMUSEd

aMUSEd was introduced in [aMUSEd: An Open MUSE Reproduction](https://huggingface.co/papers/2401.01808) by Suraj Patil, William Berman, Robin Rombach, and Patrick von Platen.

Amused is a lightweight text-to-image model based on the [MUSE](https://huggingface.co/papers/2301.00704) architecture. It is particularly useful in applications that require a lightweight and fast model, such as generating many images quickly at once.

Amused is a VQ-VAE token-based transformer that can generate an image in fewer forward passes than many diffusion models. In contrast with MUSE, it uses the smaller text encoder CLIP-L/14 instead of T5-XXL. Thanks to its small parameter count and few-forward-pass generation process, amused can generate many images quickly. This benefit is seen particularly at larger batch sizes.

The abstract from the paper is:

*We present aMUSEd, an open-source, lightweight masked image model (MIM) for text-to-image generation based on MUSE. With 10 percent of MUSE's parameters, aMUSEd is focused on fast image generation. We believe MIM is under-explored compared to latent diffusion, the prevailing approach for text-to-image generation. Compared to latent diffusion, MIM requires fewer inference steps and is more interpretable. Additionally, MIM can be fine-tuned to learn additional styles with only a single image. We hope to encourage further exploration of MIM by demonstrating its effectiveness on large-scale text-to-image generation and releasing reproducible training code. We also release checkpoints for two models which directly produce images at 256x256 and 512x512 resolutions.*

| Model | Params |
|-------|--------|
| [amused-256](https://huggingface.co/amused/amused-256) | 603M |
| [amused-512](https://huggingface.co/amused/amused-512) | 608M |

## AmusedPipeline

[[autodoc]] AmusedPipeline
	- __call__
	- all
	- enable_xformers_memory_efficient_attention
	- disable_xformers_memory_efficient_attention

[[autodoc]] AmusedImg2ImgPipeline
	- __call__
	- all
	- enable_xformers_memory_efficient_attention
	- disable_xformers_memory_efficient_attention

[[autodoc]] AmusedInpaintPipeline
	- __call__
	- all
	- enable_xformers_memory_efficient_attention
	- disable_xformers_memory_efficient_attention