## 🔥SCEdit

SCEdit, proposed by Alibaba TongYi Vision Intelligence Lab, is an efficient generative fine-tuning framework. It supports fine-tuning for downstream text-to-image tasks, **saving 30%-50% of training memory overhead compared to LoRA**, which enables rapid transfer to specific generation scenarios. It can also be **directly extended to controllable image generation tasks, requiring only 7.9% of the parameters of ControlNet conditional generation while saving 30% of memory overhead**, and supports conditional generation tasks such as edge maps, depth maps, segmentation maps, poses, color maps, and image inpainting.

We trained on the 3D style data from the [Style Transfer Dataset](https://modelscope.cn/datasets/damo/style_custom_dataset/dataPeview) and tested with the same `Prompt: A boy in a camouflage jacket with a scarf`. The qualitative and quantitative results are as follows:

| Method    | bs   | ep   | Target Module | Param. (M)    | Mem. (MiB) | 3D style |
| --------- | ---- | ---- | ------------- | ------------- | ---------- | -------- |
| LoRA/r=64 | 1    | 50   | q/k/v/out/mlp | 23.94 (2.20%) | 8440       |          |
| SCEdit    | 1    | 50   | up_blocks     | 19.68 (1.81%) | 7556       |          |
| LoRA/r=64 | 10   | 100  | q/k/v/out/mlp | 23.94 (2.20%) | 26300      |          |
| SCEdit    | 10   | 100  | up_blocks     | 19.68 (1.81%) | 18634      |          |
| LoRA/r=64 | 30   | 200  | q/k/v/out/mlp | 23.94 (2.20%) | 69554      |          |
| SCEdit    | 30   | 200  | up_blocks     | 19.68 (1.81%) | 43350      |          |

To run the training task with SCEdit and reproduce the above results:

```shell
# First, follow the installation steps in the section below
cd examples/pytorch/multi_modal/notebook
python text_to_image_synthesis.py
```
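
To relate the memory column of the results table to a relative saving, the arithmetic can be sketched as below. This is only an illustration: the numbers are copied from the table, and the names `MEMORY_MIB` and `relative_saving` are hypothetical helpers, not part of SCEdit or its training script.

```python
# Memory readings (MiB) copied from the results table: (batch size, LoRA/r=64, SCEdit).
MEMORY_MIB = [
    (1, 8440, 7556),
    (10, 26300, 18634),
    (30, 69554, 43350),
]


def relative_saving(baseline: float, value: float) -> float:
    """Return the percentage reduction of `value` relative to `baseline`."""
    return 100.0 * (baseline - value) / baseline


# Map each batch size to the memory saved by SCEdit relative to LoRA, in percent.
savings = {bs: relative_saving(lora, scedit) for bs, lora, scedit in MEMORY_MIB}

for bs, pct in savings.items():
    print(f"bs={bs:>2}: SCEdit uses {pct:.1f}% less memory than LoRA/r=64")
```

For the three configurations listed, the saving works out to roughly 10.5%, 29.1%, and 37.7%, so in this experiment the gap widens as the batch size grows.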