# LLaDA2.0-Uni ComfyUI Nodes

Custom ComfyUI nodes for [LLaDA2.0-Uni](https://huggingface.co/inclusionAI/LLaDA2.0-Uni), a unified multimodal diffusion language model supporting **text-to-image generation**, **image understanding (VQA)**, and **image editing**.

## Installation

> ⚠️ These nodes depend on the `encoder/` and `decoder/` modules in the project root. Do **not** copy `apps/comfyui` in isolation — the full repository must be present and the relative path `apps/comfyui` must be preserved.

### Option 1: Clone + symlink (recommended)

```bash
# 1. Clone the full project
git clone https://github.com/inclusionAI/LLaDA2.0-Uni.git

# 2. Symlink into ComfyUI's custom_nodes
cd /path/to/ComfyUI/custom_nodes
ln -s /path/to/LLaDA2.0-Uni/apps/comfyui ./LLaDA2Uni
```
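
To confirm that the symlink resolves into a complete checkout, a check along the following lines may help. This is a minimal sketch: `check_llada_link` is a hypothetical helper that does not ship with the repository, and it only tests for the `encoder/` and `decoder/` directories named in the warning above.

```bash
# Sketch: verify that a custom_nodes entry resolves into a full checkout.
# check_llada_link is a hypothetical helper, not part of the repository.
check_llada_link() {
    local link="$1" real root d
    # Resolve the symlink to its physical location.
    real="$(cd "$link" >/dev/null 2>&1 && pwd -P)" || { echo "cannot resolve: $link"; return 1; }
    # The nodes are expected to live at <repo>/apps/comfyui.
    case "$real" in
        */apps/comfyui) ;;
        *) echo "unexpected location: $real"; return 1 ;;
    esac
    root="${real%/apps/comfyui}"
    # The project root must contain the modules the nodes depend on.
    for d in encoder decoder; do
        [ -d "$root/$d" ] || { echo "missing: $root/$d"; return 1; }
    done
    echo "ok: $root"
}
```

Calling `check_llada_link /path/to/ComfyUI/custom_nodes/LLaDA2Uni` prints the resolved project root on success and the first problem it finds otherwise.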

### Option 2: One-line installer

```bash
bash /path/to/LLaDA2.0-Uni/apps/comfyui/install.sh /path/to/ComfyUI
```

### Dependencies

```bash
# Run from the LLaDA2.0-Uni repository root
pip install -r apps/comfyui/requirements.txt
pip install flash-attn --no-build-isolation  # optional, recommended
```
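
Because `flash-attn` is optional, it can be useful to check whether it is importable before choosing the Loader's `attention` setting. The sketch below assumes the package exposes a module named `flash_attn`; `pick_attention` and the `PYTHON` override variable are hypothetical names introduced here, and the check is only meaningful when run with the same Python interpreter that ComfyUI uses.

```bash
# Sketch: suggest a value for the Loader's `attention` option.
# pick_attention and the PYTHON variable are hypothetical, for illustration only.
pick_attention() {
    local py="${PYTHON:-python3}"
    # If the flash_attn module imports cleanly, prefer it; otherwise fall back to SDPA.
    if "$py" -c "import flash_attn" >/dev/null 2>&1; then
        echo "flash_attn"
    else
        echo "sdpa"
    fi
}
```

For example, `PYTHON=/path/to/ComfyUI/venv/bin/python pick_attention` would test the interpreter inside a virtual environment at that (assumed) location.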

## Model Weights

In the Loader node, set the model path to either a HuggingFace repo ID or a local directory:

**HuggingFace (auto-download):**
```
inclusionAI/LLaDA2.0-Uni
```

**Local path:**
```
/path/to/LLaDA2.0-Uni
```

Expected directory layout:

```
LLaDA2.0-Uni/
├── config.json                       # LLM config
├── model-*.safetensors               # LLM weights
├── tokenizer.json
├── decoder/
│   ├── config.json
│   └── model.safetensors             # diffusion decoder
├── decoder-turbo/
│   ├── config.json
│   └── model.safetensors             # turbo decoder (8-step)
├── vae/
│   └── diffusion_pytorch_model.safetensors
└── image_tokenizer/
    ├── config.json
    ├── preprocessor_config.json
    ├── model.safetensors             # SigLIP-VQ weights
    └── sigvq_embedding.pt
```
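
When using a local path, the layout above can be checked before loading. The following is a rough sketch rather than an official tool: `check_weights_dir` is a hypothetical helper, and it only tests that each listed file exists, not that its contents are valid.

```bash
# Sketch: check a local model directory against the layout listed above.
# check_weights_dir is a hypothetical helper, not part of the repository.
check_weights_dir() {
    local dir="$1" missing=0 f
    for f in config.json tokenizer.json \
             decoder/config.json decoder/model.safetensors \
             decoder-turbo/config.json decoder-turbo/model.safetensors \
             vae/diffusion_pytorch_model.safetensors \
             image_tokenizer/config.json image_tokenizer/preprocessor_config.json \
             image_tokenizer/model.safetensors image_tokenizer/sigvq_embedding.pt; do
        [ -f "$dir/$f" ] || { echo "missing: $f"; missing=$((missing + 1)); }
    done
    # LLM weights match model-*.safetensors; require at least one such file.
    if ! ls "$dir"/model-*.safetensors >/dev/null 2>&1; then
        echo "missing: model-*.safetensors"; missing=$((missing + 1))
    fi
    # Succeed only when nothing is missing.
    [ "$missing" -eq 0 ]
}
```

Running `check_weights_dir /path/to/LLaDA2.0-Uni` prints one `missing:` line per absent file and returns a non-zero status if any are absent.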

## Nodes

| Node | Description |
|------|-------------|
| **LLaDA2.0_Uni Loader** | Load the model (Flash Attention / SDPA, optional CPU offload) |
| **LLaDA2.0_Uni Text-to-Image** | Generate VQ image tokens from a text prompt (supports thinking mode) |
| **LLaDA2.0_Uni Image Understanding** | Visual question answering |
| **LLaDA2.0_Uni Image Editing** | Edit an image with a text instruction |
| **LLaDA2.0_Uni Token Decoder** | Decode VQ tokens to pixels (turbo or normal mode) |
| **LLaDA2.0_Uni Unload Model** | Manually free VRAM |

## Example Workflows

### Text-to-Image
```
Loader → Text-to-Image → Token Decoder → Preview Image
```

### Image Understanding
```
Load Image + Loader → Image Understanding → Show Text
```

### Image Editing
```
Load Image + Loader → Image Editing → Token Decoder → Preview Image
```

## Parameters

### Loader
- `model_path` — HuggingFace repo ID or local directory
- `attention` — `flash_attn` (recommended) or `sdpa`
- `dtype` — `bf16` (recommended) or `fp8`
- `offload` — enable CPU offload for limited VRAM
- `device` — `cuda` or `cpu`
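
The Loader options above can be sanity-checked from a script before a workflow is queued. This is a sketch under stated assumptions: `validate_loader_args` is a hypothetical helper, the accepted values are taken from the list above, and the `owner/name` pattern used to recognise a repo ID is an illustrative guess, so the Loader's own logic may differ.

```bash
# Sketch: sanity-check Loader settings taken from the parameter list above.
# validate_loader_args is a hypothetical helper, not part of the nodes.
validate_loader_args() {
    local model_path="$1" attention="$2" dtype="$3" device="$4"
    case "$attention" in flash_attn|sdpa) ;; *) echo "bad attention: $attention"; return 1 ;; esac
    case "$dtype" in bf16|fp8) ;; *) echo "bad dtype: $dtype"; return 1 ;; esac
    case "$device" in cuda|cpu) ;; *) echo "bad device: $device"; return 1 ;; esac
    if [ -d "$model_path" ]; then
        # An existing directory is treated as a local model path.
        echo "local directory"
    elif [[ "$model_path" =~ ^[A-Za-z0-9._-]+/[A-Za-z0-9._-]+$ ]]; then
        # Assumption: anything shaped like owner/name is a HuggingFace repo ID.
        echo "repo ID"
    else
        echo "bad model_path: $model_path"; return 1
    fi
}
```

For instance, `validate_loader_args inclusionAI/LLaDA2.0-Uni flash_attn bf16 cuda` reports `repo ID` unless a directory with that relative name exists in the current working directory.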

### Text-to-Image
- `prompt` — text description
- `width` / `height` — output resolution
- `steps` — LLM denoising steps (8–32)
- `cfg_scale` — classifier-free guidance scale
- `mode` — `standard` or `thinking`
- `seed` — random seed (`-1` = random)
- `block_length` — block size for block-wise denoising

### Token Decoder
- `decode_mode` — `decoder-turbo` (fast, 8 steps) or `normal` (50 steps)
- `decoder_steps` — number of steps when using `normal` mode
- `resolution_multiplier` — upscale factor (typically `2`)
- `unload_after` — release decoder VRAM after decoding (set `False` to keep cached for faster repeated decodes)

## License

Same as the parent project. See the repository root for details.