# Sparse Structure Visualization Guide

This guide explains how to use the sparse structure visualization features added to TRELLIS.2.

## Overview

The sparse structure is a 3D voxel grid that represents which parts of the 3D space are occupied by the object being generated. Visualizing this helps you understand:

- The initial "skeleton" or blueprint of the 3D object
- How different pipeline types (512, 1024_cascade, 1536_cascade) affect the sparse structure
- The distribution and density of occupied voxels
- The upsampling process in cascade modes (from LR to HR coordinates)
- Potential issues in the generation process

## Five Stages of Visualization

### Stage 1: Initial Sparse Structure
Generated by [`sample_sparse_structure()`](trellis2/pipelines/trellis2_image_to_3d.py:189) - this is the initial coarse voxel grid.

### Stage 2: High-Resolution Coordinates (Cascade Modes Only)
Generated by [`sample_shape_slat_cascade()`](trellis2/pipelines/trellis2_image_to_3d.py:280) - these are the upsampled coordinates after the decoder upsamples the sparse latent 4x.

**Note:** HR coordinates visualization is only available for cascade pipeline types (`1024_cascade` and `1536_cascade`).

### Stage 3: Quantized Coordinates (Cascade Modes Only)
Generated after the resolution adjustment loop in [`sample_shape_slat_cascade()`](trellis2/pipelines/trellis2_image_to_3d.py:412) - these are the coordinates after quantization, deduplication, and adaptive resolution adjustment.

**What this shows:**
- The final coordinate grid used for shape generation
- How many tokens remain after adaptive resolution reduction
- The actual spatial resolution being used (may be less than target)
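
The quantization, deduplication, and adaptive reduction described above can be pictured with a small NumPy sketch. This is not the code in `sample_shape_slat_cascade()`: the helper names `quantize_coords` and `reduce_until_fits`, the token budget, and the step size are all assumptions chosen for illustration.

```python
import numpy as np

def quantize_coords(hr_coords, src_res, target_res):
    """Rescale [N, 4] (batch, x, y, z) coords from src_res to target_res and drop duplicates."""
    batch = hr_coords[:, :1]
    xyz = ((hr_coords[:, 1:] + 0.5) / src_res * target_res).astype(np.int64)
    return np.unique(np.concatenate([batch, xyz], axis=1), axis=0)

def reduce_until_fits(hr_coords, src_res, target_res, max_tokens, min_res, step):
    """Lower target_res by `step` until the token count fits the budget or min_res is reached."""
    while True:
        coords = quantize_coords(hr_coords, src_res, target_res)
        if len(coords) <= max_tokens or target_res <= min_res:
            return coords, target_res
        target_res = max(min_res, target_res - step)

# Hypothetical input: a fully occupied 8x8x8 cube (512 coordinates) in batch 0.
g = np.arange(8)
x, y, z = np.meshgrid(g, g, g, indexing="ij")
hr_coords = np.stack(
    [np.zeros(x.size, dtype=np.int64), x.ravel(), y.ravel(), z.ravel()], axis=1
)

coords, final_res = reduce_until_fits(
    hr_coords, src_res=8, target_res=8, max_tokens=100, min_res=2, step=2
)
print(len(coords), final_res)  # 64 4
```

In this toy run the resolution drops from 8 to 6 to 4 before the 100-token budget is met, which is the effect the "may be less than target" remark refers to.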

### Stage 4: Final SLat Features (Cascade Modes Only)
Generated after flow model sampling and denormalization in [`sample_shape_slat_cascade()`](trellis2/pipelines/trellis2_image_to_3d.py:450) - these are the learned features at each coordinate.

**What this shows:**
- The actual learned shape features
- Feature value distributions across the object
- Quality of the generated shape representation

**Note:** SLat features visualization is only available for cascade pipeline types.

### Stage 5: Texture Features (Cascade Modes Only)
Generated during texture sampling in [`sample_tex_slat()`](trellis2/pipelines/trellis2_image_to_3d.py:567) - these are the learned texture attributes at each coordinate.

**What this shows:**
- Learned texture features (e.g., RGB colors, roughness, metallic properties)
- How texture varies across spatial locations
- Feature value distributions for each texture channel

**Note:** Texture features typically have multiple dimensions (e.g., 3 for RGB textures).

## Understanding the Visualizations

### What You're Seeing

The sparse structure coordinates have shape `[N, 4]` where:
- **Column 0**: Batch index (always 0 for single samples)
- **Column 1**: X coordinate (0 to resolution-1)
- **Column 2**: Y coordinate (0 to resolution-1)
- **Column 3**: Z coordinate (0 to resolution-1)
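
As a minimal sketch of this layout, the snippet below splits a hand-written `[N, 4]` array into its batch and spatial columns and scatters it into a dense occupancy grid. The coordinates and the resolution of 8 are made up for illustration; a tensor returned by the pipeline would first need converting with `coords.cpu().numpy()`.

```python
import numpy as np

# Hypothetical [N, 4] coordinate array: one (batch, x, y, z) row per occupied voxel.
coords = np.array([
    [0, 1, 2, 3],
    [0, 1, 2, 4],
    [0, 5, 0, 7],
])
resolution = 8  # illustrative grid size, not a pipeline default

batch_idx = coords[:, 0]  # column 0: batch index
xyz = coords[:, 1:]       # columns 1-3: X, Y, Z

# Scatter the sparse coordinates into a dense boolean occupancy grid.
grid = np.zeros((resolution,) * 3, dtype=bool)
grid[xyz[:, 0], xyz[:, 1], xyz[:, 2]] = True

print(int(grid.sum()))  # 3 occupied voxels
```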

### Initial Sparse Structure vs. HR Coordinates

When using cascade modes, you'll see two sets of visualizations:

1. **Initial Sparse Structure** (e.g., `sparse_structure_1024_cascade_seed42_*.png`)
   - Coarse 32³ voxel grid
   - ~5,000 - 15,000 occupied voxels
   - Generated directly from the sparse structure flow model

2. **HR Coordinates** (e.g., `hr_coords_1024_upsampled_*.png`)
   - Upsampled coordinates (4x denser)
   - ~20,000 - 60,000 coordinates
   - Generated by the decoder upsampling the shape SLat
   - Shows the refined structure before final shape generation

**Key Insight:** Comparing these two visualizations shows how the upsampling process refines the initial sparse structure into a more detailed representation.

## Available Visualization Methods

### 1. Matplotlib 3D Scatter Plot (`visualize_sparse_structure_matplotlib`)

Shows the sparse structure as a 3D scatter plot with color-coded Z coordinates.

**Best for:** Understanding the overall 3D shape and spatial distribution.

**Output:** Interactive 3D plot (or saved PNG file)

### 2. Voxel Grid Visualization (`visualize_sparse_structure_voxel`)

Displays the sparse structure as a 3D voxel grid where each occupied voxel is shown as a point.

**Best for:** Seeing the actual voxel structure and understanding resolution effects.

**Output:** 3D voxel visualization (or saved PNG file)

### 3. 2D Projections (`visualize_sparse_structure_projections`)

Shows three orthogonal 2D projections:
- **XY Projection**: Top view (looking down Z axis)
- **XZ Projection**: Side view (looking down Y axis)
- **YZ Projection**: Front view (looking down X axis)

**Best for:** Quick analysis of shape from different angles.

**Output:** Three 2D scatter plots in one figure (or saved PNG file)
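
To make the three projections concrete, here is a NumPy-only sketch that collapses `[N, 4]` coordinates into boolean XY, XZ, and YZ masks. It is not how `visualize_sparse_structure_projections` is implemented (that method draws scatter plots); `project_occupancy` is a hypothetical helper and the sample coordinates are invented.

```python
import numpy as np

def project_occupancy(coords, resolution):
    """Return boolean XY, XZ, and YZ occupancy masks for [N, 4] (batch, x, y, z) coords."""
    x, y, z = coords[:, 1], coords[:, 2], coords[:, 3]
    masks = {}
    for name, (a, b) in {"xy": (x, y), "xz": (x, z), "yz": (y, z)}.items():
        mask = np.zeros((resolution, resolution), dtype=bool)
        mask[a, b] = True  # a pixel is set if any voxel projects onto it
        masks[name] = mask
    return masks

coords = np.array([
    [0, 1, 2, 3],
    [0, 1, 2, 4],  # same (x, y) as the row above, so XY shows one pixel for both
    [0, 5, 0, 7],
])
masks = project_occupancy(coords, resolution=8)
print({name: int(m.sum()) for name, m in masks.items()})  # {'xy': 2, 'xz': 3, 'yz': 3}
```

Voxels that share two coordinates overlap in the corresponding projection, which is why a projection can show fewer points than the 3D plot.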

### 4. Multi-View Visualization (`visualize_sparse_structure_multi_view`)

Combines 3D scatter plot with 2D projections in a single figure.

**Best for:** Comprehensive overview of sparse structure.

**Output:** Combined 3D + 2D visualization (or saved PNG file)

### 5. Statistical Analysis (`analyze_sparse_structure`)

Prints numerical statistics about the sparse structure:
- Total number of occupied voxels
- Coordinate ranges (X, Y, Z)
- Center position
- Standard deviation
- Bounding box volume

**Best for:** Quick quantitative analysis without visualization.

**Output:** Console output with statistics
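
The statistics listed above are simple to reproduce by hand. The sketch below computes them with NumPy on a made-up array; `summarize_coords` is a hypothetical helper, and the inclusive bounding box convention (`max - min + 1` per axis) is an assumption that may differ from what `analyze_sparse_structure` reports.

```python
import numpy as np

def summarize_coords(coords):
    """Basic statistics for [N, 4] (batch, x, y, z) voxel coordinates."""
    xyz = coords[:, 1:].astype(np.float64)
    lo, hi = xyz.min(axis=0), xyz.max(axis=0)
    extent = hi - lo + 1  # assumed convention: inclusive voxel extent per axis
    return {
        "num_voxels": int(len(coords)),
        "ranges": [(int(a), int(b)) for a, b in zip(lo, hi)],
        "center": xyz.mean(axis=0).tolist(),
        "std": xyz.std(axis=0).tolist(),
        "bbox_volume": int(extent.prod()),
    }

coords = np.array([
    [0, 2, 5, 3],
    [0, 4, 5, 3],
    [0, 6, 7, 3],
])
stats = summarize_coords(coords)
print(stats["num_voxels"], stats["ranges"], stats["bbox_volume"])
# 3 [(2, 6), (5, 7), (3, 3)] 15
```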

### 6. SLat Features Visualization (`visualize_slat_features`)

Shows learned features in the shape Structured Latent (SLat) as a 3D scatter plot.

**Best for:** Understanding what the model has learned at each spatial location.

**Output:** 3D plot colored by feature values (or saved PNG file)

**Parameters:**
- `feature_idx`: Which feature dimension to visualize (default: 0)
- Multiple features can be visualized by calling with different indices

**Note:** Texture features have multiple dimensions (e.g., 3 for RGB), each representing learned texture attributes at each coordinate.

### 7. SLat Features Analysis (`analyze_slat_features`)

Prints numerical statistics about SLat features:
- Number of tokens (coordinates)
- Feature dimensions
- Statistics for each feature (min, max, mean, std)
- NaN/Inf value checks
- Coordinate ranges

**Best for:** Debugging feature values and checking for anomalies.

**Output:** Console output with feature statistics
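
The same kind of checks can be run on any `[N, C]` feature array. The following is a sketch under stated assumptions, not the implementation of `analyze_slat_features`: `summarize_features` is a hypothetical helper, and the input is a hand-written array with one NaN and one Inf planted in it.

```python
import numpy as np

def summarize_features(feats):
    """Per-channel statistics for an [N, C] feature array, plus NaN/Inf counts."""
    finite = np.isfinite(feats)
    report = {
        "num_tokens": feats.shape[0],
        "num_features": feats.shape[1],
        "num_nan": int(np.isnan(feats).sum()),
        "num_inf": int(np.isinf(feats).sum()),
        "per_feature": [],
    }
    for c in range(feats.shape[1]):
        col = feats[finite[:, c], c]  # ignore non-finite entries for the stats
        if col.size == 0:
            report["per_feature"].append(None)  # channel has no usable values
            continue
        report["per_feature"].append({
            "min": float(col.min()), "max": float(col.max()),
            "mean": float(col.mean()), "std": float(col.std()),
        })
    return report

feats = np.array([
    [1.0, 2.0],
    [3.0, np.nan],
    [5.0, np.inf],
])
report = summarize_features(feats)
print(report["num_nan"], report["num_inf"], report["per_feature"][0]["mean"])  # 1 1 3.0
```

Any non-zero NaN or Inf count is usually worth investigating before looking at the per-channel ranges.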

## Quick Start with example_visualization.py

The repo includes [example_visualization.py](example_visualization.py), a standalone script that runs the full pipeline, saves stage visualizations, renders multiple views, and exports a raw `.obj` file — all in one shot. It is the fastest way to verify the pipeline is working correctly on your hardware.

### What it does

1. Runs the pipeline on a test image with all visualization stages enabled
2. Exports a raw `.obj` file (no renderer, no nvdiffrast) — load this in Blender to verify geometry completeness independently of the renderer
3. Renders N views using the same `render_snapshot` path as `app.py` and saves contact sheets

### Configuration

Edit the constants at the top of the file:

```python
IMAGE_PATH   = "assets/example_image/T2.png"   # input image
PIPELINE     = "1024_cascade"                   # '512' | '1024' | '1024_cascade' | '1536_cascade'
SEED         = 42
NVIEWS       = 8                                # render views
RENDER_RES   = 1024                             # render resolution
VIZ_DIR      = "visualizations_render_test"     # output directory
```

### Running

```sh
export FLASH_ATTENTION_TRITON_AMD_ENABLE="TRUE"   # AMD only
python example_visualization.py
```

Output files in `VIZ_DIR/`:
- `raw_mesh.obj` — raw geometry, no renderer involved. If this looks correct in Blender but renders look wrong, the bug is in the rasterizer path, not the pipeline.
- `render_frames/contact_shaded_all_views.png` — all rendered views side by side
- `render_frames/contact_normal_all_views.png` — surface normals
- `render_frames/contact_base_color_all_views.png` — albedo without lighting
- Per-stage visualization PNGs (sparse structure, HR coords, SLat features, etc.)

### Diagnosing issues

The `.obj` export is intentionally renderer-free. If the `.obj` geometry is complete but render images show only 15–30% coverage, the issue is in the nvdiffrast/rasterizer path. If the `.obj` itself looks wrong, the issue is earlier in the pipeline.
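
Before opening Blender, a quick sanity check is to count the vertex and face records in the exported file. The sketch below uses only the standard library and relies on the Wavefront OBJ convention that vertex lines start with `v` and face lines with `f`. `count_obj_elements` is a hypothetical helper, and the demo writes a tiny triangle to a temporary file rather than reading a real export.

```python
import tempfile
from pathlib import Path

def count_obj_elements(path):
    """Count vertex ('v') and face ('f') records in a Wavefront OBJ file."""
    counts = {"v": 0, "f": 0}
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            tag = line.split(maxsplit=1)[0] if line.strip() else ""
            if tag in counts:  # 'vn', 'vt', comments, etc. are skipped
                counts[tag] += 1
    return counts

# Demo on a hand-written triangle; point this at VIZ_DIR/raw_mesh.obj instead.
demo = "# demo triangle\nv 0 0 0\nv 1 0 0\nv 0 1 0\nvn 0 0 1\nf 1 2 3\n"
with tempfile.TemporaryDirectory() as tmp:
    obj_path = Path(tmp) / "raw_mesh.obj"
    obj_path.write_text(demo, encoding="utf-8")
    demo_counts = count_obj_elements(obj_path)

print(demo_counts)  # {'v': 3, 'f': 1}
```

A file with zero faces, or far fewer than expected, points at the pipeline rather than the renderer.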


## Usage

### Basic Usage in Your Code

```python
from trellis2.pipelines import Trellis2ImageTo3DPipeline
from PIL import Image

# Load pipeline
pipeline = Trellis2ImageTo3DPipeline.from_pretrained("microsoft/TRELLIS.2-4B")
pipeline.cuda()

# Load image
image = Image.open("path/to/image.png")

# Run with visualization
mesh = pipeline.run(
    image,
    seed=42,
    pipeline_type='1024_cascade',
    visualize_sparse_structure=True,  # Enable visualization
    visualize_save_dir=None,          # None = interactive display
)
```

### Saving Visualizations to Disk

```python
mesh = pipeline.run(
    image,
    seed=42,
    pipeline_type='1024_cascade',
    visualize_sparse_structure=True,
    visualize_save_dir='my_visualizations',  # Save to directory
)
```

This will create multiple files for each visualization stage.
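
To see what was written, the saved PNGs can be grouped by file name prefix. This is a sketch only: the two prefixes come from the example file names earlier in this guide, `group_visualizations` is a hypothetical helper, and the demo file names are invented, so adjust the prefixes to match what your run produces.

```python
import tempfile
from collections import defaultdict
from pathlib import Path

# Prefixes taken from the example file names in this guide; files from other
# stages may be named differently and are grouped under "other".
KNOWN_PREFIXES = ("sparse_structure", "hr_coords")

def group_visualizations(save_dir):
    """Group the PNG files in save_dir by known file name prefix."""
    groups = defaultdict(list)
    for png in sorted(Path(save_dir).glob("*.png")):
        stage = next((p for p in KNOWN_PREFIXES if png.name.startswith(p)), "other")
        groups[stage].append(png.name)
    return dict(groups)

# Demo with empty placeholder files (names are hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("sparse_structure_demo_a.png", "hr_coords_demo_a.png", "notes.png"):
        (Path(tmp) / name).touch()
    groups = group_visualizations(tmp)

print(groups)
```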

### Statistical Analysis Only

```python
# Generate sparse structure
coords = pipeline.sample_sparse_structure(
    pipeline.get_cond([image], 512),
    resolution=32,
    num_samples=1,
    sampler_params={}
)

# Analyze without visualization
pipeline.analyze_sparse_structure(coords)
```

Example output (values are illustrative):
```
Sparse Structure Analysis:
  Total occupied voxels: 15234
  X range: [2, 29]
  Y range: [5, 26]
  Z range: [3, 28]
  Center: [15.2, 15.8, 14.9]
  Std dev: [6.3, 5.9, 7.1]
  Bounding box volume: 16016
```

## Parameters

### `visualize_sparse_structure` (bool)
- **Default:** `False`
- **Description:** Enable or disable sparse structure visualization
- **Usage:** Set to `True` to visualize the sparse structure after generation

### `visualize_save_dir` (str or None)
- **Default:** `None`
- **Description:** Directory path to save visualization images
- **Usage:** 
  - `None`: Display visualizations interactively (blocks execution)
  - `"/path/to/dir"`: Save visualizations to disk (non-blocking)

## Understanding the Visualizations

### Resolution Differences

Different pipeline types use different sparse structure resolutions:

| Pipeline Type | Sparse Structure Resolution | Grid Size | Typical Voxel Count |
|--------------|----------------------------|-----------|---------------------|
| 512 | 32 | 32³ = 32,768 | ~5,000 - 15,000 |
| 1024 | 64 | 64³ = 262,144 | ~20,000 - 50,000 |
| 1024_cascade | 32 | 32³ = 32,768 | ~5,000 - 15,000 |
| 1536_cascade | 32 | 32³ = 32,768 | ~5,000 - 15,000 |

**Note:** Cascade modes use the same sparse structure resolution as 512, but later upsample during shape generation.
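
The grid sizes and typical counts in the table translate directly into occupancy fractions. The short calculation below uses only the numbers from the table; `occupancy_fraction` is a hypothetical helper.

```python
def occupancy_fraction(num_voxels, resolution):
    """Fraction of a resolution^3 grid that is occupied."""
    return num_voxels / resolution ** 3

# (low, high) typical voxel counts per sparse structure resolution, from the table.
typical_counts = {32: (5_000, 15_000), 64: (20_000, 50_000)}

for res, (lo, hi) in typical_counts.items():
    print(res, res ** 3,
          f"{occupancy_fraction(lo, res):.1%} - {occupancy_fraction(hi, res):.1%}")
# 32 32768 15.3% - 45.8%
# 64 262144 7.6% - 19.1%
```

A voxel count far outside these fractions for a given resolution is a reasonable first hint that something went wrong in sparse structure sampling.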

### Color Coding

- **Z-coordinate coloring**: Points are colored by their Z position (using viridis colormap)
- **Higher Z values**: Yellow/green (top of object)
- **Lower Z values**: Purple/blue (bottom of object)

## Examples

### Example 1: Compare Different Pipeline Types

```python
import os

for pipeline_type in ['512', '1024_cascade', '1536_cascade']:
    print(f"Generating with {pipeline_type}...")
    # Keep each pipeline type's images in its own subdirectory
    save_dir = os.path.join('comparison', pipeline_type)
    os.makedirs(save_dir, exist_ok=True)
    mesh = pipeline.run(
        image,
        seed=42,
        pipeline_type=pipeline_type,
        visualize_sparse_structure=True,
        visualize_save_dir=save_dir,
    )
```

### Example 2: Debug Generation Issues

```python
# Generate with visualization to check sparse structure
mesh = pipeline.run(
    image,
    seed=42,
    pipeline_type='1024_cascade',
    visualize_sparse_structure=True,
    visualize_save_dir='debug_output',
)

# If sparse structure looks abnormal, you can:
# 1. Check if voxel count is too high/low
# 2. Verify coordinate ranges are within expected bounds
# 3. Compare with known good examples
```

### Example 3: Batch Analysis

```python
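import pandas as pd
import torch

results = []

# The conditioning depends only on the image, so compute it once
cond = pipeline.get_cond([image], 512)

for seed in range(10):
    # Seed the RNG explicitly; otherwise the loop variable has no effect on sampling
    torch.manual_seed(seed)
    coords = pipeline.sample_sparse_structure(
        cond,
        resolution=32,
        num_samples=1,
        sampler_params={}
    )

    coords_np = coords.cpu().numpy()
    results.append({
        'seed': seed,
        'num_voxels': len(coords_np),
        'x_range': coords_np[:, 1].max() - coords_np[:, 1].min(),
        # ... more fields
    })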

df = pd.DataFrame(results)
print(df.describe())
```

### Example 4: Complete Cascade Visualization

```python
# Visualize complete cascade process with all stages
mesh = pipeline.run(
    image,
    seed=42,
    pipeline_type='1024_cascade',
    visualize_sparse_structure=True,
    visualize_save_dir='complete_cascade',
)

# This creates visualizations for:
# 1. Initial sparse structure
# 2. HR coordinates (upsampled)
# 3. Quantized coordinates
# 4. Final SLat features
# 5. Texture features
```

## Troubleshooting

### Issue: Plots don't display

**Solution:** Make sure you're running in an environment with display support (not headless). For headless environments, use `visualize_save_dir` to save files instead.
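
One way to avoid the problem in scripts that run on both desktops and servers is to pick the `visualize_save_dir` value based on whether a display appears to be available. The helper below is a heuristic sketch, not part of the pipeline: `choose_save_dir` is a hypothetical name, and checking `DISPLAY` / `WAYLAND_DISPLAY` is only a common convention on Linux.

```python
import os
import sys

def choose_save_dir(default_dir="visualizations", environ=None, platform=None):
    """Return None (interactive display) if a display seems available, else a directory."""
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform
    has_display = (
        platform in ("win32", "darwin")           # assume a desktop session
        or bool(environ.get("DISPLAY"))           # X11
        or bool(environ.get("WAYLAND_DISPLAY"))   # Wayland
    )
    return None if has_display else default_dir

# Simulated environments, so the result does not depend on the current machine.
headless = choose_save_dir(environ={}, platform="linux")
desktop = choose_save_dir(environ={"DISPLAY": ":0"}, platform="linux")
print(headless, desktop)  # visualizations None
```

The result can then be passed straight through, for example `visualize_save_dir=choose_save_dir()`.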

### Issue: Too many voxels, visualization is slow

**Solution:** The visualization can be slow for very large sparse structures (>50,000 voxels). Consider:
1. Using lower resolution pipeline types
2. Saving to disk instead of interactive display
3. Using statistical analysis instead of full visualization
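
A further option, offered here as a suggestion rather than a built-in feature, is to plot a random subset of the coordinates. The sketch below works on a NumPy array; `subsample_coords` is a hypothetical helper, and whether a subset is representative enough depends on what you are inspecting.

```python
import numpy as np

def subsample_coords(coords, max_points, seed=0):
    """Return at most max_points rows of coords, chosen uniformly without replacement."""
    if len(coords) <= max_points:
        return coords
    rng = np.random.default_rng(seed)
    keep = rng.choice(len(coords), size=max_points, replace=False)
    return coords[np.sort(keep)]  # keep the original row order

# Hypothetical dense input: 1,000 rows of (batch, x, y, z).
big = np.zeros((1000, 4), dtype=np.int64)
big[:, 1] = np.arange(1000) % 32

small = subsample_coords(big, max_points=100)
print(small.shape)  # (100, 4)
```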

### Issue: Out of memory during visualization

**Solution:** Matplotlib can use significant memory for large plots. Try:
1. Saving to disk instead of interactive display
2. Using only 2D projections method
3. Using statistical analysis only

## Advanced Usage

### Custom Visualization

You can also call individual visualization methods directly:

```python
# Get sparse structure
coords = pipeline.sample_sparse_structure(
    pipeline.get_cond([image], 512),
    resolution=32,
    num_samples=1,
    sampler_params={}
)

# Use specific visualization method
pipeline.visualize_sparse_structure_projections(
    coords,
    resolution=32,
    title="My Custom Title",
    save_path="custom_output.png"
)
```

### Integration with Existing Code

```python
import torch

# In your existing pipeline code
def my_generation_function(image, seed):
    torch.manual_seed(seed)  # make sampling reproducible for this seed

    # Compute the image conditioning once and reuse it
    cond = pipeline.get_cond([image], 512)

    # Generate sparse structure
    coords = pipeline.sample_sparse_structure(
        cond,
        resolution=32,
        num_samples=1,
        sampler_params={}
    )

    # Analyze
    pipeline.analyze_sparse_structure(coords)

    # Continue with generation
    shape_slat = pipeline.sample_shape_slat(
        cond,
        pipeline.models['shape_slat_flow_model_512'],
        coords,
        {}
    )
    
    # ... rest of your code
```

## References

- Main pipeline code: `trellis2/pipelines/trellis2_image_to_3d.py`
- Example script: `example_visualization.py`
- Sparse structure sampling: `sample_sparse_structure()` method (lines 189-236)
- Visualization methods: Lines 472-690 in `trellis2_image_to_3d.py`