# Sparse Structure Visualization Guide

This guide explains how to use the sparse structure visualization features added to TRELLIS.2.

## Overview

The sparse structure is a 3D voxel grid that represents which parts of 3D space are occupied by the object being generated. Visualizing it helps you understand:

- The initial "skeleton" or blueprint of the 3D object
- How different pipeline types (512, 1024_cascade, 1536_cascade) affect the sparse structure
- The distribution and density of occupied voxels
- The upsampling process in cascade modes (from LR to HR coordinates)
- Potential issues in the generation process

## Five Stages of Visualization

### Stage 1: Initial Sparse Structure

Generated by [`sample_sparse_structure()`](trellis2/pipelines/trellis2_image_to_3d.py:189). This is the initial coarse voxel grid.

### Stage 2: High-Resolution Coordinates (Cascade Modes Only)

Generated by [`sample_shape_slat_cascade()`](trellis2/pipelines/trellis2_image_to_3d.py:280). These are the upsampled coordinates produced when the decoder upsamples the sparse latent 4x.

**Note:** HR coordinates visualization is only available for cascade pipeline types (`1024_cascade` and `1536_cascade`).

### Stage 3: Quantized Coordinates (Cascade Modes Only)

Generated after the resolution adjustment loop in [`sample_shape_slat_cascade()`](trellis2/pipelines/trellis2_image_to_3d.py:412). These are the coordinates after quantization, deduplication, and adaptive resolution adjustment.

**What this shows:**

- The final coordinate grid used for shape generation
- How many tokens remain after adaptive resolution reduction
- The actual spatial resolution being used (may be less than the target)

### Stage 4: Final SLat Features (Cascade Modes Only)

Generated after flow model sampling and denormalization in [`sample_shape_slat_cascade()`](trellis2/pipelines/trellis2_image_to_3d.py:450). These are the learned features at each coordinate.

**What this shows:**

- The actual learned shape features
- Feature value distributions across the object
- Quality of the generated shape representation

**Note:** SLat features visualization is only available for cascade pipeline types.

### Stage 5: Texture Features (Cascade Modes Only)

Generated during texture sampling in [`sample_tex_slat()`](trellis2/pipelines/trellis2_image_to_3d.py:567). These are the learned texture attributes at each coordinate.

**What this shows:**

- Learned texture features (e.g., RGB colors, roughness, metallic properties)
- How texture varies across spatial locations
- Feature value distributions for each texture channel

**Note:** Texture features typically have multiple dimensions (e.g., 3 for RGB textures).

## Understanding the Visualizations

### What You're Seeing

The sparse structure coordinates have shape `[N, 4]`, where:

- **Column 0**: Batch index (always 0 for single samples)
- **Column 1**: X coordinate (0 to resolution-1)
- **Column 2**: Y coordinate (0 to resolution-1)
- **Column 3**: Z coordinate (0 to resolution-1)

### Initial Sparse Structure vs. HR Coordinates

When using cascade modes, you'll see two sets of visualizations:

1. **Initial Sparse Structure** (e.g., `sparse_structure_1024_cascade_seed42_*.png`)
   - Coarse 32³ voxel grid
   - ~5,000 - 15,000 occupied voxels
   - Generated directly from the sparse structure flow model
2. **HR Coordinates** (e.g., `hr_coords_1024_upsampled_*.png`)
   - Upsampled coordinates (4x denser)
   - ~20,000 - 60,000 coordinates
   - Generated by the decoder upsampling the shape SLat
   - Shows the refined structure before final shape generation

**Key Insight:** Comparing these two visualizations shows how upsampling refines the initial sparse structure into a more detailed representation.

## Available Visualization Methods

### 1. Matplotlib 3D Scatter Plot (`visualize_sparse_structure_matplotlib`)

Shows the sparse structure as a 3D scatter plot with color-coded Z coordinates.

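The method's implementation is not reproduced in this guide. As a rough, standalone sketch of what a Z-colored 3D scatter plot involves, the following uses matplotlib directly on synthetic `[N, 4]` coordinates. The function name `scatter_coords_by_z` and the sphere-shell test data are hypothetical and are not part of the TRELLIS.2 API.

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; remove for interactive display
import matplotlib.pyplot as plt
import numpy as np


def scatter_coords_by_z(coords, resolution, save_path=None):
    """Plot [N, 4] (batch, x, y, z) voxel coordinates, colored by Z.

    Hypothetical helper for illustration; not the pipeline's own method.
    """
    coords = np.asarray(coords)
    x, y, z = coords[:, 1], coords[:, 2], coords[:, 3]

    fig = plt.figure(figsize=(6, 6))
    ax = fig.add_subplot(projection="3d")
    points = ax.scatter(x, y, z, c=z, cmap="viridis", s=2)

    # Fix the axes to the full grid so plots at one resolution are comparable.
    ax.set_xlim(0, resolution - 1)
    ax.set_ylim(0, resolution - 1)
    ax.set_zlim(0, resolution - 1)
    ax.set_xlabel("X")
    ax.set_ylabel("Y")
    ax.set_zlabel("Z")
    fig.colorbar(points, ax=ax, label="Z")

    if save_path is not None:
        fig.savefig(save_path, dpi=150)
    return fig


# Synthetic stand-in for pipeline output: a sphere shell inside a 32^3 grid.
res = 32
grid = np.indices((res, res, res)).reshape(3, -1).T
dist = np.linalg.norm(grid - (res - 1) / 2, axis=1)
shell = grid[(dist > 10) & (dist < 12)]
batch_col = np.zeros((len(shell), 1), dtype=shell.dtype)  # batch index 0
coords = np.concatenate([batch_col, shell], axis=1)

fig = scatter_coords_by_z(coords, res)
```

Passing `save_path="shell.png"` writes the figure to disk instead of only returning it.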
**Best for:** Understanding the overall 3D shape and spatial distribution.

**Output:** Interactive 3D plot (or saved PNG file)

### 2. Voxel Grid Visualization (`visualize_sparse_structure_voxel`)

Displays the sparse structure as a 3D voxel grid where each occupied voxel is shown as a point.

**Best for:** Seeing the actual voxel structure and understanding resolution effects.

**Output:** 3D voxel visualization (or saved PNG file)

### 3. 2D Projections (`visualize_sparse_structure_projections`)

Shows three orthogonal 2D projections:

- **XY Projection**: Top view (looking down Z axis)
- **XZ Projection**: Side view (looking down Y axis)
- **YZ Projection**: Front view (looking down X axis)

**Best for:** Quick analysis of shape from different angles.

**Output:** Three 2D scatter plots in one figure (or saved PNG file)

### 4. Multi-View Visualization (`visualize_sparse_structure_multi_view`)

Combines a 3D scatter plot with 2D projections in a single figure.

**Best for:** Comprehensive overview of the sparse structure.

**Output:** Combined 3D + 2D visualization (or saved PNG file)

### 5. Statistical Analysis (`analyze_sparse_structure`)

Prints numerical statistics about the sparse structure:

- Total number of occupied voxels
- Coordinate ranges (X, Y, Z)
- Center position
- Standard deviation
- Bounding box volume

**Best for:** Quick quantitative analysis without visualization.

**Output:** Console output with statistics

### 6. SLat Features Visualization (`visualize_slat_features`)

Shows learned features in the shape Structured Latent (SLat) as a 3D scatter plot.

**Best for:** Understanding what the model has learned at each spatial location.

**Output:** 3D plot colored by feature values (or saved PNG file)

**Parameters:**

- `feature_idx`: Which feature dimension to visualize (default: 0)
- Multiple features can be visualized by calling the method with different indices

**Note:** Texture features have multiple dimensions (e.g., 3 for RGB), each representing learned texture attributes at each coordinate.

### 7. SLat Features Analysis (`analyze_slat_features`)

Prints numerical statistics about SLat features:

- Number of tokens (coordinates)
- Feature dimensions
- Statistics for each feature (min, max, mean, std)
- NaN/Inf value checks
- Coordinate ranges

**Best for:** Debugging feature values and checking for anomalies.

**Output:** Console output with feature statistics

## Quick Start with example_visualization.py

The repo includes [example_visualization.py](example_visualization.py), a standalone script that runs the full pipeline, saves stage visualizations, renders multiple views, and exports a raw `.obj` file, all in one shot. It is the fastest way to verify the pipeline is working correctly on your hardware.

### What it does

1. Runs the pipeline on a test image with all visualization stages enabled
2. Exports a raw `.obj` file (no renderer, no nvdiffrast); load this in Blender to verify geometry completeness independently of the renderer
3. Renders N views using the same `render_snapshot` path as `app.py` and saves contact sheets

### Configuration

Edit the constants at the top of the file:

```python
IMAGE_PATH = "assets/example_image/T2.png"  # input image
PIPELINE = "1024_cascade"                   # '512' | '1024' | '1024_cascade' | '1536_cascade'
SEED = 42
NVIEWS = 8                                  # render views
RENDER_RES = 1024                           # render resolution
VIZ_DIR = "visualizations_render_test"      # output directory
```

### Running

```sh
export FLASH_ATTENTION_TRITON_AMD_ENABLE="TRUE"  # AMD only
python example_visualization.py
```

Output files in `VIZ_DIR/`:

- `raw_mesh.obj`: raw geometry, no renderer involved.
  If this looks correct in Blender but renders look wrong, the bug is in the rasterizer path, not the pipeline.
- `render_frames/contact_shaded_all_views.png`: all rendered views side by side
- `render_frames/contact_normal_all_views.png`: surface normals
- `render_frames/contact_base_color_all_views.png`: albedo without lighting
- Per-stage visualization PNGs (sparse structure, HR coords, SLat features, etc.)

### Diagnosing issues

The `.obj` export is intentionally renderer-free. If the `.obj` geometry is complete but render images show only 15–30% coverage, the issue is in the nvdiffrast/rasterizer path. If the `.obj` itself looks wrong, the issue is earlier in the pipeline.

## Usage

### Basic Usage in Your Code

```python
from trellis2.pipelines import Trellis2ImageTo3DPipeline
from PIL import Image

# Load pipeline
pipeline = Trellis2ImageTo3DPipeline.from_pretrained("microsoft/TRELLIS.2-4B")
pipeline.cuda()

# Load image
image = Image.open("path/to/image.png")

# Run with visualization
mesh = pipeline.run(
    image,
    seed=42,
    pipeline_type='1024_cascade',
    visualize_sparse_structure=True,  # Enable visualization
    visualize_save_dir=None,          # None = interactive display
)
```

### Saving Visualizations to Disk

```python
mesh = pipeline.run(
    image,
    seed=42,
    pipeline_type='1024_cascade',
    visualize_sparse_structure=True,
    visualize_save_dir='my_visualizations',  # Save to directory
)
```

This will create multiple files for each visualization stage.

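To see what a run actually wrote, a small helper can list the saved PNGs grouped by stage. This is a sketch, not part of the pipeline: `group_visualizations` is a hypothetical name, and the two prefixes come from the example filenames quoted in this guide (`sparse_structure_*`, `hr_coords_*`). Files from other stages fall into `other`, because their naming is not documented here.

```python
from collections import defaultdict
from pathlib import Path

# Prefixes taken from the example filenames in this guide (assumption:
# other stages use different, undocumented prefixes).
KNOWN_PREFIXES = ("sparse_structure", "hr_coords")


def group_visualizations(save_dir):
    """Return {stage: [filenames]} for every PNG directly inside save_dir."""
    groups = defaultdict(list)
    for path in sorted(Path(save_dir).glob("*.png")):
        stage = next(
            (prefix for prefix in KNOWN_PREFIXES if path.name.startswith(prefix)),
            "other",
        )
        groups[stage].append(path.name)
    return dict(groups)


# Prints nothing if the directory is missing or empty.
for stage, names in group_visualizations("my_visualizations").items():
    print(f"{stage}: {len(names)} file(s)")
```

A stage with no entry in the result means no matching file was written, which is a quick hint that the corresponding visualization did not run.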
### Statistical Analysis Only

```python
# Generate sparse structure
coords = pipeline.sample_sparse_structure(
    pipeline.get_cond([image], 512),
    resolution=32,
    num_samples=1,
    sampler_params={}
)

# Analyze without visualization
pipeline.analyze_sparse_structure(coords)
```

Example output (values are illustrative):

```
Sparse Structure Analysis:
Total occupied voxels: 15234
X range: [2, 29]
Y range: [5, 26]
Z range: [3, 28]
Center: [15.2, 15.8, 14.9]
Std dev: [6.3, 5.9, 7.1]
Bounding box volume: 5832
```

## Parameters

### `visualize_sparse_structure` (bool)

- **Default:** `False`
- **Description:** Enable or disable sparse structure visualization
- **Usage:** Set to `True` to visualize the sparse structure after generation

### `visualize_save_dir` (str or None)

- **Default:** `None`
- **Description:** Directory path to save visualization images
- **Usage:**
  - `None`: Display visualizations interactively (blocks execution)
  - `"/path/to/dir"`: Save visualizations to disk (non-blocking)

## Understanding the Visualizations

### Resolution Differences

Different pipeline types use different sparse structure resolutions:

| Pipeline Type | Sparse Structure Resolution | Grid Size | Typical Voxel Count |
|---------------|-----------------------------|---------------|---------------------|
| 512 | 32 | 32³ = 32,768 | ~5,000 - 15,000 |
| 1024 | 64 | 64³ = 262,144 | ~20,000 - 50,000 |
| 1024_cascade | 32 | 32³ = 32,768 | ~5,000 - 15,000 |
| 1536_cascade | 32 | 32³ = 32,768 | ~5,000 - 15,000 |

**Note:** Cascade modes use the same sparse structure resolution as 512, but upsample later, during shape generation.

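The grid sizes and voxel counts in the table above can be turned into an occupancy fraction, which is often easier to compare across resolutions than a raw count. The sketch below only restates the table's numbers; the `occupancy` helper and the dictionary names are illustrative and not part of the pipeline.

```python
# Values copied from the resolution table in this guide.
SPARSE_RESOLUTION = {"512": 32, "1024": 64, "1024_cascade": 32, "1536_cascade": 32}
TYPICAL_VOXELS = {
    "512": (5_000, 15_000),
    "1024": (20_000, 50_000),
    "1024_cascade": (5_000, 15_000),
    "1536_cascade": (5_000, 15_000),
}


def occupancy(num_voxels, resolution):
    """Fraction of the resolution^3 grid that is occupied."""
    return num_voxels / resolution ** 3


for name, res in SPARSE_RESOLUTION.items():
    low, high = TYPICAL_VOXELS[name]
    print(
        f"{name:>12}: {res ** 3:>7,} cells, "
        f"typical occupancy {occupancy(low, res):.1%} to {occupancy(high, res):.1%}"
    )
```

A voxel count far outside the typical range for its pipeline type is a reasonable first thing to check when a generation looks wrong.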
### Color Coding

- **Z-coordinate coloring**: Points are colored by their Z position (using the viridis colormap)
- **Higher Z values**: Yellow/green (top of object)
- **Lower Z values**: Purple/blue (bottom of object)

## Examples

### Example 1: Compare Different Pipeline Types

```python
import os

for pipeline_type in ['512', '1024_cascade', '1536_cascade']:
    print(f"Generating with {pipeline_type}...")
    save_dir = f'comparison/{pipeline_type}'
    os.makedirs(save_dir, exist_ok=True)  # make sure the output directory exists
    mesh = pipeline.run(
        image,
        seed=42,
        pipeline_type=pipeline_type,
        visualize_sparse_structure=True,
        visualize_save_dir=save_dir,
    )
```

### Example 2: Debug Generation Issues

```python
# Generate with visualization to check sparse structure
mesh = pipeline.run(
    image,
    seed=42,
    pipeline_type='1024_cascade',
    visualize_sparse_structure=True,
    visualize_save_dir='debug_output',
)

# If sparse structure looks abnormal, you can:
# 1. Check if voxel count is too high/low
# 2. Verify coordinate ranges are within expected bounds
# 3. Compare with known good examples
```

### Example 3: Batch Analysis

```python
import pandas as pd
import torch

results = []
for seed in range(10):
    # Seed torch's global RNG so each iteration is reproducible
    # (assumes the sampler draws its noise from the global RNG)
    torch.manual_seed(seed)
    coords = pipeline.sample_sparse_structure(
        pipeline.get_cond([image], 512),
        resolution=32,
        num_samples=1,
        sampler_params={}
    )
    coords_np = coords.cpu().numpy()
    results.append({
        'seed': seed,
        'num_voxels': len(coords),
        'x_range': coords_np[:, 1].max() - coords_np[:, 1].min(),
        # ... more fields
    })

df = pd.DataFrame(results)
print(df.describe())
```

### Example 4: Complete Cascade Visualization

```python
# Visualize complete cascade process with all stages
mesh = pipeline.run(
    image,
    seed=42,
    pipeline_type='1024_cascade',
    visualize_sparse_structure=True,
    visualize_save_dir='complete_cascade',
)

# This creates visualizations for:
# 1. Initial sparse structure
# 2. HR coordinates (upsampled)
# 3. Quantized coordinates
# 4. Final SLat features
# 5. Texture features
```

## Troubleshooting

### Issue: Plots don't display

**Solution:** Make sure you're running in an environment with display support (not headless).
For headless environments, use `visualize_save_dir` to save files instead.

### Issue: Too many voxels, visualization is slow

**Solution:** The visualization can be slow for very large sparse structures (>50,000 voxels). Consider:

1. Using lower resolution pipeline types
2. Saving to disk instead of interactive display
3. Using statistical analysis instead of full visualization

### Issue: Out of memory during visualization

**Solution:** Matplotlib can use significant memory for large plots. Try:

1. Saving to disk instead of interactive display
2. Using only the 2D projections method
3. Using statistical analysis only

## Advanced Usage

### Custom Visualization

You can also call individual visualization methods directly:

```python
# Get sparse structure
coords = pipeline.sample_sparse_structure(
    pipeline.get_cond([image], 512),
    resolution=32,
    num_samples=1,
    sampler_params={}
)

# Use specific visualization method
pipeline.visualize_sparse_structure_projections(
    coords,
    resolution=32,
    title="My Custom Title",
    save_path="custom_output.png"
)
```

### Integration with Existing Code

```python
import torch

# In your existing pipeline code
def my_generation_function(image, seed):
    # Seed torch's global RNG so the result is reproducible
    # (assumes the samplers draw their noise from the global RNG)
    torch.manual_seed(seed)

    # Compute the image conditioning once and reuse it
    cond = pipeline.get_cond([image], 512)

    # Generate sparse structure
    coords = pipeline.sample_sparse_structure(
        cond,
        resolution=32,
        num_samples=1,
        sampler_params={}
    )

    # Analyze
    pipeline.analyze_sparse_structure(coords)

    # Continue with generation
    shape_slat = pipeline.sample_shape_slat(
        cond,
        pipeline.models['shape_slat_flow_model_512'],
        coords,
        {}
    )
    # ... rest of your code
```

## References

- Main pipeline code: `trellis2/pipelines/trellis2_image_to_3d.py`
- Example script: `example_visualization.py`
- Sparse structure sampling: `sample_sparse_structure()` method (lines 189-236)
- Visualization methods: lines 472-690 in `trellis2_image_to_3d.py`