    [Pipeline] Marigold depth and normals estimation (#7847) · b3d10d6d
    Anton Obukhov authored
    
    
    * implement marigold depth and normals pipelines in diffusers core
    
    * remove bibtex
    
    * remove deprecations
    
    * remove save_memory argument
    
    * remove validate_vae
    
    * remove config output
    
    * remove batch_size autodetection
    
    * remove presets logic
    move default denoising_steps and processing_resolution into the model config
    make default ensemble_size 1
    
    * remove no_grad
    
    * add fp16 to the example usage
    
    * implement is_matplotlib_available
    use is_matplotlib_available, is_scipy_available for conditional imports in the marigold depth pipeline
    
    * move colormap, visualize_depth, and visualize_normals into export_utils.py
    
    * make the denoising loop more lucid
    fix the outputs to always be 4d tensors or lists of pil images
    support a 4d input_image case
    attempt to support model_cpu_offload_seq
    move check_inputs into a separate function
    change default batch_size to 1, remove any logic to make it bigger implicitly
    
    * style
    
    * rename denoising_steps into num_inference_steps
    
    * rename input_image into image
    
    * rename input_latent into latents
    
    * remove decode_image
    change decode_prediction to use the AutoencoderKL.decode method
    
    * move clean_latent outside of progress_bar
    
    * refactor marigold-reusable image processing bits into MarigoldImageProcessor class
    
    * clean up the usage example docstring
    
    * make ensemble functions members of the pipelines
    
    * add early checks in check_inputs
    rename E into ensemble_size in depth ensembling
    
    * fix vae_scale_factor computation
    
    * better compatibility with torch.compile
    better variable naming
    
    * move export_depth_to_png to export_utils
    
    * remove encode_prediction
    
    * improve visualize_depth and visualize_normals to accept multi-dimensional data and lists
    remove visualization functions from the pipelines
    move exporting depth as 16-bit PNGs functionality from the depth pipeline
    update example docstrings
    
    * do not shortcut vae.config variables
    
    * change all asserts to raise ValueError
    
    * rename output_prediction_type to output_type
    
    * better variable names
    clean up variable deletion code
    
    * better variable names
    
    * pass desc and leave kwargs into the diffusers progress_bar
    implement nested progress bar for images and steps loops
    
    * implement scale_invariant and shift_invariant flags in the ensemble_depth function
    add scale_invariant and shift_invariant flags readout from the model config
    further refactor ensemble_depth
    support ensembling without alignment
    add ensemble_depth docstring
    
    * fix generator device placement checks
    
    * move encode_empty_text body into the pipeline call
    
    * minor empty text encoding simplifications
    
    * adjust pipelines' class docstrings to explain the added construction arguments
    
    * improve the scipy failure condition
    add comments
    improve docstrings
    change the default use_full_z_range to True
    
    * make input image values range check configurable in the preprocessor
    refactor load_image_canonical in the preprocessor to reject unknown types and return the image as a 4D tensor on the right device
    support a list of everything as inputs to the pipeline, change type to PipelineImageInput
    implement a check that all input list elements have the same dimensions
    improve docstrings of pipeline outputs
    remove check_input pipeline argument
    
    * remove forgotten print
    
    * add prediction_type model config
    
    * add uncertainty visualization into export utils
    fix NaN values in normals uncertainties
    
    * change default of output_uncertainty to False
    better handle the case of an attempt to export or visualize none
    
    * fix `output_uncertainty=False`
    
    * remove kwargs
    fix check_inputs according to the new inputs of the pipeline
    
    * rename prepare_latent into prepare_latents as in other pipelines
    annotate prepare_latents in normals pipeline with "Copied from"
    annotate encode_image in normals pipeline with "Copied from"
    
    * move nested-capable `progress_bar` method into the pipelines
    revert the original `progress_bar` method in pipeline_utils
    
    * minor message improvement
    
    * fix cpu offloading
    
    * move colormap, visualize_depth, export_depth_to_16bit_png, visualize_normals, visualize_uncertainty to marigold_image_processing.py
    update example docstrings
    
    * fix missing comma
    
    * change torch.FloatTensor to torch.Tensor
    
    * fix importing of MarigoldImageProcessor
    
    * fix vae offloading
    fix batched image encoding
    remove separate encode_image function and use vae.encode instead
    
    * implement marigold's initial tests
    relax generator checks in line with other pipelines
    implement return_dict __call__ argument in line with other pipelines
    
    * fix num_images computation
    
    * remove MarigoldImageProcessor and outputs from import structure
    update tests
    
    * update docstrings
    
    * update init
    
    * update
    
    * style
    
    * fix
    
    * fix
    
    * up
    
    * up
    
    * up
    
    * add simple test
    
    * up
    
    * update expected np input/output to be channel last
    
    * move expand_tensor_or_array into the MarigoldImageProcessor
    
    * rewrite tests to follow conventions - hardcoded slices instead of image artifacts
    write more smoke tests
    
    * add basic docs.
    
    * add anton's contribution statement
    
    * remove todos.
    
    * fix assertion values for marigold depth slow tests
    
    * fix assertion values for depth normals.
    
    * remove print
    
    * support AutoencoderTiny in the pipelines
    
    * update documentation page
    add Available Pipelines section
    add Available Checkpoints section
    add warning about num_inference_steps
    
    * fix missing import in docstring
    fix wrong value in visualize_depth docstring
    
    * [doc] add marigold to pipelines overview
    
    * [doc] add section "usage examples"
    
    * fix an issue with latents check in the pipelines
    
    * add "Frame-by-frame Video Processing with Consistency" section
    
    * grammarly
    
    * replace image tables with css-styled images (blindly)
    
    * style
    
    * print
    
    * fix the assertions.
    
    * take from the github runner.
    
    * take the slices from action artifacts
    
    * style.
    
    * update with the slices from the runner.
    
    * remove unnecessary code blocks.
    
    * Revert "[doc] add marigold to pipelines overview"
    
    This reverts commit a505165150afd8dab23c474d1a054ea505a56a5f.
    
    * remove invitation for new modalities
    
    * split out marigold usage examples
    
    * doc cleanup
    
    ---------
    Co-authored-by: yiyixuxu <yixu310@gmail.com>
    Co-authored-by: sayakpaul <spsayakpaul@gmail.com>
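
The commit message above mentions `scale_invariant` and `shift_invariant` flags in `ensemble_depth` and support for ensembling without alignment. As a rough illustration of what those flags could mean, here is a minimal pure-Python sketch. It is not the pipeline's implementation: the function names `align` and `ensemble_depth_sketch`, the min-max alignment, and the median/MAD reduction are all assumptions chosen for brevity, and predictions are modeled as flat lists of floats rather than tensors.

```python
from statistics import median


def align(depth, scale_invariant=True, shift_invariant=True):
    """Bring one flattened depth prediction into a common range (hypothetical helper)."""
    if shift_invariant and not scale_invariant:
        raise ValueError("shift-only alignment is not handled in this sketch")
    if not scale_invariant:
        return list(depth)  # ensembling without alignment: use values as-is
    # Affine-invariant: remove the minimum; scale-invariant only: keep zero fixed.
    lo = min(depth) if shift_invariant else 0.0
    hi = max(depth)
    if hi - lo <= 0:
        raise ValueError("degenerate prediction: cannot normalize")
    return [(d - lo) / (hi - lo) for d in depth]


def ensemble_depth_sketch(predictions, scale_invariant=True, shift_invariant=True):
    """Fuse several predictions of the same image into one (assumed reduction: median)."""
    if len(predictions) == 0:
        raise ValueError("need at least one prediction")
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same number of pixels")
    aligned = [align(p, scale_invariant, shift_invariant) for p in predictions]
    fused, uncertainty = [], []
    for pixel_values in zip(*aligned):
        m = median(pixel_values)
        fused.append(m)
        # Median absolute deviation as a simple per-pixel uncertainty.
        uncertainty.append(median(abs(v - m) for v in pixel_values))
    return fused, uncertainty


# Three predictions with the same structure but different scale and shift.
members = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0], [2.0, 5.0, 8.0]]
print(*ensemble_depth_sketch(members))  # [0.0, 0.5, 1.0] [0.0, 0.0, 0.0]
```

With both flags disabled the sketch reduces to a plain per-pixel median, which is one way to read "ensembling without alignment". A real implementation would typically refine the per-member scale and shift rather than rely on min-max normalization alone.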
__init__.py 22.5 KB