    Add Perceiver IO (#14487) · 65b20b73
    NielsRogge authored
    * First draft
    
    * Style and remove mlm
    
    * Make forward pass work
    
    * More improvements
    
    * More improvements
    
    * Fix bug
    
    * More improvements
    
    * More improvements
    
    * Add PerceiverTokenizer first draft
    
    * Improve conversion script
    
    * More improvements
    
    * Make conversion script work for the encoder
    
    * Make conversion script work with local pickle files
    
    * Style & quality, fix-copies
    
    * Add dummy input to conversion script
    
    * Add absolute position embeddings to TextPreProcessor
    
    * Make forward pass of encoder work
    
    * More improvements
    
    * Move text preprocessor to separate script
    
    * More improvements
    
    * More improvements
    
    * Add post processor
    
    * Make MLM model work
    
    * Style
    
    * Add PerceiverForMaskedLM
    
    * Add PerceiverImagePreprocessor
    
    * Make style
    
    * Make PerceiverForImageClassification work
    
    * More improvements
    
    * More improvements
    
    * Use tokenizer in conversion script
    
    * Use PerceiverForMaskedLM in conversion script
    
    * Define custom PerceiverModelOutput
    
    * Improve PerceiverAttention to make it work for both MLM and image classification
    
    * More improvements
    
    * More improvements
    
    * More improvements to the conversion script
    
    * Make conversion script work for both MLM and image classification
    
    * Add PerceiverFeatureExtractor
    
    * More improvements
    
    * Style and quality
    
    * Add center cropping
    
    * Fix bug
    
    * Small fix
    
    * Add print statement
    
    * Fix bug in image preprocessor
    
    * Fix bug with conversion script
    
    * Make output position embeddings an nn.Parameter layer instead of nn.Embedding
    
    * Comment out print statements
    
    * Add position encoding classes
    
    * More improvements
    
    * Use position_encoding_kwargs
    
    * Add PerceiverForImageClassificationFourier
    
    * Make style & quality
    
    * Add PerceiverForImageClassificationConvProcessing
    
    * Style & quality
    
    * Add flow model
    
    * Move processors to modeling file
    
    * Make position encodings modular
    
    * Make basic decoder use modular position encodings
    
    * Add PerceiverForOpticalFlow to conversion script
    
    * Add AudioPreprocessor
    
    * Make it possible for the basic decoder to use Fourier position embeddings
    
    * Add PerceiverForMultimodalAutoencoding
    
    * Improve model for optical flow
    
    * Improve _build_network_inputs method
    
    * Add print statement
    
    * Fix device issue
    
    * Fix device of Fourier embeddings
    
    * Add print statements for debugging
    
    * Add another print statement
    
    * Add another print statement
    
    * Add another print statement
    
    * Add another print statement
    
    * Improve PerceiverAudioPreprocessor
    
    * Improve conversion script for multimodal model
    
    * More improvements
    
    * More improvements
    
    * Improve multimodal model
    
    * Make forward pass multimodal model work
    
    * More improvements
    
    * Improve tests
    
    * Fix some more tests
    
    * Add output dataclasses
    
    * Make more tests pass
    
    * Add print statements for debugging
    
    * Add tests for image classification
    
    * Add PerceiverClassifierOutput
    
    * More improvements
    
    * Make more tests pass for the optical flow model
    
    * Make style & quality
    
    * Small improvements
    
    * Don't support training for optical flow model for now
    
    * Fix _prepare_for_class for tests
    
    * Make more tests pass, add some docs
    
    * Add multimodal model to tests
    
    * Minor fixes
    
    * Fix tests
    
    * Improve conversion script
    
    * Make fixup
    
    * Remove pos_dim argument
    
    * Fix device issue
    
    * Potential fix for OOM
    
    * Revert previous commit
    
    * Fix test_initialization
    
    * Add print statements for debugging
    
    * Fix print statement
    
    * Add print statement
    
    * Add print statement
    
    * Add print statement
    
    * Add print statement
    
    * Add print statement
    
    * Add print statement
    
    * Remove need for output_shape
    
    * Comment out output_shape
    
    * Remove unnecessary code
    
    * Improve docs
    
    * Fix make fixup
    
    * Remove PerceiverTextProcessor from init
    
    * Improve docs
    
    * Small improvement
    
    * Apply first batch of suggestions from code review
    
    * Apply more suggestions from code review
    
    * Update docstrings
    
    * Define dicts beforehand for readability
    
    * Rename task to architecture in conversion script, include PerceiverModel in tests
    
    * Add print statements for debugging
    
    * Fix tests on GPU
    
    * Remove preprocessors, postprocessors and decoders from main init
    
    * Add integration test
    
    * Fix docs
    
    * Replace einops by torch
    
    * Update for new docs frontend
    
    * Rename PerceiverForImageClassification
    
    * Improve docs
    
    * Improve docs
    
    * Improve docs of PerceiverModel
    
    * Fix some more tests
    
    * Improve center_crop
    
    * Add PerceiverForSequenceClassification
    
    * Small improvements
    
    * Fix tests
    
    * Add integration test for optical flow model
    
    * Clean up
    
    * Add tests for tokenizer
    
    * Fix tokenizer by adding special tokens properly
    
    * Fix CI