    Add TrOCR + VisionEncoderDecoderModel (#13874) · 408b2d2b
    NielsRogge authored
    * First draft
    
    * Update self-attention of RoBERTa as a proposal
    
    * Improve conversion script
    
    * Add TrOCR decoder-only model
    
    * More improvements
    
    * Make forward pass with pretrained weights work
    
    * More improvements
    
    * Some more improvements
    
    * More improvements
    
    * Make conversion work
    
    * Clean up print statements
    
    * Add documentation, processor
    
    * Add test files
    
    * Small improvements
    
    * Some more improvements
    
    * Make fix-copies, improve docs
    
    * Make all vision encoder decoder model tests pass
    
    * Make conversion script support other models
    
    * Update URL for OCR image
    
    * Update conversion script
    
    * Fix style & quality
    
    * Add support for the large-printed model
    
    * Fix some issues
    
    * Add print statement for debugging
    
    * Add print statements for debugging
    
    * Make possible fix for sinusoidal embedding
    
    * Further debugging
    
    * Potential fix v2
    
    * Add more print statements for debugging
    
    * Add more print statements for debugging
    
    * Debug more
    
    * Comment out print statements
    
    * Make conversion of large printed model possible, address review comments
    
    * Make it possible to convert the stage1 checkpoints
    
    * Clean up code, apply suggestions from code review
    
    * Apply suggestions from code review, use Microsoft models in tests
    
    * Rename encoder_hidden_size to cross_attention_hidden_size
    
    * Improve docs
README.md 41.2 KB