  • Add LayoutLMv2 + LayoutXLM (#12604) · b6ddb08a
    NielsRogge authored
    
    
    * First commit
    
    * Make style
    
    * Fix dummy objects
    
    * Add Detectron2 config
    
    * Add LayoutLMv2 pooler
    
    * More improvements, add documentation
    
    * More improvements
    
    * Add model tests
    
    * Add clarification regarding image input
    
    * Improve integration test
    
    * Fix bug
    
    * Fix another bug
    
    * Fix another bug
    
    * Fix another bug
    
    * More improvements
    
    * Make more tests pass
    
    * Make more tests pass
    
    * Improve integration test
    
    * Remove gradient checkpointing and add head masking
    
    * Add integration test
    
    * Add LayoutLMv2ForSequenceClassification to the tests
    
    * Add LayoutLMv2ForQuestionAnswering
    
    * More improvements
    
    * More improvements
    
    * Small improvements
    
    * Fix _LazyModule
    
    * Fix fast tokenizer
    
    * Move sync_batch_norm to a separate method
    
    * Replace dummies by requires_backends
    
    * Move calculation of visual bounding boxes to separate method + update README
    
    * Add models to main init
    
    * First draft
    
    * More improvements
    
    * More improvements
    
    * More improvements
    
    * More improvements
    
    * More improvements
    
    * Remove is_split_into_words
    
    * More improvements
    
    * Simplify Tesseract - no use of pandas anymore
    
    * Add LayoutLMv2Processor
    
    * Update is_pytesseract_available
    
    * Fix bugs
    
    * Improve feature extractor
    
    * Fix bug
    
    * Add print statement
    
    * Add truncation of bounding boxes
    
    * Add tests for LayoutLMv2FeatureExtractor and LayoutLMv2Tokenizer
    
    * Improve tokenizer tests
    
    * Make more tokenizer tests pass
    
    * Make more tests pass, add integration tests
    
    * Finish integration tests
    
    * More improvements
    
    * More improvements - update API of the tokenizer
    
    * More improvements
    
    * Remove support for VQA training
    
    * Remove some files
    
    * Improve feature extractor
    
    * Improve documentation and one more tokenizer test
    
    * Make quality and small docs improvements
    
    * Add batched tests for LayoutLMv2Processor, remove fast tokenizer
    
    * Add truncation of labels
    
    * Apply suggestions from code review
    
    * Improve processor tests
    
    * Fix failing tests and add suggestion from code review
    
    * Fix tokenizer test
    
    * Add detectron2 CI job
    
    * Simplify CI job
    
    * Comment out non-detectron2 jobs and specify number of processes
    
    * Add pip install torchvision
    
    * Add durations to see which tests are slow
    
    * Fix tokenizer test and make model tests smaller
    
    * First draft
    
    * Use setattr
    
    * Possible fix
    
    * Proposal with configuration
    
    * First draft of fast tokenizer
    
    * More improvements
    
    * Enable fast tokenizer tests
    
    * Make more tests pass
    
    * Make more tests pass
    
    * More improvements
    
    * Add padding to fast tokenizer
    
    * Make more tests pass
    
    * Make more tests pass
    
    * Make all tests pass for fast tokenizer
    
    * Make fast tokenizer support overflowing boxes and labels
    
    * Add support for overflowing_labels to slow tokenizer
    
    * Add support for fast tokenizer to the processor
    
    * Update processor tests for both slow and fast tokenizers
    
    * Add head models to model mappings
    
    * Make style & quality
    
    * Remove Detectron2 config file
    
    * Add configurable option to label all subwords
    
    * Fix test
    
    * Skip visual segment embeddings in test
    
    * Use ResNet-18 backbone in tests instead of ResNet-101
    
    * Proposal
    
    * Re-enable all jobs on CI
    
    * Fix installation of tesseract
    
    * Fix failing test
    
    * Fix index table
    
    * Add LayoutXLM doc page, first draft of code examples
    
    * Improve documentation a lot
    
    * Update expected boxes for Tesseract 4.0.0 beta
    
    * Use offsets to create labels instead of checking if they start with ##
    
    * Update expected boxes for Tesseract 4.1.1
    
    * Fix conflict
    
    * Make variable names cleaner, add docstring, add link to notebooks
    
    * Revert "Fix conflict"
    
    This reverts commit a9b46ce9afe47ebfcfe7b45e6a121d49e74ef2c5.
    
    * Revert to make integration test pass
    
    * Apply suggestions from @LysandreJik's review
    
    * Address @patrickvonplaten's comments
    
    * Remove fixtures DocVQA in favor of dataset on the hub
    Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
index.rst 44.6 KB