Add MarkupLM (#19198)
* First draft * Make basic test work * Fix most tokenizer tests * More improvements * Make more tests pass * Fix more tests * Fix some code quality * Improve truncation * Implement feature extractor * Improve feature extractor and add tests * Improve feature extractor tests * Fix pair_input test partly * Add fast tokenizer * Improve implementation * Fix rebase * Fix rebase * Fix most of the tokenizer tests. * propose solution for fast * add: integration test for fasttokenizer, warning for decode, fix template in slow tokenizer * add: modify markuplmconverter * add: some modify on converter and tokenizerfast * Fix style, copies * Make fixup * Update tokenization_markuplm.py * Update test_tokenization_markuplm.py * Update markuplm related * Improve processor, add integration test * Add processor test file * Improve processor * Improve processor tests * Fix more processor tests * Fix processor t...
Showing
This diff is collapsed.
This diff is collapsed.
Please register or sign in to comment