1. 06 May, 2024 1 commit
    • [`CI update`] Try to use dockers and no cache (#29202) · 307f632b
      Arthur authored
      
      
      * change cis
      
      * nits
      
      * update
      
      * minor updates
      
      * [push-ci-image]
      
      * nit [push-ci-image]
      
      * nitsssss
      
      * [build-ci-image]
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * both
      
      * [push-ci-image]
      
      * this?
      
      * [push-ci-image]
      
      * pypi-kenlm needs g++
      
      * [push-ci-image]
      
      * nit
      
      * more nits [push-ci-image]
      
      * nits [push-ci-image]
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * add vision
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * add new dummy file but will need to update them [push-ci-image]
      
      * [push-ci-image]
      
      * show package size as well
      
      * [push-ci-image]
      
      * potentially ignore failures
      
      * workflow updates
      
      * nits [push-ci-image]
      
      * [push-ci-image]
      
      * fix consistency
      
      * clean nciida triton
      
      * also show big packages [push-ci-image]
      
      * nit
      
      * update
      
      * another one
      
      * line escape?
      
      * add accelerate [push-ci-image]
      
      * updates [push-ci-image]
      
      * nits to run tests, no push-ci
      
      * try to parse skip reason to make sure nothing is skipped that should not be skipped
      
      * nit?
      
      * always show skipped reasons
      
      * nits
      
      * better parsing of the test outputs
      
      * action="store_true",
      
      * failure on failed
      
      * show matched
      
      * debug
      
      * update short summary with skipped, failed and errors
      
      * nits
      
      * nits
      
      * cool updates
      
      * remove docbuilder
      
      * fix
      
      * always run checks
      
      * oups
      
      * nits
      
      * don't error out on library printing
      
      * non zero exit codes
      
      * no warning
      
      * nit
      
      * WAT?
      
      * format nit
      
      * [push-ci-image]
      
      * fail if fail is needed
      
      * [push-ci-image]
      
      * sound file for torch light?
      
      * [push-ci-image]
      
      * order is important [push-ci-image]
      
      * [push-ci-image] reduce even further
      
      * [push-ci-image]
      
      * use pytest rich !
      
      * yes [push-ci-image]
      
      * oupsy
      
      * bring back the full traceback, but pytest rich should help
      
      * nit
      
      * [push-ci-image]
      
      * re run
      
      * nit
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * empty push to trigger
      
      * [push-ci-image]
      
      * nit? [push-ci-image]
      
      * empty
      
      * try to install timm with no deps
      
      * [push-ci-image]
      
      * oups [push-ci-image]
      
      * [push-ci-image]
      
      * [push-ci-image] ?
      
      * [push-ci-image] open ssh client for git checkout fast
      
      * empty for torch light
      
      * updates [push-ci-image]
      
      * nit
      
      * @v4 for checkout
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * fix fetch tests with parallelism
      
      * [push-ci-image]
      
      * more parallelism
      
      * nit
      
      * more nits
      
      * empty to re-trigger
      
      * empty to re-trigger
      
      * split by timing
      
      * did not work with previous commit
      
      * junit.xml
      
      * no path?
      
      * mmm this?
      
      * junitxml format
      
      * split by timing
      
      * nit
      
      * fix junit family
      
      * now we can test if the xunit1 is compatible!
      
      * this?
      
      * fully list tests
      
      * update
      
      * update
      
      * oups
      
      * finally
      
      * use classname
      
      * remove working directory to make sure the path does not interfere
      
      * okay no juni should have the correct path
      
      * name split?
      
      * sort by classname is what make most sense
      
      * some testing
      
      * name
      
      * oups
      
      * test something fun
      
      * autodetect
      
      * 18?
      
      * nit
      
      * file size?
      
      * uip
      
      * 4 is best
      
      * update to see versions
      
      * better print
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * please install the correct keras version
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * uv is fucking me up
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * nits
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * install issues and pins
      
      * tapas as well
      
      * nits
      
      * more parallelism
      
      * short tb
      
      * soundfile
      
      * soundfile
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * oups
      
      * [push-ci-image]
      
      * fix some things
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * use torch-light for hub
      
      * small git lfs for hub job
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * fix tf tapas
      
      * [push-ci-image]
      
      * nits
      
      * [push-ci-image]
      
      * don't update the test
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * no use them
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * update tf proba
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * woops
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * test with built dockers
      
      * [push-ci-image]
      
      * skip annoying tests
      
      * revert fix copy
      
      * update test values
      
      * update
      
      * last skip and fixup
      
      * nit
      
      * ALL GOOOD
      
      * quality
      
      * Update tests/models/layoutlmv2/test_image_processing_layoutlmv2.py
      
      * Update docker/quality.dockerfile
      Co-authored-by: Lysandre Debut <hi@lysand.re>
      
      * Update src/transformers/models/tapas/modeling_tf_tapas.py
      Co-authored-by: Lysandre Debut <hi@lysand.re>
      
      * Apply suggestions from code review
      Co-authored-by: Lysandre Debut <hi@lysand.re>
      
      * use torch-speed
      
      * updates
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      * fuck ken-lm [push-ci-image]
      
      * [push-ci-image]
      
      * [push-ci-image]
      
      ---------
      Co-authored-by: Lysandre Debut <hi@lysand.re>
  2. 01 May, 2024 1 commit
  3. 25 Apr, 2024 1 commit
  4. 23 Apr, 2024 1 commit
  5. 20 Mar, 2024 1 commit
  6. 04 Mar, 2024 1 commit
    • Add UDOP (#22940) · 836921fd
      NielsRogge authored
      
      
      * First draft
      
      * More improvements
      
      * More improvements
      
      * More fixes
      
      * Fix copies
      
      * More improvements
      
      * More fixes
      
      * More improvements
      
      * Convert checkpoint
      
      * More improvements, set up tests
      
      * Fix more tests
      
      * Add UdopModel
      
      * More improvements
      
      * Fix equivalence test
      
      * More fixes
      
      * Redesign model
      
      * Extend conversion script
      
      * Use real inputs for conversion script
      
      * Add image processor
      
      * Improve conversion script
      
      * Add UdopTokenizer
      
      * Add fast tokenizer
      
      * Add converter
      
      * Update README's
      
      * Add processor
      
      * Add fully fledged tokenizer
      
      * Add fast tokenizer
      
      * Use processor in conversion script
      
      * Add tokenizer tests
      
      * Fix one more test
      
      * Fix more tests
      
      * Fix tokenizer tests
      
      * Enable fast tokenizer tests
      
      * Fix more tests
      
      * Fix additional_special_tokens of fast tokenizer
      
      * Fix tokenizer tests
      
      * Fix more tests
      
      * Fix equivalence test
      
      * Rename image to pixel_values
      
      * Rename seg_data to bbox
      
      * More renamings
      
      * Remove vis_special_token
      
      * More improvements
      
      * Add docs
      
      * Fix copied from
      
      * Update slow tokenizer
      
      * Update fast tokenizer design
      
      * Make text input optional
      
      * Add first draft of processor tests
      
      * Fix more processor tests
      
      * Fix decoder_start_token_id
      
      * Fix test_initialization
      
      * Add integration test
      
      * More improvements
      
      * Improve processor, add test
      
      * Add more copied from
      
      * Add more copied from
      
      * Add more copied from
      
      * Add more copied from
      
      * Remove print statement
      
      * Update README and auto mapping
      
      * Delete files
      
      * Delete another file
      
      * Remove code
      
      * Fix test
      
      * Fix docs
      
      * Remove asserts
      
      * Add doc tests
      
      * Include UDOP in exotic model tests
      
      * Add expected tesseract decodings
      
      * Add sentencepiece
      
      * Use same design as T5
      
      * Add UdopEncoderModel
      
      * Add UdopEncoderModel to tests
      
      * More fixes
      
      * Fix fast tokenizer
      
      * Fix one more test
      
      * Remove parallelisable attribute
      
      * Fix copies
      
      * Remove legacy file
      
      * Copy from T5Tokenizer
      
      * Fix rebase
      
      * More fixes, copy from T5
      
      * More fixes
      
      * Fix init
      
      * Use ArthurZ/udop for tests
      
      * Make all model tests pass
      
      * Remove UdopForConditionalGeneration from auto mapping
      
      * Fix more tests
      
      * fixups
      
      * more fixups
      
      * fix the tokenizers
      
      * remove un-necessary changes
      
      * nits
      
      * nits
      
      * replace truncate_sequences_boxes with truncate_sequences for fix-copies
      
      * nit current path
      
      * add a test for input ids
      
      * ids that we should get taken from c9f7a32f57440d90ff79890270d376a1cc0acb68
      
      * nits converting
      
      * nits
      
      * apply ruff
      
      * nits
      
      * nits
      
      * style
      
      * fix slow order of addition
      
      * fix udop fast range as well
      
      * fixup
      
      * nits
      
      * Add docstrings
      
      * Fix gradient checkpointing
      
      * Update code examples
      
      * Skip tests
      
      * Update integration test
      
      * Address comment
      
      * Make fixup
      
      * Remove extra ids from tokenizer
      
      * Skip test
      
      * Apply suggestions from code review
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
      
      * Update year
      
      * Address comment
      
      * Address more comments
      
      * Address comments
      
      * Add copied from
      
      * Update CI
      
      * Rename script
      
      * Update model id
      
      * Add AddedToken, skip tests
      
      * Update CI
      
      * Fix doc tests
      
      * Do not use Tesseract for the doc tests
      
      * Remove kwargs
      
      * Add original inputs
      
      * Update casting
      
      * Fix doc test
      
      * Update question
      
      * Update question
      
      * Use LayoutLMv3ImageProcessor
      
      * Update organization
      
      * Improve docs
      
      * Update forward signature
      
      * Make images optional
      
      * Remove deprecated device argument
      
      * Add comment, add add_prefix_space
      
      * More improvements
      
      * Remove kwargs
      
      ---------
      Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
      Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
  7. 20 Feb, 2024 1 commit
  8. 19 Feb, 2024 1 commit
  9. 07 Feb, 2024 1 commit
  10. 06 Feb, 2024 2 commits
  11. 02 Feb, 2024 2 commits
  12. 30 Jan, 2024 2 commits
  13. 10 Jan, 2024 1 commit
  14. 03 Jan, 2024 1 commit
    • Add FastSpeech2Conformer (#23439) · d83ff5ee
      Connor Henderson authored
      * start - docs, SpeechT5 copy and rename
      
      * add relevant code from FastSpeech2 draft, have tests pass
      
      * make it an actual conformer, demo ex.
      
      * matching inference with original repo, includes debug code
      
      * refactor nn.Sequentials, start more desc. var names
      
      * more renaming
      
      * more renaming
      
      * vocoder scratchwork
      
      * matching vocoder outputs
      
      * hifigan vocoder conversion script
      
      * convert model script, rename some config vars
      
      * replace postnet with speecht5's implementation
      
      * passing common tests, file cleanup
      
      * expand testing, add output hidden states and attention
      
      * tokenizer + passing tokenizer tests
      
      * variety of updates and tests
      
      * g2p_en pckg setup
      
      * import structure edits
      
      * docstrings and cleanup
      
      * repo consistency
      
      * deps
      
      * small cleanup
      
      * forward signature param order
      
      * address comments except for masks and labels
      
      * address comments on attention_mask and labels
      
      * address second round of comments
      
      * remove old unneeded line
      
      * address comments part 1
      
      * address comments pt 2
      
      * rename auto mapping
      
      * fixes for failing tests
      
      * address comments part 3 (bart-like, train loss)
      
      * make style
      
      * pass config where possible
      
      * add forward method + tests to WithHifiGan model
      
      * make style
      
      * address arg passing and generate_speech comments
      
      * address Arthur comments
      
      * address Arthur comments pt2
      
      * lint  changes
      
      * Sanchit comment
      
      * add g2p-en to doctest deps
      
      * move up self.encoder
      
      * onnx compatible tensor method
      
      * fix is symbolic
      
      * fix paper url
      
      * move models to espnet org
      
      * make style
      
      * make fix-copies
      
      * update docstring
      
      * Arthur comments
      
      * update docstring w/ new updates
      
      * add model architecture images
      
      * header size
      
      * md wording update
      
      * make style
  15. 16 Nov, 2023 1 commit
    • [`Styling`] stylify using ruff (#27144) · 651408a0
      Arthur authored
      
      
      * try to stylify using ruff
      
      * might need to remove these changes?
      
      * use ruff format and ruff check
      
      * use isinstance instead of type comparison
      
      * use # fmt: skip
      
      * use # fmt: skip
      
      * nits
      
      * some styling changes
      
      * update ci job
      
      * nits isinstance
      
      * more files update
      
      * nits
      
      * more nits
      
      * small nits
      
      * check and format
      
      * revert wrong changes
      
      * actually use formatter instead of checker
      
      * nits
      
      * well docbuilder is overwriting this commit
      
      * revert notebook changes
      
      * try to nuke docbuilder
      
      * style
      
      * fix feature extraction test
      
      * remove `indent-width = 4`
      
      * fixup
      
      * more nits
      
      * update the ruff version that we use
      
      * style
      
      * nuke docbuilder styling
      
      * leave the print for detected changes
      
      * nits
      
      * Remove file I/O
      Co-authored-by: charliermarsh <charlie.r.marsh@gmail.com>
      
      * style
      
      * nits
      
      * revert notebook changes
      
      * Add # fmt skip when possible
      
      * Add # fmt skip when possible
      
      * Fix
      
      * More `  # fmt: skip` usage
      
      * More `  # fmt: skip` usage
      
      * More `  # fmt: skip` usage
      
      * Nits
      
      * more fixes
      
      * fix tapas
      
      * Another way to skip
      
      * Recommended way
      
      * Fix two more files
      
      * Remove asynch
      
      ---------
      Co-authored-by: charliermarsh <charlie.r.marsh@gmail.com>
  16. 10 Nov, 2023 1 commit
  17. 09 Nov, 2023 2 commits
  18. 23 Oct, 2023 1 commit
  19. 18 Oct, 2023 1 commit
    • [`Tokenizer`] Fix slow and fast serialization (#26570) · ef7e9369
      Arthur authored
      * fix
      
      * last attempt
      
      * current work
      
      * fix forward compatibility
      
      * save all special tokens
      
      * current state
      
      * revert additional changes
      
      * updates
      
      * remove tokenizer.model
      
      * add a test and the fix
      
      * nit
      
      * revert one more break
      
      * fix typefield issue
      
      * quality
      
      * more tests
      
      * fix fields for FC
      
      * more nits?
      
      * new additional changes
      
      * how
      
      * some updates
      
      * simplify all
      
      * more nits
      
      * revert some things to original
      
      * nice
      
      * nits
      
      * a small hack
      
      * more nits
      
      * ahhaha
      
      * fixup
      
      * update
      
      * make test run on ci
      
      * use subtesting
      
      * update
      
      * Update .circleci/create_circleci_config.py
      
      * updates
      
      * fixup
      
      * nits
      
      * replace typo
      
      * fix the test
      
      * nits
      
      * update
      
      * None max dif pls
      
      * a partial fix
      
      * had to revert one thing
      
      * test the fast
      
      * updates
      
      * fixup
      
      * and more nits
      
      * more fixes
      
      * update
      
      * Oupsy 👁
      
      
      
      * nits
      
      * fix marian
      
      * on our way to heaven
      
      * Update src/transformers/models/t5/tokenization_t5.py
      Co-authored-by: Lysandre Debut <hi@lysand.re>
      
      * fixup
      
      * Update src/transformers/tokenization_utils_fast.py
      Co-authored-by: Leo Tronchon <leo.tronchon@gmail.com>
      
      * Update src/transformers/tokenization_utils_base.py
      Co-authored-by: Leo Tronchon <leo.tronchon@gmail.com>
      
      * fix phobert
      
      * skip some things, test more
      
      * nits
      
      * fixup
      
      * fix deberta
      
      * update
      
      * update
      
      * more updates
      
      * skip one test
      
      * more updates
      
      * fix camembert
      
      * can't test this one
      
      * more good fixes
      
      * kind of a major update
      
      - separate what is only done in fast in fast init and refactor
      - add_token(AddedToken(..., special = True)) ignores it in fast
      - better loading
      
      * fixup
      
      * more fixups
      
      * fix pegasus and mpnet
      
      * remove skipped tests
      
      * fix phoneme tokenizer if self.verbose
      
      * fix individual models
      
      * update common tests
      
      * update testing files
      
      * all over again
      
      * nits
      
      * skip test for markup lm
      
      * fixups
      
      * fix order of addition in fast by sorting the added tokens decoder
      
      * proper defaults for deberta
      
      * correct default for fnet
      
      * nits on add tokens, string initialized to special if special
      
      * skip irrelevant herbert tests
      
      * main fixes
      
      * update test added_tokens_serialization
      
      * the fix for bart like models and class instantiating
      
      * update bart
      
      * nit!
      
      * update idefics test
      
      * fix whisper!
      
      * some fixup
      
      * fixups
      
      * revert some of the wrong changes
      
      * fixup
      
      * fixup
      
      * skip marian
      
      * skip the correct tests
      
      * skip for tf and flax as well
      
      ---------
      Co-authored-by: Lysandre Debut <hi@lysand.re>
      Co-authored-by: Leo Tronchon <leo.tronchon@gmail.com>
  20. 09 Oct, 2023 1 commit
  21. 26 Sep, 2023 1 commit
    • Add Nougat (#25942) · ace74d16
      NielsRogge authored
      
      
      * Add conversion script
      
      * Add NougatImageProcessor
      
      * Add crop margin
      
      * More improvements
      
      * Add docs, READMEs
      
      * Remove print statements
      
      * Include model_max_length
      
      * Add NougatTokenizerFast
      
      * Fix imports
      
      * Improve postprocessing
      
      * Improve image processor
      
      * Fix image processor
      
      * Improve normalize method
      
      * More improvements
      
      * More improvements
      
      * Add processor, improve docs
      
      * Simplify fast tokenizer
      
      * Remove test file
      
      * Fix docstrings
      
      * Use NougatProcessor in conversion script
      
      * Add is_levensthein_available
      
      * Add tokenizer tests
      
      * More improvements
      
      * Use numpy instead of opencv
      
      * Add is_cv2_available
      
      * Fix cv2_available
      
      * Add is_nltk_available
      
      * Add image processor tests, improve crop_margin
      
      * Add integration tests
      
      * Improve integration test
      
      * Use do_rescale instead of hacks, thanks Amy
      
      * Remove random_padding
      
      * Address comments
      
      * Address more comments
      
      * Add import
      
      * Address more comments
      
      * Address more comments
      
      * Address comment
      
      * Address comment
      
      * Set max_model_input_sizes
      
      * Add tests
      
      * Add requires_backends
      
      * Add Nougat to exotic tests
      
      * Use to_pil_image
      
      * Address comment regarding nltk
      
      * Add NLTK
      
      * Improve variable names, integration test
      
      * Add test
      
      * refactor, document, and test regexes
      
      * remove named capture groups, add comments
      
      * format
      
      * add non-markdown fixed tokenization
      
      * format
      
      * correct flakiness of args parse
      
      * add regex comments
      
      * test functionalities for crop_image, align long axis and expected output
      
      * add regex tests
      
      * remove cv2 dependency
      
      * test crop_margin equality between cv2 and python
      
      * refactor table regexes to markdown
      
      add newline
      
      * change print to log, improve doc
      
      * fix high count tables correction
      
      * address PR comments: naming, linting, asserts
      
      * Address comments
      
      * Add copied from
      
      * Update conversion script
      
      * Update conversion script to convert both small and base versions
      
      * Add inference example
      
      * Add more info
      
      * Fix style
      
      * Add require annotators to test
      
      * Define all keyword arguments explicitly
      
      * Move cv2 annotator
      
      * Add tokenizer init method
      
      * Transfer checkpoints
      
      * Add reference to Donut
      
      * Address comments
      
      * Skip test
      
      * Remove cv2 method
      
      * Add copied from statements
      
      * Use cached_property
      
      * Fix docstring
      
      * Add file to not doctested
      
      ---------
      Co-authored-by: Pablo Montalvo <pablo.montalvo.leroux@gmail.com>
  22. 22 Sep, 2023 1 commit
  23. 19 Sep, 2023 1 commit
  24. 07 Sep, 2023 1 commit
  25. 05 Sep, 2023 2 commits
  26. 30 Aug, 2023 1 commit
  27. 11 Aug, 2023 2 commits
  28. 08 Aug, 2023 2 commits
  29. 02 Aug, 2023 2 commits
  30. 18 Jul, 2023 2 commits
  31. 17 Jul, 2023 1 commit