1. 17 Nov, 2021 3 commits
    • Docs for version v4.12.5 · c6c07554
      Lysandre authored
    • Improve semantic segmentation models (#14355) · a2864a50
      NielsRogge authored
      * Improve tests
      
      * Improve documentation
      
      * Add ignore_index attribute
      
      * Add semantic_ignore_index to BEiT model
      
      * Add segmentation maps argument to BEiTFeatureExtractor
      
      * Simplify SegformerFeatureExtractor and corresponding tests
      
      * Improve tests
      
      * Apply suggestions from code review
      
      * Minor docs improvements
      
      * Streamline segmentation map tests of SegFormer and BEiT
      
      * Improve reduce_labels docs and test
      
      * Fix code quality
      
      * Fix code quality again
    • [Wav2Vec2] Add New Wav2Vec2 Translation (#14392) · 700a748f
      Patrick von Platen authored
      * add new wav2vec2 translation
      
      * correct
      
      * up
      
      * add tests
      
      * correct end copy
      
      * correct more
      
      * up
      
      * correct unispeech sat
      
      * finish
      
      * finalize
      
      * finish
      
      * up
  2. 16 Nov, 2021 5 commits
  3. 15 Nov, 2021 8 commits
  4. 14 Nov, 2021 1 commit
  5. 13 Nov, 2021 2 commits
  6. 12 Nov, 2021 4 commits
    • Use `AlbertConverter` for FNet instead of using FNet's own converter (#14365) · 280a811e
      Li-Huai (Allan) Lin authored
      * Add normalizer to FNetConverter
      
      * Style
      
      * Directly use AlbertConverter
    • [Wav2Vec2 Example] Improve fine-tuning script (#14373) · 55f49c5f
      Patrick von Platen authored
      * improve some stuff
      
      * finish
      
      * correct last
    • fix docs (#14377) · 21546e59
      Suraj Patil authored
    • Adding support for raw python `generator` in addition to `Dataset` for pipelines (#14352) · ed5d1551
      Nicolas Patry authored
      * Adding support for raw python `generator` in addition to `Dataset`
      
      The main goal is to make it easier to stream data into the pipeline.
      
      `Dataset` is more involved and PyTorch-specific.
      
      This PR provides a way to use a Python iterator too.
      This enables #14250 but can be proposed as a standalone PR.
      
      ```python
      from transformers import pipeline
      
      def read_data(filename):
          # Stream the file lazily, one line at a time
          with open(filename, "r") as f:
              for line in f:
                  yield line  # yield each line, not the file object
      
      pipe = pipeline("text-classification")
      for classified in pipe(read_data("large_file.txt")):
          print("Success ! ", classified)
      ```
      
      The main caveat is the interaction with `DataLoader` when
      `num_workers>1`. With multiple workers, each one receives its own copy
      of the generator (as with `IterableDataset`). That means the naive iterator
      will fail: every worker iterates over all items of the generator, so
      items get processed multiple times.
      
      There are ways to do clever "skipping", but they can still be costly:
      every worker still has to pass through all items of the generator (it
      just ignores the items it doesn't handle). Depending on the case, that
      overhead might be significant.
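A minimal plain-Python sketch of both behaviors, with no PyTorch involved: it simulates `num_workers` workers that each hold their own copy of the generator. The helpers `naive_worker` and `skipping_worker` and the round-robin `index % num_workers` assignment are illustrative assumptions, not the pipeline's actual code; in real PyTorch code the worker id and count would come from inside an `IterableDataset`.

```python
from typing import Callable, Iterator, List, Tuple

GeneratorFactory = Callable[[], Iterator[str]]


def read_data(lines: List[str]) -> Iterator[str]:
    """Stand-in for the file-reading generator: yields one item per line."""
    for line in lines:
        yield line


def naive_worker(make_gen: GeneratorFactory) -> List[str]:
    # Each worker holds its own copy of the generator and consumes all of it,
    # so with N workers every item is produced N times.
    return list(make_gen())


def skipping_worker(
    make_gen: GeneratorFactory, worker_id: int, num_workers: int
) -> Tuple[List[str], int]:
    # Keep only the items assigned to this worker (round-robin by index),
    # but the worker still steps through every item to skip the others.
    kept: List[str] = []
    seen = 0
    for index, item in enumerate(make_gen()):
        seen += 1
        if index % num_workers == worker_id:
            kept.append(item)
    return kept, seen


data = [f"line {i}" for i in range(10)]
num_workers = 3

# Naive: 3 workers x 10 items = 30 items, each line appears 3 times.
naive_items = [
    item for _ in range(num_workers) for item in naive_worker(lambda: read_data(data))
]

# Skipping: every line appears exactly once, yet each worker still sees 10 items.
sharded_items: List[str] = []
items_seen: List[int] = []
for worker_id in range(num_workers):
    kept, seen = skipping_worker(lambda: read_data(data), worker_id, num_workers)
    sharded_items.extend(kept)
    items_seen.append(seen)

print(len(naive_items), len(sharded_items), items_seen)  # 30 10 [10, 10, 10]
```

Skipping removes the duplicates, but the per-worker traversal cost stays proportional to the whole stream, which is why it rarely pays off when each item is cheap to read.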
      
      Using `num_workers=1` is the simplest fix, and if the cost of loading
      your data is small enough, it should be good enough. In the example
      above, for instance, trying smart tricks to skip some lines is unlikely
      to be a net positive.
      
      If there are better ways to "jump" to specific items in the data, then
      using a `Dataset` is advised instead (since different workers can then
      jump directly to their own items).
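One way to make a line-based file "jumpable" is sketched below, under the assumption that a single upfront pass over the file is acceptable. The class `LineOffsetDataset` is hypothetical: it records the byte offset of each line, then implements `__len__` and `__getitem__` (the map-style dataset protocol PyTorch's `DataLoader` works with), so any worker handed an index can seek straight to that line instead of reading everything before it.

```python
import os
import tempfile
from typing import List


class LineOffsetDataset:
    """Hypothetical random-access view over the lines of a text file."""

    def __init__(self, filename: str) -> None:
        self.filename = filename
        self.offsets: List[int] = []
        # One upfront pass records the byte offset where each line starts.
        offset = 0
        with open(filename, "rb") as f:
            for line in f:
                self.offsets.append(offset)
                offset += len(line)

    def __len__(self) -> int:
        return len(self.offsets)

    def __getitem__(self, index: int) -> str:
        # Jump straight to the requested line instead of reading from the start.
        with open(self.filename, "rb") as f:
            f.seek(self.offsets[index])
            return f.readline().decode("utf-8").rstrip("\r\n")


# Usage: write a small file, then fetch lines in arbitrary order.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"first\nsecond\nthird\n")
    path = tmp.name

dataset = LineOffsetDataset(path)
lines_out_of_order = [dataset[2], dataset[0], dataset[1]]
print(len(dataset), dataset.offsets, lines_out_of_order)
os.remove(path)
```

The object stores only a filename and a list of integers and opens the file per lookup, so it can be copied to worker processes without carrying an open file handle.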
      
      * Adding iterator support for `tf` too.
  7. 11 Nov, 2021 7 commits
  8. 10 Nov, 2021 6 commits
  9. 09 Nov, 2021 4 commits