  1. 11 Oct, 2022 1 commit
    • Fix whisper for `pipeline` (#19482) · b722a6be
      Arthur authored
      * update feature extractor params
      
      * update attention mask handling
      
      * fix doc and pipeline test
      
      * add warning when skipping test
      
      * add whisper translation and transcription test
      
      * fix build doc test
  2. 07 Oct, 2022 1 commit
    • Rework pipeline tests (#19366) · 9ac586b3
      Sylvain Gugger authored
      * Rework pipeline tests
      
      * Try to fix Flax tests
      
      * Try to put it before
      
      * Use a new decorator instead
      
      * Remove ignore marker since it doesn't work
      
      * Filter pipeline tests
      
      * Woopsie
      
      * Use the filtered list
      
      * Clean up and fake modif
      
      * Remove init
      
      * Revert fake modif
  3. 05 Oct, 2022 1 commit
  4. 06 Sep, 2022 1 commit
  5. 10 Aug, 2022 1 commit
  6. 05 Aug, 2022 1 commit
  7. 19 Jul, 2022 1 commit
  8. 01 Jul, 2022 1 commit
  9. 30 Jun, 2022 2 commits
  10. 12 May, 2022 1 commit
  11. 05 May, 2022 1 commit
  12. 04 Mar, 2022 1 commit
  13. 23 Feb, 2022 2 commits
    • [Test refactor 1/5] Per-folder tests reorganization (#15725) · 29c10a41
      Lysandre Debut authored
      
      
      * Per-folder tests reorganization
      Co-authored-by: sgugger <sylvain.gugger@gmail.com>
      Co-authored-by: Stas Bekman <stas@stason.org>
    • Enable `image-segmentation` on `AutoModelForSemanticSegmentation` (#15647) · 9e71d464
      Nicolas Patry authored
      * Enabling Beit SegFormer to `image-segmentation`.
      
      * Fixing the score.
      
      * Fix import ?
      
      * Missing in type hint.
      
      * Multiple test fixes:
      
      - Add `raw_image` support. It should be the default IMHO since in Python
        world it doesn't make any sense to base64 encode the image (Sorry
        @mishig, didn't catch that in my review). I really think we should
        consider breaking BC here.
      - Add support for Segformer tiny test (needed
        `SegformerModelTester.get_config` to enable TinyConfig
        @NielsRogge)
      - Add the check that `batch_size` works correctly on that pipeline.
        Uncovered that it doesn't for Detr, which IMO is OK since images
        after `feature_extractor` don't have the same size. Comment should
        explain.
      
      * Type hint as a string.
      
      * Make fixup + update black.
      
      * torch+vision protections.
      
      * Don't use torchvision, use F.interpolate instead (no new dep).
      
      * Last fixes for Segformer.
      
      * Update test to reflect new image (which was broken)
      
      * Update tests.
      
      * Major BC modification:
      
      - Removed the compressed PNG string; that's a job for users,
        `transformers` stays in Python land.
      - Removed the `score` for semantic segmentation. It has hardly a meaning
        on its own in this context.
      - Don't include the grayscale with logits for now (which could enable
        users to get a sense of confidence). Might be done later.
      - Don't include the surface of the mask (could be used for sorting by
        users, to filter out small masks). It's already calculable, and
        it's easier to add later, than to add now and break later if we need.
      
      * `make fixup`.
      
      * Small changes.
      
      * Rebase + doc fixup.
  14. 05 Jan, 2022 1 commit
  15. 04 Jan, 2022 1 commit
    • Hotfix `chunk_length_s` instead of `_ms`. (#15029) · 19d37c2d
      Nicolas Patry authored
      * Hotfix `chunk_length_s` instead of `_ms`.
      
      * Adding fix of `pad_token` which should be last/previous token for CTC
        proper decoding
      
      * Fixing ChunkPipeline unwrapping.
      
      * Adding a PackIterator specific test.
  16. 27 Dec, 2021 1 commit
    • ChunkPipeline (batch_size enabled on `zero-cls` and `qa` pipelines). (#14225) · b058490c
      Nicolas Patry authored
      
      
      * Pipeline chunks.
      
      * Batching for Chunking pipelines ?
      
      * Batching for `question-answering` and `zero-shot-cls`.
      
      * Fixing for FNet.
      
      * Making ASR a chunk pipeline.
      
      * Chunking ASR API.
      
      * doc style.
      
      * Fixing ASR test.
      
      * Fixing QA error (p_mask, padding is 1, not 0).
      
      * Enable both vad and simple chunking.
      
      * Max length for vad.
      
      * remove inference mode, crashing on s2t.
      
      * Revert ChunkPipeline for ASRpipeline.
      
      Too many knobs for simple integration within the pipeline, better stick
      to external convenience functions instead, more control to be had,
      simpler pipeline and also easier to replace with other things later.
      
      * Drop necessity for PT for these.
      
      * Enabling generators.
      
      * Add mic + cleanup.
      
      * Typo.
      
      * Typo2.
      
      * Remove ASR work, it does not belong in this PR anymore.
      
      * Update src/transformers/pipelines/pt_utils.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * Update src/transformers/pipelines/zero_shot_classification.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * Adding many comments.
      
      * Doc quality.
      
      * `hidden_states` handling.
      
      * Adding doc.
      
      * Bad rebase.
      
      * Autofixing docs.
      
      * Fixing CRITICAL bug in the new Zerocls pipeline.
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
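
A rough illustration of the chunk-then-repack idea behind #14225, written in plain Python with no `transformers` dependency. Every name here (`preprocess`, `batched`, `forward`, `pack`, `run`) and the `is_last` marker are assumptions made for this sketch, not code taken from the PR. One input is split into several chunks, chunks from different inputs may share a batch, and outputs are regrouped per input once its last chunk is seen.

```python
def preprocess(text, chunk_size):
    """Split one input into chunks and flag the final chunk (hypothetical marker)."""
    words = text.split()
    chunks = [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)] or [[]]
    for idx, chunk in enumerate(chunks):
        yield {"words": chunk, "is_last": idx == len(chunks) - 1}


def batched(items, batch_size):
    """Group a flat stream of chunks into fixed-size batches."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def forward(batch):
    """Stand-in for the model call: one output per chunk, marker carried along."""
    return [{"count": len(item["words"]), "is_last": item["is_last"]} for item in batch]


def pack(outputs):
    """Accumulate chunk outputs until the last chunk of an input is reached."""
    accumulated = []
    for out in outputs:
        accumulated.append(out["count"])
        if out["is_last"]:
            yield accumulated
            accumulated = []


def run(texts, chunk_size=2, batch_size=3):
    flat = (chunk for text in texts for chunk in preprocess(text, chunk_size))
    outputs = (out for batch in batched(flat, batch_size) for out in forward(batch))
    # Toy postprocess: total word count per original input.
    return [sum(counts) for counts in pack(outputs)]


result = run(["a b c d e", "f g", "h"])
```

Here the batch boundaries do not line up with input boundaries, yet `result` still holds exactly one entry per input text, which is the property the batching work has to preserve.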
  17. 14 Dec, 2021 1 commit
    • Fixing tests for Perceiver (#14739) · 546a91ab
      Nicolas Patry authored
      * Adding some slow test to check for perceiver at least from a high level.
      
      * Re-enabling fast tests for Perceiver ImageClassification.
      
      * Perceiver might try to run some text-only pipelines without a
      Tokenizer (Fast doesn't exist) and with a FeatureExtractor.
      
      * Oops.
      
      * Adding a comment for `update_config_with_model_class`.
      
      * Remove `model_architecture` to get `tiny_config`.
      
      * Finalize rebase.
      
      * Smarter way to handle undefined FastTokenizer.
      
      * Remove old code.
      
      * Addressing some nits.
      
      * Don't instantiate `None`.
  18. 13 Dec, 2021 1 commit
    • Fixing tests for Perceiver (#14745) · 3d66146a
      Lysandre Debut authored
      
      
      - Do not run image-classification pipeline (_CHECKPOINT_FOR_DOC uses the checkpoint for
      language, which cannot load a FeatureExtractor so current logic fails).
      - Add a safeguard to not run tests when `tokenizer_class` or
      `feature_extractor_class` **are** defined, but cannot be loaded
      This happens for Perceiver for the "FastTokenizer" (which doesn't exist
      so None) and FeatureExtractor (which does exist but cannot be loaded
      because the checkpoint doesn't define one which is reasonable for the
      said checkpoint)
      - Added `get_vocab` function to `PerceiverTokenizer` since it is used by
      `fill-mask` pipeline when the argument `targets` is used to narrow a
      subset of possible values.
      Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
  19. 08 Dec, 2021 1 commit
  20. 22 Nov, 2021 1 commit
  21. 19 Nov, 2021 1 commit
  22. 12 Nov, 2021 1 commit
    • Adding support for raw python `generator` in addition to `Dataset` for pipelines (#14352) · ed5d1551
      Nicolas Patry authored
      * Adding support for raw python `generator` in addition to `Dataset`
      
      The main goal is to ease the creation of streaming data for the pipe.
      
      `Dataset` is more involved and PyTorch-specific.
      
      This PR provides a way to use a Python iterator too.
      This enabled #14250 but can be proposed as a standalone PR.
      
      ```python
      from transformers import pipeline
      
      def read_data(filename):
          with open(filename, 'r') as f:
              for line in f:
                  yield line
      
      pipe = pipeline("text-classification")
      for classified in pipe(read_data("large_file.txt")):
          print("Success ! ", classified)
      ```
      
      The main caveat of this is the interaction with `DataLoader` with
      `num_workers>1`. When you have multiple workers, each receives a copy
      of the generator (like `IterableDataset`). That means the naive iterator
      will fail since all workers iterate on all items of the generator.
      
      There are ways to do clever "skipping", but it could still be bad
      because all workers still have to pass through all items of the
      generator (they just ignore items they don't handle); depending on
      the case, that cost might matter.
      
      Using `num_workers=1` is the simplest fix and if the cost of loading
      your data is small enough should be good enough. In the above example
      trying to do smart tricks to skip some lines is unlikely to be a net
      positive for instance.
      
      If there are better ways to do "jumps" on some data, then using
      `Dataset` is more advised (since then different workers can just jump
      themselves).
      
      * Adding iterator support for `tf` too.
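
The multi-worker caveat described in this commit can be simulated without PyTorch. The sketch below is only an illustration under stated assumptions: `read_lines`, `shard`, and the two-worker setup are hypothetical names, and the workers are modelled as plain loops rather than real `DataLoader` processes. It shows that naive copies of a generator yield every item once per worker, while index-based "skipping" gives each item to exactly one worker at the price of every worker still walking the whole stream.

```python
from itertools import islice


def read_lines(lines):
    """Hypothetical data source: yields items one at a time, like a file reader."""
    for line in lines:
        yield line


def shard(make_iterator, worker_id, num_workers):
    """Keep every num_workers-th item, starting at worker_id.

    The underlying iterator is still consumed in full by each worker;
    unwanted items are simply dropped.
    """
    return islice(make_iterator(), worker_id, None, num_workers)


data = [f"line {i}" for i in range(6)]
num_workers = 2

# Naive: each simulated worker gets its own copy and yields everything.
naive = [item for _ in range(num_workers) for item in read_lines(data)]

# Skipping: each simulated worker keeps only its own share.
sharded = [
    list(shard(lambda: read_lines(data), worker_id, num_workers))
    for worker_id in range(num_workers)
]
```

With the naive approach `naive` contains each line twice; with skipping, the two lists in `sharded` together cover each line once.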
  23. 10 Nov, 2021 1 commit
  24. 03 Nov, 2021 1 commit
  25. 29 Oct, 2021 3 commits
  26. 14 Oct, 2021 1 commit
  27. 10 Sep, 2021 1 commit
    • [Large PR] Entire rework of pipelines. (#13308) · c63fcabf
      Nicolas Patry authored
      
      
      * Enabling dataset iteration on pipelines.
      
      Unifying parameters under `set_parameters` function.
      
      Small fix.
      
      Last fixes after rebase
      
      Remove print.
      
      Fixing text2text `generate_kwargs`
      
      No more `self.max_length`.
      
      Fixing tf only conversational.
      
      Consistency in start/stop index over TF/PT.
      
      Speeding up drastically on TF (nasty bug where max_length would increase
      a ton.)
      
      Adding test for support for non fast tokenizers.
      
      Fixing GPU usage on zero-shot.
      
      Fix working on Tf.
      
      Update src/transformers/pipelines/base.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      Update src/transformers/pipelines/base.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      Small cleanup.
      
      Remove all asserts + simple format.
      
      * Fixing audio-classification for large PR.
      
      * Overly explicit null checking.
      
      * Encapsulating GPU/CPU pytorch manipulation directly within `base.py`.
      
      * Removed internal state for parameters of the pipeline.
      
      Instead of implicitly overriding internal state, we moved
      to real named arguments on every `preprocess`, `_forward`,
      `postprocess` function.
      
      Instead, `_sanitize_parameters` will be used to split all kwargs
      of both `__init__` and `__call__` into the 3 kinds of named parameters.
      
      * Move import warnings.
      
      * Small fixes.
      
      * Quality.
      
      * Another small fix, using the CI to debug faster.
      
      * Last fixes.
      
      * Last fix.
      
      * Small cleanup of tensor moving.
      
      * is not None.
      
      * Adding a bunch of docs + an iteration test.
      
      * Fixing doc style.
      
      * KeyDataset = None guard.
      
      * Removing the CUDA test for pipelines (was testing).
      
      * Even more simple iteration test.
      
      * Correct import.
      
      * Long day.
      
      * Fixes in docs.
      
      * [WIP] migrating object detection.
      
      * Fixed the target_size bug.
      
      * Fixup.
      
      * Bad variable name.
      
      * Fixing `ensure_on_device` respects original ModelOutput.
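
The parameter split described in this PR (kwargs from both `__init__` and `__call__` routed by `_sanitize_parameters` into `preprocess`, `_forward`, and `postprocess`) can be sketched with a toy class. This is a minimal stand-in, not the `transformers` base class: `ToyPipeline` and the `truncation`, `scale`, and `top_k` parameters are made up for the example, and the "model" just scores words by length.

```python
class ToyPipeline:
    """Hypothetical stand-in showing the three-way kwargs split."""

    def __init__(self, **kwargs):
        # Defaults given at construction time, already split by stage.
        self._init_params = self._sanitize_parameters(**kwargs)

    def _sanitize_parameters(self, **kwargs):
        preprocess_params, forward_params, postprocess_params = {}, {}, {}
        if "truncation" in kwargs:
            preprocess_params["truncation"] = kwargs["truncation"]
        if "scale" in kwargs:
            forward_params["scale"] = kwargs["scale"]
        if "top_k" in kwargs:
            postprocess_params["top_k"] = kwargs["top_k"]
        return preprocess_params, forward_params, postprocess_params

    def __call__(self, text, **kwargs):
        call_params = self._sanitize_parameters(**kwargs)
        # Call-time values override construction-time values without
        # mutating any state stored on the pipeline.
        pre, fwd, post = (
            {**init, **call} for init, call in zip(self._init_params, call_params)
        )
        model_inputs = self.preprocess(text, **pre)
        model_outputs = self._forward(model_inputs, **fwd)
        return self.postprocess(model_outputs, **post)

    def preprocess(self, text, truncation=None):
        tokens = text.split()
        return tokens[:truncation] if truncation is not None else tokens

    def _forward(self, tokens, scale=1):
        return [(token, len(token) * scale) for token in tokens]

    def postprocess(self, scored, top_k=None):
        ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
        return ranked[:top_k] if top_k is not None else ranked


pipe = ToyPipeline(top_k=1)
default_result = pipe("a bbb cc")
override_result = pipe("a bbb cc", top_k=2, truncation=2)
after_override = pipe("a bbb cc")
```

Because each stage receives its parameters as named arguments, a call-time override such as `top_k=2` affects only that call; the next call falls back to the construction-time value.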
  28. 27 Aug, 2021 2 commits
  29. 26 Aug, 2021 2 commits
  30. 13 Aug, 2021 1 commit
    • Moving fill-mask pipeline to new testing scheme (#12943) · d58926ab
      Nicolas Patry authored
      * Fill mask pipelines test updates.
      
      * Model eval !!
      
      * Adding slow test with actual values.
      
      * Making all tests pass (skipping quite a bit.)
      
      * Doc styling.
      
      * Better doc cleanup.
      
      * Making an explicit test with no pad token tokenizer.
      
      * Typo.
  31. 29 Jul, 2021 1 commit
  32. 22 Jul, 2021 1 commit
    • Improving pipeline tests (#12784) · 795c1444
      Nicolas Patry authored
      
      
      * Proposal
      
      * Testing pipelines slightly better.
      
      - Overall same design
      - Metaclass to get proper different tests instead of subTest (not well
      supported by Pytest)
      - Added ANY meta object to make output checking more readable.
      - Skipping architectures either without tiny_config or without
      architecture.
      
      * Small fix.
      
      * Fixing the tests in case of None value.
      
      * Oups.
      
      * Rebased with more architectures.
      
      * Fixing reformer tests (no override anymore).
      
      * Adding more options for model tester config.
      Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
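
The "ANY meta object" mentioned in #12784 can be pictured as a small helper whose equality check is a type check, so expected outputs describe shape rather than exact values. The version below is a sketch written for illustration and may differ from the helper actually added in that PR.

```python
class ANY:
    """Compares equal to any value that is an instance of the given types."""

    def __init__(self, *types):
        self._types = types

    def __eq__(self, other):
        return isinstance(other, self._types)

    def __repr__(self):
        names = ", ".join(t.__name__ for t in self._types)
        return f"ANY({names})"


# Example pipeline-like output with values that vary from model to model.
outputs = [{"label": "POSITIVE", "score": 0.98}]
matches = outputs == [{"label": ANY(str), "score": ANY(float)}]
mismatch = outputs == [{"label": ANY(int), "score": ANY(float)}]
```

In a test this lets one assertion check the structure of a whole result, and the `__repr__` keeps failure messages readable.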
  33. 18 May, 2021 1 commit
  34. 25 Feb, 2021 1 commit
    • [PretrainedFeatureExtractor] + Wav2Vec2FeatureExtractor, Wav2Vec2Processor, Wav2Vec2Tokenizer (#10324) · cb38ffcc
      Patrick von Platen authored
      
      * push to show
      
      * small improvement
      
      * small improvement
      
      * Update src/transformers/feature_extraction_utils.py
      
      * Update src/transformers/feature_extraction_utils.py
      
      * implement base
      
      * add common tests
      
      * make all tests pass for wav2vec2
      
      * make padding work & add more tests
      
      * finalize feature extractor utils
      
      * add call method to feature extraction
      
      * finalize feature processor
      
      * finish tokenizer
      
      * finish general processor design
      
      * finish tests
      
      * typo
      
      * remove bogus file
      
      * finish docstring
      
      * add docs
      
      * finish docs
      
      * small fix
      
      * correct docs
      
      * save intermediate
      
      * load changes
      
      * apply changes
      
      * apply changes to doc
      
      * change tests
      
      * apply Suraj's recommendations
      
      * final changes
      
      * Apply suggestions from code review
      
      * fix typo
      
      * fix import
      
      * correct docstring