1. 25 Jan, 2023 1 commit
  2. 18 Jan, 2023 1 commit
  3. 16 Jan, 2023 1 commit
    • Fixing batching pipelines on single items for ChunkPipeline (#21132) · 488a179c
      Nicolas Patry authored
      * Fixing #20783
      
      * Update src/transformers/pipelines/base.py
      
      * Fixing some tests.
      
      * Fixup.
      
      * Remove ffmpeg dep + a bit more relaxed for bigbird QA precision.
      
      * Better dataset.
      
      * Prevent failing on TF.
      
      * Better condition. We can't use `can_use_iterator` since we cannot use it
      directly.
  4. 19 Dec, 2022 1 commit
  5. 21 Nov, 2022 1 commit
    • Add Audio Spectrogram Transformer (#19981) · 4973d2a0
      NielsRogge authored
      
      
      * First draft
      
      * Make conversion script work
      
      * Add id2label mapping, run code quality
      
      * Fix copies
      
      * Add first draft of feature extractor
      
      * Update conversion script to use feature extractor
      
      * Make more tests pass
      
      * Add docs
      
      * update input_features to input_values + pad by default to max length
      
      * Fix doc tests
      
      * Add feature extractor tests
      
      * Add proper padding/truncation to feature extractor
      
      * Add support for conversion of all audioset checkpoints
      
      * Improve docs and extend conversion script
      
      * Fix README
      
      * Rename spectogram to spectrogram
      
      * Fix copies
      
      * Add integration test
      
      * Remove dummy conv
      
      * Update to ast
      
      * Update organization
      
      * Fix init
      
      * Rename model to AST
      
      * Add require_torchaudio decorator
      
      * Move import of ASTFeatureExtractor under an `is_speech_available` check
      
      * Fix rebase
      
      * Add pipeline config
      
      * Update name of classifier head
      
      * Rename time_dimension and frequency_dimension for clarity
      
      * Remove print statement
      
      * Fix pipeline test
      
      * Fix pipeline test
      
      * Fix index table
      
      * Fix init
      
      * Fix conversion script
      
      * Rename to ForAudioClassification
      
      * Fix index table
      Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
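      A minimal, hedged usage sketch for the model added above; the checkpoint name and
      the audio file are assumptions for illustration, not taken from the commit.

      ```python
      from transformers import pipeline

      # Assumed AudioSet-finetuned AST checkpoint; any ASTForAudioClassification
      # checkpoint with a matching feature extractor should behave the same way.
      classifier = pipeline(
          "audio-classification",
          model="MIT/ast-finetuned-audioset-10-10-0.4593",
      )

      # "dog_bark.wav" is a placeholder local file; the pipeline also accepts a
      # raw numpy array sampled at the feature extractor's sampling rate.
      predictions = classifier("dog_bark.wav")
      print(predictions)  # list of {"label": ..., "score": ...} dicts
      ```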
  6. 14 Nov, 2022 1 commit
  7. 03 Nov, 2022 1 commit
  8. 17 Oct, 2022 1 commit
    • Fix pipeline predict/transform methods (#19657) · 8aad4363
      Sivaudha authored
      * Remove keyword argument X from pipeline predict and transform methods

      Since __call__ of the pipeline classes requires one positional argument,
      passing the input as a keyword argument inside the predict and transform
      methods caused __call__ to fail. Hence this commit changes the keyword
      argument into a positional argument (see the sketch at the end of this entry).
      
      * Implement basic tests for scikitcompat pipeline interface
      
      * Separate tests instead of running with parameterized based on framework, as both frameworks will not be active at the same time
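      A minimal sketch of the fix described above, using a toy stand-in rather than the
      actual `transformers` source: the scikit-learn-style `predict`/`transform` wrappers
      now forward the input positionally instead of as the keyword `X`.

      ```python
      class ScikitCompatPipeline:
          """Toy stand-in for a pipeline exposing scikit-learn-style methods."""

          def __call__(self, inputs, **kwargs):
              # The real pipeline __call__ takes the input as a positional argument.
              return {"label": "POSITIVE", "score": 1.0}

          def predict(self, X):
              # Before the fix: `return self(X=X)`, which raised a TypeError
              # because __call__ has no keyword parameter named X.
              return self(X)

          def transform(self, X):
              return self(X)


      print(ScikitCompatPipeline().predict("this now works"))
      ```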
  9. 11 Oct, 2022 1 commit
    • Fix whisper for `pipeline` (#19482) · b722a6be
      Arthur authored
      * update feature extractor params
      
      * update attention mask handling
      
      * fix doc and pipeline test
      
      * add warning when skipping test
      
      * add whisper translation and transcription test
      
      * fix build doc test
  10. 07 Oct, 2022 1 commit
    • Rework pipeline tests (#19366) · 9ac586b3
      Sylvain Gugger authored
      * Rework pipeline tests
      
      * Try to fix Flax tests
      
      * Try to put it before
      
      * Use a new decorator instead
      
      * Remove ignore marker since it doesn't work
      
      * Filter pipeline tests
      
      * Woopsie
      
      * Use the filtered list
      
      * Clean up and fake modif
      
      * Remove init
      
      * Revert fake modif
  11. 05 Oct, 2022 1 commit
  12. 06 Sep, 2022 1 commit
  13. 10 Aug, 2022 1 commit
  14. 05 Aug, 2022 1 commit
  15. 19 Jul, 2022 1 commit
  16. 01 Jul, 2022 1 commit
  17. 30 Jun, 2022 2 commits
  18. 12 May, 2022 1 commit
  19. 05 May, 2022 1 commit
  20. 04 Mar, 2022 1 commit
  21. 23 Feb, 2022 2 commits
    • [Test refactor 1/5] Per-folder tests reorganization (#15725) · 29c10a41
      Lysandre Debut authored
      
      
      * Per-folder tests reorganization
      Co-authored-by: sgugger <sylvain.gugger@gmail.com>
      Co-authored-by: Stas Bekman <stas@stason.org>
    • Enable `image-segmentation` on `AutoModelForSemanticSegmentation` (#15647) · 9e71d464
      Nicolas Patry authored
      * Enabling BEiT and SegFormer for `image-segmentation`.
      
      * Fixing the score.
      
      * Fix import?
      
      * Missing in type hint.
      
      * Multiple test fixes:
      
      - Add `raw_image` support. It should be the default IMHO since in the Python
        world it doesn't make any sense to base64-encode the image (sorry
        @mishig, didn't catch that in my review). I really think we should
        consider breaking BC here.
      - Add support for Segformer tiny test (needed
        `SegformerModelTester.get_config` to enable TinyConfig
        @NielsRogge)
      - Add the check that `batch_size` works correctly on that pipeline.
        Uncovered that it doesn't for Detr, which IMO is OK since images
        after `feature_extractor` don't have the same size. Comment should
        explain.
      
      * Type hint as a string.
      
      * Make fixup + update black.
      
      * torch+vision protections.
      
      * Don't use torchvision, use F.interpolate instead (no new dep).
      
      * Last fixes for Segformer.
      
      * Update test to reflect new image (which was broken)
      
      * Update tests.
      
      * Major BC modification (see the usage sketch after this entry):

      - Removed the compressed PNG string; that's a job for users,
        `transformers` stays in Python land.
      - Removed the `score` for semantic segmentation. It hardly has a meaning
        on its own in this context.
      - Don't include the grayscale with logits for now (which could let
        users get a sense of confidence). Might be done later.
      - Don't include the surface of the mask (which could be used by users for
        sorting, to filter out small masks). It's already calculable, and it's
        easier to add later than to add now and break things if we need to.
      
      * `make fixup`.
      
      * Small changes.
      
      * Rebase + doc fixup.
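      A hedged usage sketch of the pipeline after the BC change above. The SegFormer
      checkpoint and image URL are illustrative assumptions; the output shape follows
      the PR description (one mask per segment with its label, no compressed PNG
      string, no score for semantic segmentation).

      ```python
      from transformers import pipeline

      # Assumed SegFormer checkpoint; any AutoModelForSemanticSegmentation model
      # supported by the pipeline should work similarly.
      segmenter = pipeline(
          "image-segmentation",
          model="nvidia/segformer-b0-finetuned-ade-512-512",
      )

      # The pipeline accepts a local path, a URL, or a PIL.Image (raw_image support).
      outputs = segmenter("http://images.cocodataset.org/val2017/000000039769.jpg")

      for segment in outputs:
          # Per the PR description: a PIL mask plus its label, nothing base64-encoded.
          print(segment["label"], segment["mask"].size)
      ```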
  22. 05 Jan, 2022 1 commit
  23. 04 Jan, 2022 1 commit
    • Hotfix `chunk_length_s` instead of `_ms`. (#15029) · 19d37c2d
      Nicolas Patry authored
      * Hotfix `chunk_length_s` instead of `_ms`.
      
      * Adding fix of `pad_token`, which should be the last/previous token, for
      proper CTC decoding
      
      * Fixing ChunkPipeline unwrapping.
      
      * Adding a PackIterator specific test.
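      The renamed parameter in context, as a hedged sketch; the CTC checkpoint and the
      file name are assumptions. Long audio is chunked by passing `chunk_length_s` in
      seconds rather than the old `_ms` spelling.

      ```python
      from transformers import pipeline

      # Assumed CTC checkpoint; chunked decoding is what the pad_token fix targets.
      asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")

      # chunk_length_s is expressed in seconds (hence the rename from *_ms);
      # "long_interview.wav" is a placeholder local file.
      transcription = asr("long_interview.wav", chunk_length_s=10)
      print(transcription["text"])
      ```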
  24. 27 Dec, 2021 1 commit
    • ChunkPipeline (batch_size enabled on `zero-cls` and `qa` pipelines) (#14225) · b058490c
      Nicolas Patry authored
      
      
      * Pipeline chunks.
      
      * Batching for Chunking pipelines?
      
      * Batching for `question-answering` and `zero-shot-cls`.
      
      * Fixing for FNet.
      
      * Making ASR a chunk pipeline.
      
      * Chunking ASR API.
      
      * doc style.
      
      * Fixing ASR test.
      
      * Fixing QA error (p_mask: padding is 1, not 0).
      
      * Enable both vad and simple chunking.
      
      * Max length for vad.
      
      * remove inference mode, crashing on s2t.
      
      * Revert ChunkPipeline for the ASR pipeline.

      Too many knobs for a simple integration within the pipeline; better to stick
      to external convenience functions instead: more control, a simpler pipeline,
      and easier to replace with other things later.
      
      * Drop necessity for PT for these.
      
      * Enabling generators.
      
      * Add mic + cleanup.
      
      * Typo.
      
      * Typo2.
      
      * Remove ASR work, it does not belong in this PR anymore.
      
      * Update src/transformers/pipelines/pt_utils.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * Update src/transformers/pipelines/zero_shot_classification.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * Adding many comments.
      
      * Doc quality.
      
      * `hidden_states` handling.
      
      * Adding doc.
      
      * Bad rebase.
      
      * Autofixing docs.
      
      * Fixing CRITICAL bug in the new Zerocls pipeline.
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
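      Since the PR above enables `batch_size` on the `zero-shot-classification` and
      `question-answering` pipelines, here is a hedged usage sketch; the default
      checkpoints are assumed and are not part of the commit.

      ```python
      from transformers import pipeline

      # Zero-shot classification: each candidate label becomes a chunk that the
      # ChunkPipeline can now batch through the model.
      zero_shot = pipeline("zero-shot-classification")
      result = zero_shot(
          "The new release batches pipeline chunks internally.",
          candidate_labels=["software", "sports", "cooking"],
          batch_size=2,
      )
      print(result["labels"][0])

      # Question answering: long contexts are split into spans (chunks) that can
      # likewise be batched.
      qa = pipeline("question-answering")
      answer = qa(
          question="What does the ChunkPipeline enable?",
          context="The ChunkPipeline enables batching on zero-shot and QA pipelines.",
          batch_size=2,
      )
      print(answer["answer"])
      ```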
  25. 14 Dec, 2021 1 commit
    • Fixing tests for Perceiver (#14739) · 546a91ab
      Nicolas Patry authored
      * Adding some slow tests to check Perceiver at least at a high level.
      
      * Re-enabling fast tests for Perceiver ImageClassification.
      
      * Perceiver might try to run some text-only pipelines without a Tokenizer
      (the fast one doesn't exist) but with a FeatureExtractor.
      
      * Oops.
      
      * Adding a comment for `update_config_with_model_class`.
      
      * Remove `model_architecture` to get `tiny_config`.
      
      * Finalize rebase.
      
      * Smarter way to handle undefined FastTokenizer.
      
      * Remove old code.
      
      * Addressing some nits.
      
      * Don't instantiate `None`.
  26. 13 Dec, 2021 1 commit
    • Fixing tests for Perceiver (#14745) · 3d66146a
      Lysandre Debut authored
      
      
      - Do not run the image-classification pipeline (_CHECKPOINT_FOR_DOC uses the
      language checkpoint, which cannot load a FeatureExtractor, so the current
      logic fails).
      - Add a safeguard to not run tests when `tokenizer_class` or
      `feature_extractor_class` **are** defined but cannot be loaded.
      This happens for Perceiver with the "FastTokenizer" (which doesn't exist,
      so it is None) and the FeatureExtractor (which does exist but cannot be
      loaded because the checkpoint doesn't define one, which is reasonable for
      that checkpoint).
      - Added a `get_vocab` function to `PerceiverTokenizer` since it is used by
      the `fill-mask` pipeline when the `targets` argument is used to narrow the
      set of possible values (see the sketch after this entry).
      Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
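      The `targets` argument mentioned in the last bullet relies on the tokenizer's
      `get_vocab`; a hedged sketch of that usage with an assumed masked-LM checkpoint
      (not Perceiver):

      ```python
      from transformers import pipeline

      # Assumed checkpoint for illustration; the same call works for any fill-mask model.
      fill_mask = pipeline("fill-mask", model="distilbert-base-uncased")

      # `targets` narrows the predictions to a candidate set; the pipeline looks each
      # target up via tokenizer.get_vocab(), which is why PerceiverTokenizer needed it.
      predictions = fill_mask(
          "Pipelines restrict predictions to a [MASK] of tokens.",
          targets=["subset", "superset"],
      )
      for prediction in predictions:
          print(prediction["token_str"], round(prediction["score"], 4))
      ```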
  27. 08 Dec, 2021 1 commit
  28. 22 Nov, 2021 1 commit
  29. 19 Nov, 2021 1 commit
  30. 12 Nov, 2021 1 commit
    • Adding support for raw python `generator` in addition to `Dataset` for pipelines (#14352) · ed5d1551
      Nicolas Patry authored
      * Adding support for raw Python `generator` in addition to `Dataset`

      The main goal is to ease the creation of streaming data into the pipeline.

      `Dataset` is more involved and PyTorch-specific.

      This PR provides a way to use a plain Python iterator too.
      This enables #14250 but can be proposed as a standalone PR.
      
      ```python
      from transformers import pipeline
      
      def read_data(filename):
          # Stream the file line by line instead of loading it all into memory.
          with open(filename, "r") as f:
              for line in f:
                  yield line  # yield each line, not the file handle
      
      pipe = pipeline("text-classification")
      for classified in pipe(read_data("large_file.txt")):
          print("Success ! ", classified)
      ```
      
      The main caveat of this is the interaction with `DataLoader` when
      `num_workers>1`. With multiple workers, each one receives a copy
      of the generator (like `IterableDataset`), so a naive iterator
      fails because all workers iterate over all items of the generator.

      There are ways to do clever "skipping", but it can still be bad
      because every worker still has to pass through all items of the
      generator (it just ignores the items it doesn't handle); depending on
      the case that can hurt.

      Using `num_workers=1` is the simplest fix, and if the cost of loading
      your data is small enough it should be good enough. In the above example,
      trying smart tricks to skip some lines is unlikely to be a net
      positive, for instance.

      If there are better ways to do "jumps" on some data, then using a
      `Dataset` is more advisable, since different workers can then jump to
      their own shards directly (see the sketch after this entry).
      
      * Adding iterator support for `tf` too.
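      For the `Dataset` route recommended above when workers need to "jump", a hedged
      sketch using the `datasets` library and the `KeyDataset` helper; the dataset name
      and column are assumptions.

      ```python
      from datasets import load_dataset
      from transformers import pipeline
      from transformers.pipelines.pt_utils import KeyDataset

      # A map-style dataset lets DataLoader workers index their own shards instead
      # of replaying a shared generator.
      dataset = load_dataset("imdb", split="test")

      pipe = pipeline("text-classification")

      # KeyDataset selects the "text" column so the pipeline receives plain strings;
      # results stream out one by one, keeping memory flat.
      for classified in pipe(KeyDataset(dataset, "text"), batch_size=8):
          print(classified)
      ```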
  31. 10 Nov, 2021 1 commit
  32. 03 Nov, 2021 1 commit
  33. 29 Oct, 2021 3 commits
  34. 14 Oct, 2021 1 commit
  35. 10 Sep, 2021 1 commit
    • [Large PR] Entire rework of pipelines. (#13308) · c63fcabf
      Nicolas Patry authored
      
      
      * Enabling dataset iteration on pipelines.
      
      Unifying parameters under `set_parameters` function.
      
      Small fix.
      
      Last fixes after rebase
      
      Remove print.
      
      Fixing text2text `generate_kwargs`
      
      No more `self.max_length`.
      
      Fixing TF-only conversational.
      
      Consistency in start/stop index over TF/PT.
      
      Speeding up drastically on TF (nasty bug where max_length would increase
      a ton.)
      
      Adding a test for support of non-fast tokenizers.

      Fixing GPU usage on zero-shot.

      Fix working on TF.
      
      Update src/transformers/pipelines/base.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      Update src/transformers/pipelines/base.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      Small cleanup.
      
      Remove all asserts + simple format.
      
      * Fixing audio-classification for large PR.
      
      * Overly explicit null checking.
      
      * Encapsulating GPU/CPU pytorch manipulation directly within `base.py`.
      
      * Removed internal state for parameters of the pipeline.

      Instead of implicitly overriding internal state, we moved
      to real named arguments on every `preprocess`, `_forward`,
      and `postprocess` function.

      Instead, `_sanitize_parameters` is used to split all kwargs
      of both __init__ and __call__ into the 3 kinds of named parameters
      (a skeleton sketch follows this entry).
      
      * Move import warnings.
      
      * Small fixes.
      
      * Quality.
      
      * Another small fix, using the CI to debug faster.
      
      * Last fixes.
      
      * Last fix.
      
      * Small cleanup of tensor moving.
      
      * is not None.
      
      * Adding a bunch of docs + an iteration test.
      
      * Fixing doc style.
      
      * KeyDataset = None guard.
      
      * Removing the CUDA test for pipelines (was testing).

      * Even simpler iteration test.

      * Correct import.
      
      * Long day.
      
      * Fixes in docs.
      
      * [WIP] migrating object detection.
      
      * Fixed the target_size bug.
      
      * Fixup.
      
      * Bad variable name.
      
      * Fixing `ensure_on_device` so it respects the original ModelOutput.
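      A hedged skeleton of the `_sanitize_parameters` / `preprocess` / `_forward` /
      `postprocess` split described above. The class name and the `top_k` parameter are
      illustrative, not from the PR, and a PyTorch backend is assumed.

      ```python
      from transformers import Pipeline

      class MyClassificationPipeline(Pipeline):
          def _sanitize_parameters(self, top_k=None, **kwargs):
              # Split every __init__/__call__ kwarg into the three stages' named
              # parameters instead of keeping implicit internal state.
              preprocess_kwargs, forward_kwargs, postprocess_kwargs = {}, {}, {}
              if top_k is not None:
                  postprocess_kwargs["top_k"] = top_k
              return preprocess_kwargs, forward_kwargs, postprocess_kwargs

          def preprocess(self, inputs, **preprocess_kwargs):
              # Raw inputs -> model-ready tensors.
              return self.tokenizer(inputs, return_tensors=self.framework)

          def _forward(self, model_inputs, **forward_kwargs):
              # The actual forward pass; device placement is handled by the base class.
              return self.model(**model_inputs)

          def postprocess(self, model_outputs, top_k=1):
              # Raw model outputs -> user-facing labels and scores.
              scores = model_outputs.logits[0].softmax(-1)
              best = scores.topk(top_k)
              return [
                  {"label": self.model.config.id2label[idx.item()], "score": score.item()}
                  for score, idx in zip(best.values, best.indices)
              ]
      ```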
  36. 27 Aug, 2021 1 commit