1. 05 Aug, 2022 1 commit
    • Fixing issue where generic model types wouldn't load properly with the pipeline (#18392) · 586dcf6b
      Nicolas Patry authored
      * Adding a better error message when the model is improperly configured
      within transformers.
      
      * Update src/transformers/pipelines/__init__.py
      
      * Black version.
      
      * Overriding task aliases so that tokenizer+feature_extractor
      values are correct.
      
      * Fixing task aliases by overriding their names early
      
      * X.
      
      * Fixing feature-extraction.
      
      * black again.
      
      * Normalizing `translation` too.
      
      * Fixing last few corner cases.
      
      `translation` needs to use its non-normalized name (translation_XX_to_YY),
      so that the task_specific_params are correctly overloaded.
      This can be removed and cleaned up in a later PR.
      
      `speech-encode-decoder` actually REQUIRES passing a `tokenizer` manually,
      so the error needs to be discarded when the `tokenizer` is already
      there.
      
      * doc-builder fix.
      
      * Fixing the real issue.
      
      * Removing dead code.
      
      * Do not import the actual config classes.
  2. 03 Aug, 2022 1 commit
    • Fix torch version comparisons (#18460) · 02b176c4
      LSinev authored
      Comparisons like
      version.parse(torch.__version__) > version.parse("1.6")
      are True for torch==1.6.0+cu101 or torch==1.6.0+cpu.
      
      Comparisons against version.parse(version.parse(torch.__version__).base_version) are preferred (and available in pytorch_utils.py).
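
A minimal sketch of the difference described above, assuming the `packaging` library is installed. A literal string stands in for `torch.__version__`, and the helper name `base_version` is hypothetical (it is not the helper shipped in pytorch_utils.py):

```python
from packaging import version


def base_version(version_string):
    """Parse a version string, dropping any local suffix such as `+cu101`."""
    return version.parse(version.parse(version_string).base_version)


local_build = "1.6.0+cu101"  # stand-in for torch.__version__

# The local segment sorts above the bare release, so the naive check passes.
naive_is_newer = version.parse(local_build) > version.parse("1.6")

# Stripping the local segment first gives the intended answer.
strict_is_newer = base_version(local_build) > version.parse("1.6")

print(naive_is_newer, strict_is_newer)
```

The naive comparison treats a CUDA or CPU build of 1.6.0 as strictly newer than 1.6, which is why the commit prefers comparing on the base version.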
  3. 19 Jul, 2022 1 commit
  4. 30 Jun, 2022 2 commits
  5. 09 Jun, 2022 1 commit
  6. 24 May, 2022 1 commit
  7. 18 May, 2022 1 commit
  8. 12 May, 2022 1 commit
  9. 23 Mar, 2022 1 commit
    • Reorganize file utils (#16264) · 4975002d
      Sylvain Gugger authored
      * Split file_utils in several submodules
      
      * Fixes
      
      * Add back more objects
      
      * More fixes
      
      * Who exactly decided to import that from there?
      
      * Second suggestion to code with code review
      
      * Revert wrong move
      
      * Fix imports
      
      * Adapt all imports
      
      * Adapt all imports everywhere
      
      * Revert this import, will fix in a separate commit
  10. 18 Mar, 2022 1 commit
  11. 23 Feb, 2022 1 commit
    • Adding ZeroShotImageClassificationPipeline (#12119) · f9582c20
      Nicolas Patry authored
      
      
      * [Proposal] Adding ZeroShotImageClassificationPipeline
      
      - Based on CLIP
      
      * WIP, Resurrection in progress.
      
      * Resurrection... achieved.
      
      * Reword handling different `padding_value` for `feature_extractor` and
      `tokenizer`.
      
      * Thanks doc-builder !
      
      * Adding docs + global namespace `ZeroShotImageClassificationPipeline`.
      
      * Fixing templates.
      
      * Make the test pass and be robust to floating error.
      
      * Addressing suraj's comments on docs mostly.
      
      * Tf support start.
      
      * TF support.
      
      * Update src/transformers/pipelines/zero_shot_image_classification.py
      Co-authored-by: Suraj Patil <surajp815@gmail.com>
  12. 15 Feb, 2022 1 commit
  13. 05 Jan, 2022 1 commit
  14. 27 Dec, 2021 2 commits
    • Doc styler v2 (#14950) · 87e6e4fe
      Sylvain Gugger authored
      * New doc styler
      
      * Fix issue with args at the start
      
      * Code sample fixes
      
      * Style code examples in MDX
      
      * Fix more patterns
      
      * Typo
      
      * Typo
      
      * More patterns
      
      * Do without black for now
      
      * Get more info in error
      
      * Docstring style
      
      * Re-enable check
      
      * Quality
      
      * Fix add_end_docstring decorator
      
      * Fix docstring
    • ChunkPipeline (batch_size enabled on `zero-cls` and `qa` pipelines). (#14225) · b058490c
      Nicolas Patry authored
      
      
      * Pipeline chunks.
      
      * Batching for Chunking pipelines ?
      
      * Batching for `question-answering` and `zero-shot-cls`.
      
      * Fixing for FNet.
      
      * Making ASR a chunk pipeline.
      
      * Chunking ASR API.
      
      * doc style.
      
      * Fixing ASR test.
      
      * Fixing QA error (p_mask, padding is 1, not 0).
      
      * Enable both vad and simple chunking.
      
      * Max length for vad.
      
      * remove inference mode, crashing on s2t.
      
      * Revert ChunkPipeline for ASRpipeline.
      
      Too many knobs for simple integration within the pipeline, better stick
      to external convenience functions instead, more control to be had,
      simpler pipeline and also easier to replace with other things later.
      
      * Drop necessity for PT for these.
      
      * Enabling generators.
      
      * Add mic + cleanup.
      
      * Typo.
      
      * Typo2.
      
      * Remove ASR work, it does not belong in this PR anymore.
      
      * Update src/transformers/pipelines/pt_utils.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * Update src/transformers/pipelines/zero_shot_classification.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * Adding many comments.
      
      * Doc quality.
      
      * `hidden_states` handling.
      
      * Adding doc.
      
      * Bad rebase.
      
      * Autofixing docs.
      
      * Fixing CRITICAL bug in the new Zerocls pipeline.
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
  15. 21 Dec, 2021 1 commit
    • Mass conversion of documentation from rst to Markdown (#14866) · 27b3031d
      Sylvain Gugger authored
      * Convert docstrings of all configurations and tokenizers
      
      * Processors and fixes
      
      * Last modeling files and fixes to models
      
      * Pipeline modules
      
      * Utils files
      
      * Data submodule
      
      * All the other files
      
      * Style
      
      * Missing examples
      
      * Style again
      
      * Fix copies
      
      * Say bye bye to rst docstrings forever
  16. 06 Dec, 2021 1 commit
  17. 19 Nov, 2021 1 commit
  18. 12 Nov, 2021 1 commit
    • Adding support for raw python `generator` in addition to `Dataset` for pipelines (#14352) · ed5d1551
      Nicolas Patry authored
      * Adding support for raw python `generator` in addition to `Dataset`
      
      The main goal is to ease the creation of streaming data for the pipe.
      
      `Dataset` is more involved and PyTorch specific.
      
      This PR provides a way to use a Python iterator too.
      This enabled #14250 but can be proposed as a standalone PR.
      
      ```python
      from transformers import pipeline
      
      def read_data(filename):
          with open(filename, 'r') as f:
              for line in f:
                  yield line
      
      pipe = pipeline("text-classification")
      for classified in pipe(read_data("large_file.txt")):
          print("Success ! ", classified)
      ```
      
      The main caveat is the interaction with `DataLoader` when
      `num_workers>1`. With multiple workers, each one receives a copy
      of the generator (as with `IterableDataset`). That means the naive iterator
      will fail, since every worker iterates over all items of the generator.
      
      There are ways to do clever "skipping", but that can still be costly
      because every worker has to pass through all items of the
      generator (ignoring the items it does not handle).
      
      Using `num_workers=1` is the simplest fix, and it should be good enough
      if the cost of loading your data is small. In the above example,
      trying to do smart tricks to skip some lines is unlikely to be a net
      positive, for instance.
      
      If there are better ways to do "jumps" on some data, then using
      `Dataset` is advised instead (since different workers can then jump
      by themselves).
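
The worker caveat can be illustrated without `DataLoader` at all. The sketch below is a plain-Python simulation under stated assumptions: `read_items` and `worker_shard` are hypothetical names, and "workers" are simulated by calling the generator function once per worker rather than by spawning processes.

```python
def read_items():
    # Stand-in for a streaming source such as lines of a large file.
    for i in range(6):
        yield f"line {i}"


def worker_shard(make_generator, worker_id, num_workers):
    # "Skipping" trick: every worker still walks the whole stream,
    # but keeps only the items whose index falls in its own shard.
    for index, item in enumerate(make_generator()):
        if index % num_workers == worker_id:
            yield item


num_workers = 2

# Naive case: each worker holds its own copy and yields every item.
naive = [list(read_items()) for _ in range(num_workers)]

# Sharded case: items are split, yet each worker still reads all 6 items.
sharded = [list(worker_shard(read_items, w, num_workers)) for w in range(num_workers)]

print(sum(len(items) for items in naive))
print(sharded)
```

In the naive case the stream is processed twice; in the sharded case each item is handled once, but the read cost is still paid by every worker, which is the trade-off the message describes.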
      
      * Adding iterator support for `tf` too.
  19. 29 Oct, 2021 2 commits
  20. 26 Oct, 2021 1 commit
  21. 06 Oct, 2021 1 commit
  22. 10 Sep, 2021 1 commit
    • [Large PR] Entire rework of pipelines. (#13308) · c63fcabf
      Nicolas Patry authored
      
      
      * Enabling dataset iteration on pipelines.
      
      Unifying parameters under `set_parameters` function.
      
      Small fix.
      
      Last fixes after rebase
      
      Remove print.
      
      Fixing text2text `generate_kwargs`
      
      No more `self.max_length`.
      
      Fixing tf only conversational.
      
      Consistency in start/stop index over TF/PT.
      
      Speeding up drastically on TF (nasty bug where max_length would increase
      a ton.)
      
      Adding test for support for non fast tokenizers.
      
      Fixing GPU usage on zero-shot.
      
      Fix working on Tf.
      
      Update src/transformers/pipelines/base.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      Update src/transformers/pipelines/base.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      Small cleanup.
      
      Remove all asserts + simple format.
      
      * Fixing audio-classification for large PR.
      
      * Overly explicit null checking.
      
      * Encapsulating GPU/CPU pytorch manipulation directly within `base.py`.
      
      * Removed internal state for parameters of the pipeline.
      
      Instead of implicitly overriding internal state, we moved
      to real named arguments on every `preprocess`, `_forward`,
      `postprocess` function.
      
      `_sanitize_parameters` is now used to split all kwargs
      of both `__init__` and `__call__` into the 3 kinds of named parameters.
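
The parameter-splitting pattern described above can be sketched in plain Python. This is a toy stand-in under stated assumptions, not the transformers base class: `ToyPipeline`, `max_tokens`, `top_k`, and the length-based "model" are all hypothetical; only the method names `preprocess`, `_forward`, `postprocess`, and `_sanitize_parameters` come from the commit message.

```python
class ToyPipeline:
    """Toy stand-in for a pipeline; not the transformers base class."""

    def __init__(self, **kwargs):
        # Parameters given at construction time become defaults for every call.
        self._defaults = self._sanitize_parameters(**kwargs)

    def _sanitize_parameters(self, max_tokens=None, top_k=None):
        # Route each known kwarg to the step that consumes it.
        preprocess_params, forward_params, postprocess_params = {}, {}, {}
        if max_tokens is not None:
            preprocess_params["max_tokens"] = max_tokens
        if top_k is not None:
            postprocess_params["top_k"] = top_k
        return preprocess_params, forward_params, postprocess_params

    def preprocess(self, text, max_tokens=8):
        return text.split()[:max_tokens]

    def _forward(self, tokens):
        # Placeholder "model": score each token by its length.
        return [(token, len(token)) for token in tokens]

    def postprocess(self, scored, top_k=1):
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

    def __call__(self, text, **kwargs):
        call_params = self._sanitize_parameters(**kwargs)
        # Call-time values override construction-time defaults, step by step.
        pre, fwd, post = ({**d, **c} for d, c in zip(self._defaults, call_params))
        return self.postprocess(self._forward(self.preprocess(text, **pre), **fwd), **post)


pipe = ToyPipeline(top_k=2)
print(pipe("a bb ccc dddd"))
print(pipe("a bb ccc dddd", top_k=1, max_tokens=3))
```

Because no step reads mutable state from `self`, two calls with different kwargs cannot interfere with each other, which appears to be the motivation for removing the internal state.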
      
      * Move import warnings.
      
      * Small fixes.
      
      * Quality.
      
      * Another small fix, using the CI to debug faster.
      
      * Last fixes.
      
      * Last fix.
      
      * Small cleanup of tensor moving.
      
      * is not None.
      
      * Adding a bunch of docs + an iteration test.
      
      * Fixing doc style.
      
      * KeyDataset = None guard.
      
      * Removing the Cuda test for pipelines (was testing).
      
      * Even more simple iteration test.
      
      * Correct import.
      
      * Long day.
      
      * Fixes in docs.
      
      * [WIP] migrating object detection.
      
      * Fixed the target_size bug.
      
      * Fixup.
      
      * Bad variable name.
      
      * Fixing `ensure_on_device` respects original ModelOutput.
  23. 26 Aug, 2021 2 commits
  24. 13 Aug, 2021 1 commit
    • Moving fill-mask pipeline to new testing scheme (#12943) · d58926ab
      Nicolas Patry authored
      * Fill mask pipelines test updates.
      
      * Model eval !!
      
      * Adding slow test with actual values.
      
      * Making all tests pass (skipping quite a bit.)
      
      * Doc styling.
      
      * Better doc cleanup.
      
      * Making an explicit test with no pad token tokenizer.
      
      * Typo.
  25. 29 Jul, 2021 1 commit
  26. 22 Jul, 2021 1 commit
    • Improving pipeline tests (#12784) · 795c1444
      Nicolas Patry authored
      
      
      * Proposal
      
      * Testing pipelines slightly better.
      
      - Overall same design
      - Metaclass to get proper different tests instead of subTest (not well
      supported by Pytest)
      - Added ANY meta object to make output checking more readable.
      - Skipping architectures either without tiny_config or without
      architecture.
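
One way such an `ANY` helper could work is sketched below. This is an assumption about the idea, not the object actually added to the test suite: the class compares equal to any value of the given types, so expected outputs can pin down structure without hard-coding scores.

```python
class ANY:
    """Compares equal to any value that is an instance of the given types."""

    def __init__(self, *types):
        self._types = types

    def __eq__(self, other):
        return isinstance(other, self._types)

    def __repr__(self):
        names = ", ".join(t.__name__ for t in self._types)
        return f"ANY({names})"


# Hypothetical pipeline output: the exact score is not worth asserting on.
output = [{"label": "POSITIVE", "score": 0.9987}]
expected = [{"label": ANY(str), "score": ANY(float)}]

assert output == expected
```

The `__repr__` keeps assertion failures readable, since the expected structure prints as `ANY(str)` rather than as an opaque object address.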
      
      * Small fix.
      
      * Fixing the tests in case of None value.
      
      * Oups.
      
      * Rebased with more architectures.
      
      * Fixing reformer tests (no override anymore).
      
      * Adding more options for model tester config.
      Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
  27. 07 Jul, 2021 1 commit
    • Adding support for `pipeline("automatic-speech-recognition")`. (#11525) · ebc69afc
      Nicolas Patry authored
      * Adding support for `pipeline("automatic-speech-recognition")`.
      
      - Ugly `"config"` choice for AutoModel. It would be great to have the
      possibility to have something like `AutoModelFor` that would implement
      the same logic (Load the config, check Architectures and load the first
      one)
      
      * Remove `model_id`, which was not needed in the end.
      
      * Rebased !
      
      * Remove old code.
      
      * Rename `nlp`.
  28. 07 Jun, 2021 1 commit
  29. 07 May, 2021 1 commit
  30. 16 Apr, 2021 1 commit
  31. 07 Apr, 2021 1 commit
  32. 31 Mar, 2021 2 commits
  33. 30 Mar, 2021 1 commit
  34. 29 Mar, 2021 1 commit
  35. 11 Jan, 2021 1 commit
    • Enable TruncationStrategy override for pipelines (#9432) · d20e9c72
      Nicolas Patry authored
      * Enable TruncationStrategy override for pipelines
      
      * Update isort.
      
      * Fixing test
      
      * Fixing text_generation pipeline.
      
      * Using same DummyTok as other PR for easier merge later.
      
      * Some more import guards.
      
      * Remove bogus file.
      
      * Do not pass `generate_kwargs` to `_parse_and_tokenize`.
      @patrickvonplaten
      
      * Removed DummyTok.
      
      * Doc quality.