1. 27 Dec, 2021 2 commits
    • Doc styler v2 (#14950) · 87e6e4fe
      Sylvain Gugger authored
      * New doc styler
      
      * Fix issue with args at the start
      
      * Code sample fixes
      
      * Style code examples in MDX
      
      * Fix more patterns
      
      * Typo
      
      * Typo
      
      * More patterns
      
      * Do without black for now
      
      * Get more info in error
      
      * Docstring style
      
      * Re-enable check
      
      * Quality
      
      * Fix add_end_docstring decorator
      
      * Fix docstring
    • ChunkPipeline (batch_size enabled on `zero-cls` and `qa` pipelines). (#14225) · b058490c
      Nicolas Patry authored
      
      
      * Pipeline chunks.
      
      * Batching for Chunking pipelines ?
      
      * Batching for `question-answering` and `zero-shot-cls`.
      
      * Fixing for FNet.
      
      * Making ASR a chunk pipeline.
      
      * Chunking ASR API.
      
      * doc style.
      
      * Fixing ASR test.
      
      * Fixing QA error (p_mask, padding is 1, not 0).
      
      * Enable both vad and simple chunking.
      
      * Max length for vad.
      
      * remove inference mode, crashing on s2t.
      
      * Revert ChunkPipeline for ASRpipeline.
      
      Too many knobs for a simple integration within the pipeline; better to
      stick to external convenience functions instead: more control, a
      simpler pipeline, and easier to replace with other things later.
      
      * Drop necessity for PT for these.
      
      * Enabling generators.
      
      * Add mic + cleanup.
      
      * Typo.
      
      * Typo2.
      
      * Remove ASR work, it does not belong in this PR anymore.
      
      * Update src/transformers/pipelines/pt_utils.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * Update src/transformers/pipelines/zero_shot_classification.py
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
      
      * Adding many comments.
      
      * Doc quality.
      
      * `hidden_states` handling.
      
      * Adding doc.
      
      * Bad rebase.
      
      * Autofixing docs.
      
      * Fixing CRITICAL bug in the new Zerocls pipeline.
      Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
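
As a rough illustration of the chunking idea in this commit: one input can expand into several chunks, chunks from different inputs can share a batch, and the results are regrouped per input afterwards. The sketch below is plain Python under stated assumptions; `chunk_text`, `batched`, `fake_forward`, `run_chunked` and the `is_last` flag are hypothetical names chosen for illustration, not the code in `pt_utils.py`.

```python
from itertools import islice


def chunk_text(text, size):
    """Split one input into fixed-size chunks, flagging the final chunk."""
    pieces = [text[i:i + size] for i in range(0, len(text), size)] or [""]
    for index, piece in enumerate(pieces):
        yield {"chunk": piece, "is_last": index == len(pieces) - 1}


def batched(items, batch_size):
    """Group an iterator into lists of at most `batch_size` items."""
    iterator = iter(items)
    while batch := list(islice(iterator, batch_size)):
        yield batch


def fake_forward(batch):
    """Stand-in for a model call: score each chunk by its length."""
    return [{"score": len(item["chunk"]), "is_last": item["is_last"]} for item in batch]


def run_chunked(inputs, chunk_size=4, batch_size=3):
    """Chunk every input, run batches across inputs, regroup per input."""
    all_chunks = (c for text in inputs for c in chunk_text(text, chunk_size))
    pending = []
    for batch in batched(all_chunks, batch_size):
        for output in fake_forward(batch):
            pending.append(output["score"])
            if output["is_last"]:
                # All chunks of one input have arrived: aggregate and emit.
                yield sum(pending)
                pending = []


results = list(run_chunked(["abcdefghij", "xyz", ""]))
```

Batches here are formed over chunks rather than over inputs, so a batch may mix the tail of one input with the head of the next; the per-chunk flag is what lets the consumer regroup them.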
  2. 21 Dec, 2021 1 commit
    • Mass conversion of documentation from rst to Markdown (#14866) · 27b3031d
      Sylvain Gugger authored
      * Convert docstrings of all configurations and tokenizers
      
      * Processors and fixes
      
      * Last modeling files and fixes to models
      
      * Pipeline modules
      
      * Utils files
      
      * Data submodule
      
      * All the other files
      
      * Style
      
      * Missing examples
      
      * Style again
      
      * Fix copies
      
      * Say bye bye to rst docstrings forever
  3. 06 Dec, 2021 1 commit
  4. 19 Nov, 2021 1 commit
  5. 12 Nov, 2021 1 commit
    • Adding support for raw python `generator` in addition to `Dataset` for pipelines (#14352) · ed5d1551
      Nicolas Patry authored
      * Adding support for raw python `generator` in addition to `Dataset`
      
      The main goal is to ease the creation of streaming data for the pipe.
      
      `Dataset` is more involved and PyTorch-specific.
      
      This PR provides a way to use a Python iterator too.
      This enabled #14250 but can be proposed as a standalone PR.
      
      ```python
      from transformers import pipeline
      
      def read_data(filename):
          # Stream the file lazily, one line at a time.
          with open(filename, 'r') as f:
              for line in f:
                  yield line
      
      pipe = pipeline("text-classification")
      for classified in pipe(read_data("large_file.txt")):
          print("Success ! ", classified)
      ```
      
      The main caveat is the interaction with `DataLoader` when
      `num_workers>1`. With multiple workers, each one receives a copy
      of the generator (as with `IterableDataset`), so a naive iterator
      fails: every worker iterates over all items of the generator.
      
      There are ways to do clever "skipping", but every worker still has
      to pass through all items of the generator (it just ignores the items
      it does not handle), which can be costly depending on the case.
      
      Using `num_workers=1` is the simplest fix, and it should be good
      enough if the cost of loading your data is small. In the example
      above, for instance, smart tricks to skip some lines are unlikely
      to be a net positive.
      
      If there are better ways to do "jumps" on the data, then using
      `Dataset` is advised instead (since different workers can then
      jump directly to their own items).
      
      * Adding iterator support for `tf` too.
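
The "skipping" approach discussed in this commit message can be sketched without PyTorch. Each simulated worker receives its own copy of the generator and keeps only every `num_workers`-th item. `read_items` and `shard_for_worker` are hypothetical names used only for illustration; this is a sketch of the idea, not what `DataLoader` does internally.

```python
from itertools import islice


def read_items(n):
    """Stand-in for a streaming source such as a large file."""
    for i in range(n):
        yield f"item-{i}"


def shard_for_worker(make_generator, worker_id, num_workers):
    """Keep only the items assigned to `worker_id`.

    The worker still advances through every item of its own generator copy;
    it simply discards the ones that belong to other workers.
    """
    return islice(make_generator(), worker_id, None, num_workers)


num_workers = 3
shards = [
    list(shard_for_worker(lambda: read_items(7), worker_id, num_workers))
    for worker_id in range(num_workers)
]
```

Every item ends up in exactly one shard, but each worker still pays the full cost of producing the stream, which is why a single worker is often the better trade-off when loading is cheap.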
  6. 29 Oct, 2021 2 commits
  7. 26 Oct, 2021 1 commit
  8. 06 Oct, 2021 1 commit
  9. 10 Sep, 2021 1 commit
    • [Large PR] Entire rework of pipelines. (#13308) · c63fcabf
      Nicolas Patry authored
      
      
      * Enabling dataset iteration on pipelines.
      
      Enabling dataset iteration on pipelines.
      
      Unifying parameters under `set_parameters` function.
      
      Small fix.
      
      Last fixes after rebase
      
      Remove print.
      
      Fixing text2text `generate_kwargs`
      
      No more `self.max_length`.
      
      Fixing tf only conversational.
      
      Consistency in start/stop index over TF/PT.
      
      Speeding up drastically on TF (nasty bug where max_length would increase
      a ton.)
      
      Adding test for support for non fast tokenizers.
      
      Fixing GPU usage on zero-shot.
      
      Fix working on Tf.
      
      Update src/transformers/pipelines/base.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      Update src/transformers/pipelines/base.py
      Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
      
      Small cleanup.
      
      Remove all asserts + simple format.
      
      * Fixing audio-classification for large PR.
      
      * Overly explicit null checking.
      
      * Encapsulating GPU/CPU pytorch manipulation directly within `base.py`.
      
      * Removed internal state for parameters of the pipeline.
      
      Instead of implicitly overriding internal state, we moved
      to real named arguments on each of the `preprocess`, `_forward`
      and `postprocess` functions.
      
      `_sanitize_parameters` is now used to split all kwargs
      of both `__init__` and `__call__` into the 3 kinds of named parameters.
      
      * Move import warnings.
      
      * Small fixes.
      
      * Quality.
      
      * Another small fix, using the CI to debug faster.
      
      * Last fixes.
      
      * Last fix.
      
      * Small cleanup of tensor moving.
      
      * is not None.
      
      * Adding a bunch of docs + an iteration test.
      
      * Fixing doc style.
      
      * KeyDataset = None guard.
      
      * Removing the Cuda test for pipelines (was testing).
      
      * Even more simple iteration test.
      
      * Correct import.
      
      * Long day.
      
      * Fixes in docs.
      
      * [WIP] migrating object detection.
      
      * Fixed the target_size bug.
      
      * Fixup.
      
      * Bad variable name.
      
      * Fixing `ensure_on_device` respects original ModelOutput.
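
The parameter handling described in the "Removed internal state for parameters of the pipeline" item can be pictured with a toy class. This is a minimal sketch, not the code in `base.py`: `ToyPipeline`, its `lowercase` and `top_k` options, and the character-counting "model" are all made up for illustration. Only the method names `_sanitize_parameters`, `preprocess`, `_forward` and `postprocess` come from the commit message.

```python
from collections import Counter


class ToyPipeline:
    """Toy pipeline: kwargs are routed to explicit per-step arguments."""

    def __init__(self, **kwargs):
        # Defaults given at construction time, already split per step.
        self._defaults = self._sanitize_parameters(**kwargs)

    def _sanitize_parameters(self, lowercase=None, top_k=None):
        """Split kwargs into (preprocess, forward, postprocess) parameters."""
        preprocess_params, forward_params, postprocess_params = {}, {}, {}
        if lowercase is not None:
            preprocess_params["lowercase"] = lowercase
        if top_k is not None:
            postprocess_params["top_k"] = top_k
        return preprocess_params, forward_params, postprocess_params

    def preprocess(self, text, lowercase=False):
        return text.lower() if lowercase else text

    def _forward(self, model_inputs):
        # Stand-in for a model: count characters.
        return Counter(model_inputs)

    def postprocess(self, model_outputs, top_k=1):
        return [char for char, _ in model_outputs.most_common(top_k)]

    def __call__(self, text, **kwargs):
        call_params = self._sanitize_parameters(**kwargs)
        # Call-time values override constructor defaults; no state is mutated.
        pre, fwd, post = ({**d, **c} for d, c in zip(self._defaults, call_params))
        model_inputs = self.preprocess(text, **pre)
        model_outputs = self._forward(model_inputs, **fwd)
        return self.postprocess(model_outputs, **post)


pipe = ToyPipeline(lowercase=True)
default_result = pipe("aaBBBc")
override_result = pipe("aaBBBc", lowercase=False, top_k=2)
```

Because call-time arguments are merged into fresh dictionaries instead of being written onto the object, a later call without overrides sees the constructor defaults again.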
  10. 26 Aug, 2021 2 commits
  11. 13 Aug, 2021 1 commit
    • Moving fill-mask pipeline to new testing scheme (#12943) · d58926ab
      Nicolas Patry authored
      * Fill mask pipelines test updates.
      
      * Model eval !!
      
      * Adding slow test with actual values.
      
      * Making all tests pass (skipping quite a bit.)
      
      * Doc styling.
      
      * Better doc cleanup.
      
      * Making an explicit test with no pad token tokenizer.
      
      * Typo.
  12. 29 Jul, 2021 1 commit
  13. 22 Jul, 2021 1 commit
    • Improving pipeline tests (#12784) · 795c1444
      Nicolas Patry authored
      
      
      * Proposal
      
      * Testing pipelines slightly better.
      
      - Overall same design
      - Metaclass to get proper different tests instead of subTest (not well
      supported by Pytest)
      - Added ANY meta object to make output checking more readable.
      - Skipping architectures either without tiny_config or without
      architecture.
      
      * Small fix.
      
      * Fixing the tests in case of None value.
      
      * Oops.
      
      * Rebased with more architectures.
      
      * Fixing reformer tests (no override anymore).
      
      * Adding more options for model tester config.
      Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
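
The `ANY` helper mentioned in this commit can be pictured as an object that compares equal to any value of a given type, so an expected output can pin the structure while leaving model-dependent values open. This is a sketch of the idea, assuming a simple `isinstance`-based comparison; the actual helper in the test suite may differ, and the sample output below is hypothetical.

```python
class ANY:
    """Compares equal to any instance of the given types."""

    def __init__(self, *types):
        self._types = types

    def __eq__(self, other):
        return isinstance(other, self._types)

    def __repr__(self):
        return f"ANY({', '.join(t.__name__ for t in self._types)})"


# Hypothetical pipeline output: the exact label and score depend on the model.
outputs = [{"label": "POSITIVE", "score": 0.98}]
matches = outputs == [{"label": ANY(str), "score": ANY(float)}]
mismatch = outputs == [{"label": ANY(int), "score": ANY(float)}]
```

The comparison works inside nested lists and dicts because Python falls back to the right-hand operand's `__eq__` when the left-hand value does not know how to compare itself with `ANY`.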
  14. 07 Jul, 2021 1 commit
    • Adding support for `pipeline("automatic-speech-recognition")`. (#11525) · ebc69afc
      Nicolas Patry authored
      * Adding support for `pipeline("automatic-speech-recognition")`.
      
      - Ugly `"config"` choice for AutoModel. It would be great to have the
      possibility to have something like `AutoModelFor` that would implement
      the same logic (Load the config, check Architectures and load the first
      one)
      
      * Remove `model_id`, which was not needed in the end.
      
      * Rebased !
      
      * Remove old code.
      
      * Rename `nlp`.
  15. 07 Jun, 2021 1 commit
  16. 07 May, 2021 1 commit
  17. 16 Apr, 2021 1 commit
  18. 07 Apr, 2021 1 commit
  19. 31 Mar, 2021 2 commits
  20. 30 Mar, 2021 1 commit
  21. 29 Mar, 2021 1 commit
  22. 11 Jan, 2021 1 commit
    • Enable TruncationStrategy override for pipelines (#9432) · d20e9c72
      Nicolas Patry authored
      * Enable TruncationStrategy override for pipelines
      
      * Update isort.
      
      * Fixing test
      
      * Fixing text_generation pipeline.
      
      * Using same DummyTok as other PR for easier merge later.
      
      * Some more import guards.
      
      * Remove bogus file.
      
      * Do not pass `generate_kwargs` to `_parse_and_tokenize`.
      @patrickvonplaten
      
      * Removed DummyTok.
      
      * Doc quality.
  23. 06 Jan, 2021 1 commit
    • [Refactor] Splitting pipelines.py into its own module. (#9279) · 090d28e3
      Nicolas Patry authored
      * Splitting pipelines into its own module.
      
      * Moving everything into base.py
      
      * Moving FeatureExtractionPipeline into its own file.
      
      * TextGenerationPipeline.
      
      * TextClassificationPipeline
      
      * ZeroShot + get_framework import.
      
      * FillMaskPipeline
      
      * NerPipeline + TokenClassificationPipeline
      
      * QuestionAnsweringPipeline
      
      * TableQuestionAnsweringPipeline
      
      * ConversationalPipeline
      
      * Text2TextGenerationPipeline, TranslationPipeline, SummarizationPipeline
      
      * Typo import fix.
      
      * Relative imports.