- 06 Sep, 2022 1 commit
-
-
Sylvain Gugger authored
* Further reduce the number of alls to head for cached models/tokenizers/pipelines * Fix tests * Address review comments
-
- 10 Aug, 2022 1 commit
-
-
Sylvain Gugger authored
* Use commit hash to look in cache instead of calling head * Add tests * Add attr for local configs too * Stupid typos * Fix tests * Update src/transformers/utils/hub.py Co-authored-by:
Julien Chaumond <julien@huggingface.co> * Address Julien's comments Co-authored-by:
Julien Chaumond <julien@huggingface.co>
-
- 05 Aug, 2022 1 commit
-
-
Sylvain Gugger authored
* Fix pipeline tests * Make sure all pipelines tests run with init changes
-
- 19 Jul, 2022 1 commit
-
-
Sylvain Gugger authored
* Initial work * More work * Add tests for custom pipelines on the Hub * Protect import * Make the test work for TF as well * Last PyTorch specific bit * Add documentation * Style * Title in toc * Bad names! * Update docs/source/en/add_new_pipeline.mdx Co-authored-by:
Lysandre Debut <lysandre.debut@reseau.eseo.fr> * Auto stash before merge of "custom_pipeline" and "origin/custom_pipeline" * Address review comments * Address more review comments * Update src/transformers/pipelines/__init__.py Co-authored-by:
Lysandre Debut <lysandre.debut@reseau.eseo.fr> Co-authored-by:
Lysandre Debut <lysandre.debut@reseau.eseo.fr>
-
- 01 Jul, 2022 1 commit
-
-
Yih-Dar authored
Co-authored-by:ydshieh <ydshieh@users.noreply.github.com>
-
- 30 Jun, 2022 2 commits
-
-
Aaron Pham authored
* feat: add pipeline registry abstraction - added `PipelineRegistry` abstraction - updates `add_new_pipeline.mdx` (english docs) to reflect the api addition - migrate `check_task` and `get_supported_tasks` from transformers/pipelines/__init__.py to transformers/pipelines/base.py#PipelineRegistry.{check_task,get_supported_tasks} Signed-off-by:Aaron Pham <29749331+aarnphm@users.noreply.github.com> * fix: update with upstream/main chore: Apply suggestions from sgugger's code review Signed-off-by:
Aaron Pham <29749331+aarnphm@users.noreply.github.com> Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * chore: PR updates - revert src/transformers/dependency_versions_table.py from upstream/main - updates pipeline registry to use global variables Signed-off-by:
Aaron Pham <29749331+aarnphm@users.noreply.github.com> * tests: add tests for pipeline registry Signed-off-by:
Aaron Pham <29749331+aarnphm@users.noreply.github.com> * tests: add test for output warning. Signed-off-by:
Aaron Pham <29749331+aarnphm@users.noreply.github.com> * chore: fmt and cleanup unused imports Signed-off-by:
Aaron Pham <29749331+aarnphm@users.noreply.github.com> * fix: change imports to top of the file and address comments Signed-off-by:
Aaron Pham <29749331+aarnphm@users.noreply.github.com> Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Patrick von Platen authored
* trigger test failure * upload revision poc * Update src/transformers/pipelines/base.py Co-authored-by:
Julien Chaumond <julien@huggingface.co> * up * add test * correct some stuff * Update src/transformers/pipelines/__init__.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * correct require flag Co-authored-by:
Julien Chaumond <julien@huggingface.co> Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
- 12 May, 2022 1 commit
-
-
Sylvain Gugger authored
* Black preview * Fixup too! * Fix check copies * Use the same version as the CI * Bump black
-
- 05 May, 2022 1 commit
-
-
Yih-Dar authored
Co-authored-by:ydshieh <ydshieh@users.noreply.github.com>
-
- 04 Mar, 2022 1 commit
-
-
Nicolas Patry authored
-
- 23 Feb, 2022 2 commits
-
-
Lysandre Debut authored
* Per-folder tests reorganization Co-authored-by:
sgugger <sylvain.gugger@gmail.com> Co-authored-by:
Stas Bekman <stas@stason.org>
-
Nicolas Patry authored
* Enabling Beit SegFormer to `image-segmentation`. * Fixing the score. * Fix import ? * Missing in type hint. * Multiple test fixes: - Add `raw_image` support. It should be the default IMHO since in Python world it doesn't make any sense to base64 encode the image (Sorry @mishig, didn't catch that in my review). I really think we should consider breaking BC here. - Add support for Segformer tiny test (needed `SegformerModelTester.get_config` to enable TinyConfig @NielsRogge) - Add the check that `batch_size` works correctly on that pipeline. Uncovered that it doesn't for Detr, which IMO is OK since images after `feature_extractor` don't have the same size. Comment should explain. * Type hint as a string. * Make fixup + update black. * torch+vision protections. * Don't use torchvision, use F.interpolate instead (no new dep). * Last fixes for Segformer. * Update test to reflect new image (which was broken) * Update tests. * Major BC modification: - Removed the string compressed PNG string, that's a job for users `transformers` stays in python land. - Removed the `score` for semantic segmentation. It has hardly a meaning on its own in this context. - Don't include the grayscale with logits for now (which could enable users to get a sense of confidence). Might be done later. - Don't include the surface of the mask (could be used for sorting by users, to filter out small masks). It's already calculable, and it's easier to add later, than to add now and break later if we need. * `make fixup`. * Small changes. * Rebase + doc fixup.
-
- 05 Jan, 2022 1 commit
-
-
Nicolas Patry authored
* Adding QoL for `batch_size` arg (like others enabled everywhere). * Typo.
-
- 04 Jan, 2022 1 commit
-
-
Nicolas Patry authored
* Hotfix `chunk_length_s` instead of `_ms`. * Adding fix of `pad_token` which should be last/previous token for CTC proper decoding * Fixing ChunkPipeline unwrapping. * Adding a PackIterator specific test.
-
- 27 Dec, 2021 1 commit
-
-
Nicolas Patry authored
* Pipeline chunks. * Batching for Chunking pipelines ? * Batching for `question-answering` and `zero-shot-cls`. * Fixing for FNet. * Making ASR a chunk pipeline. * Chunking ASR API. * doc style. * Fixing ASR test. * Fixing QA eror (p_mask, padding is 1, not 0). * Enable both vad and simple chunking. * Max length for vad. * remove inference mode, crashing on s2t. * Revert ChunkPipeline for ASRpipeline. Too many knobs for simple integration within the pipeline, better stick to external convenience functions instead, more control to be had, simpler pipeline and also easier to replace with other things later. * Drop necessity for PT for these. * Enabling generators. * Add mic + cleanup. * Typo. * Typo2. * Remove ASR work, it does not belong in this PR anymore. * Update src/transformers/pipelines/pt_utils.py Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> * Update src/transformers/pipelines/zero_shot_classification.py Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> * Adding many comments. * Doc quality. * `hidden_states` handling. * Adding doc. * Bad rebase. * Autofixing docs. * Fixing CRITICAL bug in the new Zerocls pipeline. Co-authored-by:
Lysandre Debut <lysandre@huggingface.co>
-
- 14 Dec, 2021 1 commit
-
-
Nicolas Patry authored
* Adding some slow test to check for perceiver at least from a high level. * Re-enabling fast tests for Perceiver ImageClassification. * Perceiver might try to run without Tokenizer (Fast doesn't exist) and with FeatureExtractor some text only pipelines. * Oops. * Adding a comment for `update_config_with_model_class`. * Remove `model_architecture` to get `tiny_config`. * Finalize rebase. * Smarter way to handle undefined FastTokenizer. * Remove old code. * Addressing some nits. * Don't instantiate `None`.
-
- 13 Dec, 2021 1 commit
-
-
Lysandre Debut authored
- Do not run image-classification pipeline (_CHECKPOINT_FOR_DOC uses the checkpoint for langage, which cannot load a FeatureExtractor so current logic fails). - Add a safeguard to not run tests when `tokenizer_class` or `feature_extractor_class` **are** defined, but cannot be loaded This happens for Perceiver for the "FastTokenizer" (which doesn't exist so None) and FeatureExtractor (which does exist but cannot be loaded because the checkpoint doesn't define one which is reasonable for the said checkpoint) - Added `get_vocab` function to `PerceiverTokenizer` since it is used by `fill-mask` pipeline when the argument `targets` is used to narrow a subset of possible values. Co-authored-by:Nicolas Patry <patry.nicolas@protonmail.com>
-
- 08 Dec, 2021 1 commit
-
-
Nicolas Patry authored
* Fixing Dataset for TQA + token-classification. * Fixing the tests. * Making sure `offset_mappings` is a valid argument.
-
- 22 Nov, 2021 1 commit
-
-
Nicolas Patry authored
* Moving everything to `hf-internal-testing`. * Fixing test values. * Moving to other repo. * Last touch?
-
- 19 Nov, 2021 1 commit
-
-
Nicolas Patry authored
support.
-
- 12 Nov, 2021 1 commit
-
-
Nicolas Patry authored
* Adding support for raw python `generator` in addition to `Dataset` The main goal is to ease the create of streaming data to the pipe. `Dataset` is more involved and pytorch specific. This PR, provides a way to use a python iterator too. This enabled #14250 but can be proposed as a standalone PR. ```python from transformers import pipeline def read_data(filename): with open(filename, 'r') as f: for line in f: yield f pipe = pipeline("text-classification") for classified in pipe(read_data("large_file.txt")): print("Success ! ", classified) ``` The main caveat of this, is the interaction with `DataLoader` with `num_workers>1`. When you have multiple workers, each receive a copy of the generator (like `IterableDataset`). That means the naive Iterator will fail since all workers iterate on all items of the generator. There are ways to do clever "skipping", but it could be bad still because all workers still do have to pass through all items of the generator (they just ignore items they don't handle), depending on the case it might be bad. Using `num_workers=1` is the simplest fix and if the cost of loading your data is small enough should be good enough. In the above example trying to do smart tricks to skip some lines is unlikely to be a net positive for instance. If there are better ways to do "jumps" on some data, then using `Dataset` is more advised (since then differents workers can just jump themselves). * Adding iterator support for `tf` too.
-
- 10 Nov, 2021 1 commit
-
-
Nicolas Patry authored
* Adding some quality of life for `pipeline` function. * Update docs/source/main_classes/pipelines.rst Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/pipelines/__init__.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Improve the tests. Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
- 03 Nov, 2021 1 commit
-
-
Nicolas Patry authored
* Adding support for `truncation` parameter on `feature-extraction` pipeline. Fixes #14183 * Fixing tests on ibert, longformer, and roberta. * Rebase fix.
-
- 29 Oct, 2021 3 commits
-
-
Nicolas Patry authored
* Adding `handle_long_generation` paramters for `text-generation` pipeline. * More error handling * Fixing tests by dropping tf support on this functionality, it needs `max_new_tokens` to make it possible to understand user's intent. Otherwise, `max_length` == `tokenizer.model_max_length` < input_ids.shape[0]. * Fixing doc ? * Doc ? * Remove link from doc. * Catched an issue on roberta. * Damn doc. * Non BC proposal ? * Cleaning the fix ? * Finally using only a test override. * Don't need to modify this. * Bad print.
-
Daniel Stancl authored
* Add the support for the fast (rust) implementation of BlenbderbotTokenizer * Fix a converter and a typo in a doc * Apply the patil-suraj's suggestion * (Nitpick) Fast tokenization -> Fast Tokenization in doc * Apply the SaulLu's suggestion * Apply Narsil's suggestion to fix test pipelines * Add encoder_no_repeat_ngram_size according to the Narsil's suggestion * Revert the last (unnecessary) commit * Override pipeline config for Blenderbot to allow for larger pos. emb. * make fix-copies
-
Nicolas Patry authored
* Tentative enabling of `batch_size` for pipelines. * Add systematic test for pipeline batching. * Enabling batch_size on almost all pipelines - Not `zero-shot` (it's already passing stuff as batched so trickier) - Not `QA` (preprocess uses squad features, we need to switch to real tensors at this boundary. * Adding `min_length_for_response` for conversational. * Making CTC, speech mappings avaiable regardless of framework. * Attempt at fixing automatic tests (ffmpeg not enabled for fast tests) * Removing ffmpeg dependency in tests. * Small fixes. * Slight cleanup. * Adding docs and adressing comments. * Quality. * Update docs/source/main_classes/pipelines.rst Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/pipelines/question_answering.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/pipelines/zero_shot_classification.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Improving docs. * Update docs/source/main_classes/pipelines.rst Co-authored-by:
Philipp Schmid <32632186+philschmid@users.noreply.github.com> * N -> oberved_batch_size softmax trick. * Follow `padding_side`. * Supporting image pipeline batching (and padding). * Rename `unbatch` -> `loader_batch`. * unbatch_size forgot. * Custom padding for offset mappings. * Attempt to remove librosa. * Adding require_audio. * torchaudio. * Back to using datasets librosa. * Adding help to set a pad_token on the tokenizer. * Update src/transformers/pipelines/base.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/pipelines/base.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/pipelines/base.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Quality. Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by:
Philipp Schmid <32632186+philschmid@users.noreply.github.com>
-
- 14 Oct, 2021 1 commit
-
-
Lysandre Debut authored
* Scatter dummies + skip pipeline tests * Add torch scatter to build docs
-
- 10 Sep, 2021 1 commit
-
-
Nicolas Patry authored
* Enabling dataset iteration on pipelines. Enabling dataset iteration on pipelines. Unifying parameters under `set_parameters` function. Small fix. Last fixes after rebase Remove print. Fixing text2text `generate_kwargs` No more `self.max_length`. Fixing tf only conversational. Consistency in start/stop index over TF/PT. Speeding up drastically on TF (nasty bug where max_length would increase a ton.) Adding test for support for non fast tokenizers. Fixign GPU usage on zero-shot. Fix working on Tf. Update src/transformers/pipelines/base.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Update src/transformers/pipelines/base.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Small cleanup. Remove all asserts + simple format. * Fixing audio-classification for large PR. * Overly explicity null checking. * Encapsulating GPU/CPU pytorch manipulation directly within `base.py`. * Removed internal state for parameters of the pipeline. Instead of overriding implicitly internal state, we moved to real named arguments on every `preprocess`, `_forward`, `postprocess` function. Instead `_sanitize_parameters` will be used to split all kwargs of both __init__ and __call__ into the 3 kinds of named parameters. * Move import warnings. * Small fixes. * Quality. * Another small fix, using the CI to debug faster. * Last fixes. * Last fix. * Small cleanup of tensor moving. * is not None. * Adding a bunch of docs + a iteration test. * Fixing doc style. * KeyDataset = None guard. * RRemoving the Cuda test for pipelines (was testing). * Even more simple iteration test. * Correct import . * Long day. * Fixes in docs. * [WIP] migrating object detection. * Fixed the target_size bug. * Fixup. * Bad variable name. * Fixing `ensure_on_device` respects original ModelOutput.
-
- 27 Aug, 2021 2 commits
-
-
Nicolas Patry authored
* Moving `zero-shot-classification` pipeline to new testing. * Cleaning up old mixins. * Fixing tests `sshleifer/tiny-distilbert-base-uncased-finetuned-sst-2-english` is corrupted in PT. * Adding warning.
-
Nicolas Patry authored
* Moving `token-classification` pipeline to new testing. * Fix tests.
-
- 26 Aug, 2021 2 commits
-
-
Nicolas Patry authored
- Enforce `test_small_models_{tf,pt}` methods to exist (enforce checking actual values in small tests) - Add support for non RGB image for the pipeline. -
Nicolas Patry authored
* New test format for conversational. * Putting back old mixin. * Re-enabling auto tests with LazyLoading. * Feature extraction tests. * Remove feature-extraction. * Feature extraction with feature_extractor (No pun intended). * Update check_model_type for fill-mask.
-
- 13 Aug, 2021 1 commit
-
-
Nicolas Patry authored
* Fill mask pipelines test updates. * Model eval !! * Adding slow test with actual values. * Making all tests pass (skipping quite a bit.) * Doc styling. * Better doc cleanup. * Making an explicit test with no pad token tokenizer. * Typo.
-
- 29 Jul, 2021 1 commit
-
-
Nicolas Patry authored
* Update feature extraction pipelilne. * Leaving 1 small model for actual values check. * Fixes tests - Better support for tokenizer with no pad token - Increasing PegasusModelTesterConfig for pipelines - Test of feature extraction are more permissive + don't test Multimodel models + encoder-decoder. * Fixing model loading with incorrect shape (+ model with HEAD). * Update tests/test_pipelines_common.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Revert modeling_utils modification. * Some corrections. * Update tests/test_pipelines_common.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update tests/test_pipelines_feature_extraction.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Syntax. * Fixing text-classification tests. * Don't modify this file. Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
- 22 Jul, 2021 1 commit
-
-
Nicolas Patry authored
* Proposal * Testing pipelines slightly better. - Overall same design - Metaclass to get proper different tests instead of subTest (not well supported by Pytest) - Added ANY meta object to make output checking more readable. - Skipping architectures either without tiny_config or without architecture. * Small fix. * Fixing the tests in case of None value. * Oups. * Rebased with more architectures. * Fixing reformer tests (no override anymore). * Adding more options for model tester config. Co-authored-by:Lysandre <lysandre.debut@reseau.eseo.fr>
-
- 18 May, 2021 1 commit
-
-
Vyom Pathak authored
* Fixed: Better names for nlp variables in pipelines' tests and docs. * Fixed: Better variable names
-
- 25 Feb, 2021 1 commit
-
-
Patrick von Platen authored
[PretrainedFeatureExtractor] + Wav2Vec2FeatureExtractor, Wav2Vec2Processor, Wav2Vec2Tokenizer (#10324) * push to show * small improvement * small improvement * Update src/transformers/feature_extraction_utils.py * Update src/transformers/feature_extraction_utils.py * implement base * add common tests * make all tests pass for wav2vec2 * make padding work & add more tests * finalize feature extractor utils * add call method to feature extraction * finalize feature processor * finish tokenizer * finish general processor design * finish tests * typo * remove bogus file * finish docstring * add docs * finish docs * small fix * correct docs * save intermediate * load changes * apply changes * apply changes to doc * change tests * apply surajs recommend * final changes * Apply suggestions from code review * fix typo * fix import * correct docstring
-
- 07 Dec, 2020 1 commit
-
-
Sylvain Gugger authored
* Add copyright everywhere missing * Style
-
- 15 Nov, 2020 1 commit
-
-
Thomas Wolf authored
[breaking|pipelines|tokenizers] Adding slow-fast tokenizers equivalence tests pipelines - Removing sentencepiece as a required dependency (#8073) * Fixing roberta for slow-fast tests * WIP getting equivalence on pipelines * slow-to-fast equivalence - working on question-answering pipeline * optional FAISS tests * Pipeline Q&A * Move pipeline tests to their own test job again * update tokenizer to add sequence id methods * update to tokenizers 0.9.4 * set sentencepiecce as optional * clean up squad * clean up pipelines to use sequence_ids * style/quality * wording * Switch to use_fast = True by default * update tests for use_fast at True by default * fix rag tokenizer test * removing protobuf from required dependencies * fix NER test for use_fast = True by default * fixing example tests (Q&A examples use slow tokenizers for now) * protobuf in main deps extras["sentencepiece"] and example deps * fix protobug install test * try to fix seq2seq by switching to slow tokenizers for now * Update src/transformers/tokenization_utils_base.py Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> * Update src/transformers/tokenization_utils_base.py Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> Co-authored-by:
Lysandre Debut <lysandre@huggingface.co>
-
- 02 Nov, 2020 1 commit
-
-
Nicolas Patry authored
* Some work to fix the behaviour of DefaultArgumentHandler by removing it. * Fixing specific pipelines argument checking.
-