- 21 Nov, 2022 1 commit
-
-
NielsRogge authored
* First draft * Make conversion script work * Add id2label mapping, run code quality * Fix copies * Add first draft of feature extractor * Update conversion script to use feature extractor * Make more tests pass * Add docs * update input_features to input_values + pad by default to max length * Fix doc tests * Add feature extractor tests * Add proper padding/truncation to feature extractor * Add support for conversion of all audioset checkpoints * Improve docs and extend conversion script * Fix README * Rename spectogram to spectrogram * Fix copies * Add integration test * Remove dummy conv * Update to ast * Update organization * Fix init * Rename model to AST * Add require_torchaudio annotator * Move import of ASTFeatureExtractor under a is_speech_available * Fix rebase * Add pipeline config * Update name of classifier head * Rename time_dimension and frequency_dimension for clarity * Remove print statement * Fix pipeline test * Fix pipeline test * Fix index table * Fix init * Fix conversion script * Rename to ForAudioClassification * Fix index table Co-authored-by:Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
-
- 14 Nov, 2022 1 commit
-
-
Bartosz Szmelczynski authored
* First draft * Remove scatter dependency * Add require_torch * update vectorized sum test, add clone call * remove artifacts * fix style * fix style v2 * remove "scatter" mentions from the code base * fix isort error Co-authored-by:
Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local> Co-authored-by:
ydshieh <ydshieh@users.noreply.github.com>
-
- 03 Nov, 2022 1 commit
-
-
Nicolas Patry authored
-
- 17 Oct, 2022 1 commit
-
-
Sivaudha authored
* Remove key word argument X from pipeline predict and transform methods As __call__ of pipeline clasees require one positional argument, passing the input as a keyword argument inside predict, transform methods, causing __call__ to fail. Hence in this commit the keyword argument is modified into positional argument. * Implement basic tests for scikitcompat pipeline interface * Seperate tests instead of running with parameterized based on framework as both frameworks will not be active at the same time
-
- 11 Oct, 2022 1 commit
-
-
Arthur authored
* update feature extractor params * update attention mask handling * fix doc and pipeline test * add warning when skipping test * add whisper translation and transcription test * fix build doc test
-
- 07 Oct, 2022 1 commit
-
-
Sylvain Gugger authored
* Rework pipeline tests * Try to fix Flax tests * Try to put it before * Use a new decorator instead * Remove ignore marker since it doesn't work * Filter pipeline tests * Woopsie * Use the fitlered list * Clean up and fake modif * Remove init * Revert fake modif
-
- 05 Oct, 2022 1 commit
-
-
Sylvain Gugger authored
* Fix pipeline tests for Roberta-like tokenizers * Fix fix
-
- 06 Sep, 2022 1 commit
-
-
Sylvain Gugger authored
* Further reduce the number of alls to head for cached models/tokenizers/pipelines * Fix tests * Address review comments
-
- 10 Aug, 2022 1 commit
-
-
Sylvain Gugger authored
* Use commit hash to look in cache instead of calling head * Add tests * Add attr for local configs too * Stupid typos * Fix tests * Update src/transformers/utils/hub.py Co-authored-by:
Julien Chaumond <julien@huggingface.co> * Address Julien's comments Co-authored-by:
Julien Chaumond <julien@huggingface.co>
-
- 05 Aug, 2022 1 commit
-
-
Sylvain Gugger authored
* Fix pipeline tests * Make sure all pipelines tests run with init changes
-
- 19 Jul, 2022 1 commit
-
-
Sylvain Gugger authored
* Initial work * More work * Add tests for custom pipelines on the Hub * Protect import * Make the test work for TF as well * Last PyTorch specific bit * Add documentation * Style * Title in toc * Bad names! * Update docs/source/en/add_new_pipeline.mdx Co-authored-by:
Lysandre Debut <lysandre.debut@reseau.eseo.fr> * Auto stash before merge of "custom_pipeline" and "origin/custom_pipeline" * Address review comments * Address more review comments * Update src/transformers/pipelines/__init__.py Co-authored-by:
Lysandre Debut <lysandre.debut@reseau.eseo.fr> Co-authored-by:
Lysandre Debut <lysandre.debut@reseau.eseo.fr>
-
- 01 Jul, 2022 1 commit
-
-
Yih-Dar authored
Co-authored-by:ydshieh <ydshieh@users.noreply.github.com>
-
- 30 Jun, 2022 2 commits
-
-
Aaron Pham authored
* feat: add pipeline registry abstraction - added `PipelineRegistry` abstraction - updates `add_new_pipeline.mdx` (english docs) to reflect the api addition - migrate `check_task` and `get_supported_tasks` from transformers/pipelines/__init__.py to transformers/pipelines/base.py#PipelineRegistry.{check_task,get_supported_tasks} Signed-off-by:Aaron Pham <29749331+aarnphm@users.noreply.github.com> * fix: update with upstream/main chore: Apply suggestions from sgugger's code review Signed-off-by:
Aaron Pham <29749331+aarnphm@users.noreply.github.com> Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * chore: PR updates - revert src/transformers/dependency_versions_table.py from upstream/main - updates pipeline registry to use global variables Signed-off-by:
Aaron Pham <29749331+aarnphm@users.noreply.github.com> * tests: add tests for pipeline registry Signed-off-by:
Aaron Pham <29749331+aarnphm@users.noreply.github.com> * tests: add test for output warning. Signed-off-by:
Aaron Pham <29749331+aarnphm@users.noreply.github.com> * chore: fmt and cleanup unused imports Signed-off-by:
Aaron Pham <29749331+aarnphm@users.noreply.github.com> * fix: change imports to top of the file and address comments Signed-off-by:
Aaron Pham <29749331+aarnphm@users.noreply.github.com> Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Patrick von Platen authored
* trigger test failure * upload revision poc * Update src/transformers/pipelines/base.py Co-authored-by:
Julien Chaumond <julien@huggingface.co> * up * add test * correct some stuff * Update src/transformers/pipelines/__init__.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * correct require flag Co-authored-by:
Julien Chaumond <julien@huggingface.co> Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
- 12 May, 2022 1 commit
-
-
Sylvain Gugger authored
* Black preview * Fixup too! * Fix check copies * Use the same version as the CI * Bump black
-
- 05 May, 2022 1 commit
-
-
Yih-Dar authored
Co-authored-by:ydshieh <ydshieh@users.noreply.github.com>
-
- 04 Mar, 2022 1 commit
-
-
Nicolas Patry authored
-
- 23 Feb, 2022 2 commits
-
-
Lysandre Debut authored
* Per-folder tests reorganization Co-authored-by:
sgugger <sylvain.gugger@gmail.com> Co-authored-by:
Stas Bekman <stas@stason.org>
-
Nicolas Patry authored
* Enabling Beit SegFormer to `image-segmentation`. * Fixing the score. * Fix import ? * Missing in type hint. * Multiple test fixes: - Add `raw_image` support. It should be the default IMHO since in Python world it doesn't make any sense to base64 encode the image (Sorry @mishig, didn't catch that in my review). I really think we should consider breaking BC here. - Add support for Segformer tiny test (needed `SegformerModelTester.get_config` to enable TinyConfig @NielsRogge) - Add the check that `batch_size` works correctly on that pipeline. Uncovered that it doesn't for Detr, which IMO is OK since images after `feature_extractor` don't have the same size. Comment should explain. * Type hint as a string. * Make fixup + update black. * torch+vision protections. * Don't use torchvision, use F.interpolate instead (no new dep). * Last fixes for Segformer. * Update test to reflect new image (which was broken) * Update tests. * Major BC modification: - Removed the string compressed PNG string, that's a job for users `transformers` stays in python land. - Removed the `score` for semantic segmentation. It has hardly a meaning on its own in this context. - Don't include the grayscale with logits for now (which could enable users to get a sense of confidence). Might be done later. - Don't include the surface of the mask (could be used for sorting by users, to filter out small masks). It's already calculable, and it's easier to add later, than to add now and break later if we need. * `make fixup`. * Small changes. * Rebase + doc fixup.
-
- 05 Jan, 2022 1 commit
-
-
Nicolas Patry authored
* Adding QoL for `batch_size` arg (like others enabled everywhere). * Typo.
-
- 04 Jan, 2022 1 commit
-
-
Nicolas Patry authored
* Hotfix `chunk_length_s` instead of `_ms`. * Adding fix of `pad_token` which should be last/previous token for CTC proper decoding * Fixing ChunkPipeline unwrapping. * Adding a PackIterator specific test.
-
- 27 Dec, 2021 1 commit
-
-
Nicolas Patry authored
* Pipeline chunks. * Batching for Chunking pipelines ? * Batching for `question-answering` and `zero-shot-cls`. * Fixing for FNet. * Making ASR a chunk pipeline. * Chunking ASR API. * doc style. * Fixing ASR test. * Fixing QA eror (p_mask, padding is 1, not 0). * Enable both vad and simple chunking. * Max length for vad. * remove inference mode, crashing on s2t. * Revert ChunkPipeline for ASRpipeline. Too many knobs for simple integration within the pipeline, better stick to external convenience functions instead, more control to be had, simpler pipeline and also easier to replace with other things later. * Drop necessity for PT for these. * Enabling generators. * Add mic + cleanup. * Typo. * Typo2. * Remove ASR work, it does not belong in this PR anymore. * Update src/transformers/pipelines/pt_utils.py Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> * Update src/transformers/pipelines/zero_shot_classification.py Co-authored-by:
Lysandre Debut <lysandre@huggingface.co> * Adding many comments. * Doc quality. * `hidden_states` handling. * Adding doc. * Bad rebase. * Autofixing docs. * Fixing CRITICAL bug in the new Zerocls pipeline. Co-authored-by:
Lysandre Debut <lysandre@huggingface.co>
-
- 14 Dec, 2021 1 commit
-
-
Nicolas Patry authored
* Adding some slow test to check for perceiver at least from a high level. * Re-enabling fast tests for Perceiver ImageClassification. * Perceiver might try to run without Tokenizer (Fast doesn't exist) and with FeatureExtractor some text only pipelines. * Oops. * Adding a comment for `update_config_with_model_class`. * Remove `model_architecture` to get `tiny_config`. * Finalize rebase. * Smarter way to handle undefined FastTokenizer. * Remove old code. * Addressing some nits. * Don't instantiate `None`.
-
- 13 Dec, 2021 1 commit
-
-
Lysandre Debut authored
- Do not run image-classification pipeline (_CHECKPOINT_FOR_DOC uses the checkpoint for langage, which cannot load a FeatureExtractor so current logic fails). - Add a safeguard to not run tests when `tokenizer_class` or `feature_extractor_class` **are** defined, but cannot be loaded This happens for Perceiver for the "FastTokenizer" (which doesn't exist so None) and FeatureExtractor (which does exist but cannot be loaded because the checkpoint doesn't define one which is reasonable for the said checkpoint) - Added `get_vocab` function to `PerceiverTokenizer` since it is used by `fill-mask` pipeline when the argument `targets` is used to narrow a subset of possible values. Co-authored-by:Nicolas Patry <patry.nicolas@protonmail.com>
-
- 08 Dec, 2021 1 commit
-
-
Nicolas Patry authored
* Fixing Dataset for TQA + token-classification. * Fixing the tests. * Making sure `offset_mappings` is a valid argument.
-
- 22 Nov, 2021 1 commit
-
-
Nicolas Patry authored
* Moving everything to `hf-internal-testing`. * Fixing test values. * Moving to other repo. * Last touch?
-
- 19 Nov, 2021 1 commit
-
-
Nicolas Patry authored
support.
-
- 12 Nov, 2021 1 commit
-
-
Nicolas Patry authored
* Adding support for raw python `generator` in addition to `Dataset` The main goal is to ease the create of streaming data to the pipe. `Dataset` is more involved and pytorch specific. This PR, provides a way to use a python iterator too. This enabled #14250 but can be proposed as a standalone PR. ```python from transformers import pipeline def read_data(filename): with open(filename, 'r') as f: for line in f: yield f pipe = pipeline("text-classification") for classified in pipe(read_data("large_file.txt")): print("Success ! ", classified) ``` The main caveat of this, is the interaction with `DataLoader` with `num_workers>1`. When you have multiple workers, each receive a copy of the generator (like `IterableDataset`). That means the naive Iterator will fail since all workers iterate on all items of the generator. There are ways to do clever "skipping", but it could be bad still because all workers still do have to pass through all items of the generator (they just ignore items they don't handle), depending on the case it might be bad. Using `num_workers=1` is the simplest fix and if the cost of loading your data is small enough should be good enough. In the above example trying to do smart tricks to skip some lines is unlikely to be a net positive for instance. If there are better ways to do "jumps" on some data, then using `Dataset` is more advised (since then differents workers can just jump themselves). * Adding iterator support for `tf` too.
-
- 10 Nov, 2021 1 commit
-
-
Nicolas Patry authored
* Adding some quality of life for `pipeline` function. * Update docs/source/main_classes/pipelines.rst Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/pipelines/__init__.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Improve the tests. Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
- 03 Nov, 2021 1 commit
-
-
Nicolas Patry authored
* Adding support for `truncation` parameter on `feature-extraction` pipeline. Fixes #14183 * Fixing tests on ibert, longformer, and roberta. * Rebase fix.
-
- 29 Oct, 2021 3 commits
-
-
Nicolas Patry authored
* Adding `handle_long_generation` paramters for `text-generation` pipeline. * More error handling * Fixing tests by dropping tf support on this functionality, it needs `max_new_tokens` to make it possible to understand user's intent. Otherwise, `max_length` == `tokenizer.model_max_length` < input_ids.shape[0]. * Fixing doc ? * Doc ? * Remove link from doc. * Catched an issue on roberta. * Damn doc. * Non BC proposal ? * Cleaning the fix ? * Finally using only a test override. * Don't need to modify this. * Bad print.
-
Daniel Stancl authored
* Add the support for the fast (rust) implementation of BlenbderbotTokenizer * Fix a converter and a typo in a doc * Apply the patil-suraj's suggestion * (Nitpick) Fast tokenization -> Fast Tokenization in doc * Apply the SaulLu's suggestion * Apply Narsil's suggestion to fix test pipelines * Add encoder_no_repeat_ngram_size according to the Narsil's suggestion * Revert the last (unnecessary) commit * Override pipeline config for Blenderbot to allow for larger pos. emb. * make fix-copies
-
Nicolas Patry authored
* Tentative enabling of `batch_size` for pipelines. * Add systematic test for pipeline batching. * Enabling batch_size on almost all pipelines - Not `zero-shot` (it's already passing stuff as batched so trickier) - Not `QA` (preprocess uses squad features, we need to switch to real tensors at this boundary. * Adding `min_length_for_response` for conversational. * Making CTC, speech mappings avaiable regardless of framework. * Attempt at fixing automatic tests (ffmpeg not enabled for fast tests) * Removing ffmpeg dependency in tests. * Small fixes. * Slight cleanup. * Adding docs and adressing comments. * Quality. * Update docs/source/main_classes/pipelines.rst Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/pipelines/question_answering.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/pipelines/zero_shot_classification.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Improving docs. * Update docs/source/main_classes/pipelines.rst Co-authored-by:
Philipp Schmid <32632186+philschmid@users.noreply.github.com> * N -> oberved_batch_size softmax trick. * Follow `padding_side`. * Supporting image pipeline batching (and padding). * Rename `unbatch` -> `loader_batch`. * unbatch_size forgot. * Custom padding for offset mappings. * Attempt to remove librosa. * Adding require_audio. * torchaudio. * Back to using datasets librosa. * Adding help to set a pad_token on the tokenizer. * Update src/transformers/pipelines/base.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/pipelines/base.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Update src/transformers/pipelines/base.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * Quality. Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by:
Philipp Schmid <32632186+philschmid@users.noreply.github.com>
-
- 14 Oct, 2021 1 commit
-
-
Lysandre Debut authored
* Scatter dummies + skip pipeline tests * Add torch scatter to build docs
-
- 10 Sep, 2021 1 commit
-
-
Nicolas Patry authored
* Enabling dataset iteration on pipelines. Enabling dataset iteration on pipelines. Unifying parameters under `set_parameters` function. Small fix. Last fixes after rebase Remove print. Fixing text2text `generate_kwargs` No more `self.max_length`. Fixing tf only conversational. Consistency in start/stop index over TF/PT. Speeding up drastically on TF (nasty bug where max_length would increase a ton.) Adding test for support for non fast tokenizers. Fixign GPU usage on zero-shot. Fix working on Tf. Update src/transformers/pipelines/base.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Update src/transformers/pipelines/base.py Co-authored-by:
Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Small cleanup. Remove all asserts + simple format. * Fixing audio-classification for large PR. * Overly explicity null checking. * Encapsulating GPU/CPU pytorch manipulation directly within `base.py`. * Removed internal state for parameters of the pipeline. Instead of overriding implicitly internal state, we moved to real named arguments on every `preprocess`, `_forward`, `postprocess` function. Instead `_sanitize_parameters` will be used to split all kwargs of both __init__ and __call__ into the 3 kinds of named parameters. * Move import warnings. * Small fixes. * Quality. * Another small fix, using the CI to debug faster. * Last fixes. * Last fix. * Small cleanup of tensor moving. * is not None. * Adding a bunch of docs + a iteration test. * Fixing doc style. * KeyDataset = None guard. * RRemoving the Cuda test for pipelines (was testing). * Even more simple iteration test. * Correct import . * Long day. * Fixes in docs. * [WIP] migrating object detection. * Fixed the target_size bug. * Fixup. * Bad variable name. * Fixing `ensure_on_device` respects original ModelOutput.
-
- 27 Aug, 2021 2 commits
-
-
Nicolas Patry authored
* Moving `zero-shot-classification` pipeline to new testing. * Cleaning up old mixins. * Fixing tests `sshleifer/tiny-distilbert-base-uncased-finetuned-sst-2-english` is corrupted in PT. * Adding warning.
-
Nicolas Patry authored
* Moving `token-classification` pipeline to new testing. * Fix tests.
-
- 26 Aug, 2021 2 commits
-
-
Nicolas Patry authored
- Enforce `test_small_models_{tf,pt}` methods to exist (enforce checking actual values in small tests) - Add support for non RGB image for the pipeline. -
Nicolas Patry authored
* New test format for conversational. * Putting back old mixin. * Re-enabling auto tests with LazyLoading. * Feature extraction tests. * Remove feature-extraction. * Feature extraction with feature_extractor (No pun intended). * Update check_model_type for fill-mask.
-
- 13 Aug, 2021 1 commit
-
-
Nicolas Patry authored
* Fill mask pipelines test updates. * Model eval !! * Adding slow test with actual values. * Making all tests pass (skipping quite a bit.) * Doc styling. * Better doc cleanup. * Making an explicit test with no pad token tokenizer. * Typo.
-