Commits · 5e8c8eb5bad1d43212b2e8bf724114fe5cdfd807 · chenpangpang / transformers

22 Feb, 2023 1 commit
- Apply ruff flake8-comprehensions (#21694) · 5e8c8eb5
  Aaron Gokaslan authored Feb 22, 2023
  
16 Feb, 2023 1 commit

refactor: Make direct_transformers_import util (#21652) · 0f96c26d

Connor Henderson authored Feb 16, 2023

* refactor: Make direct_import util

* edit direct import fn

* add docstring

* make import function specific to transformers only

* edit doc string


15 Feb, 2023 1 commit
- Update deprecated load_module (#21651) · 9d1116e9
  Sylvain Gugger authored Feb 15, 2023
  
06 Feb, 2023 1 commit

Update quality tooling for formatting (#21480) · 6f79d264

Sylvain Gugger authored Feb 06, 2023

* Result of black 23.1

* Update target to Python 3.7

* Switch flake8 to ruff

* Configure isort

* Configure isort

* Apply isort with line limit

* Put the right black version

* adapt black in check copies

* Fix copies


02 Feb, 2023 1 commit
- Fix some pipeline tests (#21401) · a6d8a149
  Yih-Dar authored Feb 02, 2023
```
* fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
30 Jan, 2023 1 commit

Pipeline testing - using tiny models on Hub (#20426) · c749bd40

Yih-Dar authored Jan 30, 2023



* rework pipeline tests

* run pipeline tests

* fix

* fix

* fix

* revert the changes in get_test_pipeline() parameter list

* fix expected error message

* skip a test

* clean up

---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>


25 Jan, 2023 1 commit

Supporting `ImageProcessor` in place of `FeatureExtractor` for pipelines (#20851) · 99e79054

Nicolas Patry authored Jan 25, 2023



* Fixing the pipeline with image processor.

* Update the slow test.

* Using only the first image processor.

* Include exclusion mechanism for Image processor.

* Do not handle Gitconfig, deemed as a bug.

* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Remove `conversational` changes. They are not supposed to be here.

* Address first row of comments.

* Remove OneFormer modifications.
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>


18 Jan, 2023 1 commit

Adapt repository creation to latest hf_hub (#21158) · 05e72aa0

Sylvain Gugger authored Jan 18, 2023

* Adapt repository creation to latest hf_hub

* Update all examples

* Fix other tests, add Flax examples

* Address review comments


16 Jan, 2023 1 commit

Fixing batching pipelines on single items for ChunkPipeline (#21132) · 488a179c

Nicolas Patry authored Jan 16, 2023

* Fixing #20783

* Update src/transformers/pipelines/base.py

* Fixing some tests.

* Fixup.

* Remove ffmpeg dep + a bit more relaxed for bigbird QA precision.

* Better dataset.

* Prevent failing on TF.

* Better condition. We can't use `can_use_iterator` since we cannot use it
directly.


19 Dec, 2022 1 commit

Implement Roberta PreLayerNorm (#20305) · b4b613b1

Andreas Madsen authored Dec 19, 2022



* Copy RoBERTa

* formatting

* implement RoBERTa with prelayer normalization

* update test expectations

* add documentation

* add conversion script for DinkyTrain weights

* update checkpoint repo

Unfortunately, the original checkpoints assume a hacked RoBERTa model.

* add to RoBERTa-PreLayerNorm docs to toc

* run utils/check_copies.py

* lint files

* remove unused import

* fix check_repo reporting wrongly a test is missing

* fix import error, caused by rebase

* run make fix-copies

* add RobertaPreLayerNormConfig to ROBERTA_EMBEDDING_ADJUSMENT_CONFIGS

* Fix documentation <Facebook> -> Facebook
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fixup: Fix documentation <Facebook> -> Facebook
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Add missing Flax header
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* expected_slice -> EXPECTED_SLICE
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* update copies after rebase

* add missing copied from statements

* make fix-copies

* make prelayernorm explicit in code

* fix checkpoint path for the original implementation

* add flax integration tests

* improve docs

* update utils/documentation_tests.txt

* lint files

* Remove Copyright notice
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* make fix-copies

* Remove EXPECTED_SLICE calculation comments
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
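The difference between RoBERTa and its PreLayerNorm variant is where normalization sits relative to the residual connection. The sketch below is a plain-Python illustration of the two orderings only: `layer_norm` omits the learned scale and bias, and the `sublayer` callable is a hypothetical stand-in for attention or the feed-forward block, not the transformers modules.

```python
import math
from typing import Callable, List

Vector = List[float]


def layer_norm(x: Vector, eps: float = 1e-12) -> Vector:
    """Layer normalization without learned scale/bias (illustration only)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise residual addition."""
    return [u + v for u, v in zip(a, b)]


def post_ln_block(x: Vector, sublayer: Callable[[Vector], Vector]) -> Vector:
    # Original BERT/RoBERTa ordering: normalize after the residual addition.
    return layer_norm(add(x, sublayer(x)))


def pre_ln_block(x: Vector, sublayer: Callable[[Vector], Vector]) -> Vector:
    # Pre-LayerNorm ordering: normalize the sublayer input; the residual path stays unnormalized.
    return add(x, sublayer(layer_norm(x)))
```

With post-LN every block output is re-normalized, while with pre-LN the raw residual stream passes through untouched, which is why checkpoints trained one way cannot simply be loaded into the other architecture.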


21 Nov, 2022 1 commit

Add Audio Spectrogram Transformer (#19981) · 4973d2a0

NielsRogge authored Nov 21, 2022



* First draft

* Make conversion script work

* Add id2label mapping, run code quality

* Fix copies

* Add first draft of feature extractor

* Update conversion script to use feature extractor

* Make more tests pass

* Add docs

* update input_features to input_values + pad by default to max length

* Fix doc tests

* Add feature extractor tests

* Add proper padding/truncation to feature extractor

* Add support for conversion of all audioset checkpoints

* Improve docs and extend conversion script

* Fix README

* Rename spectogram to spectrogram

* Fix copies

* Add integration test

* Remove dummy conv

* Update to ast

* Update organization

* Fix init

* Rename model to AST

* Add require_torchaudio annotator

* Move import of ASTFeatureExtractor under an is_speech_available check

* Fix rebase

* Add pipeline config

* Update name of classifier head

* Rename time_dimension and frequency_dimension for clarity

* Remove print statement

* Fix pipeline test

* Fix pipeline test

* Fix index table

* Fix init

* Fix conversion script

* Rename to ForAudioClassification

* Fix index table
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
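The "pad by default to max length" and "proper padding/truncation" steps above mean every spectrogram is forced to a fixed number of frames. A minimal sketch, assuming features are a list of frames (each a list of frequency-bin values) and constant-value padding; `pad_or_truncate` is a hypothetical helper, and the actual default length used by `ASTFeatureExtractor` is not stated in this log.

```python
from typing import List

Frame = List[float]


def pad_or_truncate(frames: List[Frame], max_length: int, pad_value: float = 0.0) -> List[Frame]:
    """Return exactly max_length frames: truncate long inputs, pad short ones with constant frames."""
    if not frames:
        raise ValueError("need at least one frame to infer the number of frequency bins")
    num_bins = len(frames[0])
    if len(frames) >= max_length:
        # Keep only the first max_length frames (copied, so the input is not aliased).
        return [list(f) for f in frames[:max_length]]
    padding = [[pad_value] * num_bins for _ in range(max_length - len(frames))]
    return [list(f) for f in frames] + padding
```

A fixed time dimension lets the model use a fixed number of patch position embeddings, at the cost of discarding audio beyond `max_length`.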


14 Nov, 2022 1 commit

Fix tapas scatter (#20149) · 78a471ff

Bartosz Szmelczynski authored Nov 14, 2022



* First draft

* Remove scatter dependency

* Add require_torch

* update vectorized sum test, add clone call

* remove artifacts

* fix style

* fix style v2

* remove "scatter" mentions from the code base

* fix isort error
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
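TAPAS aggregates token-level values into segments (cells, columns), which is the scatter-add operation the removed `torch-scatter` dependency provided. The plain-Python sketch below only shows the semantics; in PyTorch the same operation is available natively through `Tensor.scatter_add_` or `Tensor.index_add_`, though this log does not say which native op the commit adopted.

```python
from typing import List, Sequence


def segment_sum(values: Sequence[float], segment_ids: Sequence[int], num_segments: int) -> List[float]:
    """Scatter-add: out[segment_ids[i]] += values[i]; segments with no members stay 0."""
    if len(values) != len(segment_ids):
        raise ValueError("values and segment_ids must have the same length")
    out = [0.0] * num_segments
    for value, segment in zip(values, segment_ids):
        out[segment] += value
    return out
```

A vectorized version computes the same result in one tensor operation instead of a Python loop, which is what the "vectorized sum test" above would compare against.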


03 Nov, 2022 1 commit
- Now supporting pathlike in pipelines too. (#20030) · ec6878f6
  Nicolas Patry authored Nov 03, 2022
  
17 Oct, 2022 1 commit

Fix pipeline predict transform methods (#19657) · 8aad4363

Sivaudha authored Oct 17, 2022

* Remove keyword argument X from pipeline predict and transform methods

Because `__call__` of the pipeline classes requires one positional argument, passing
the input as a keyword argument inside the `predict` and `transform` methods caused
`__call__` to fail. This commit therefore passes the input as a positional
argument instead.

* Implement basic tests for scikitcompat pipeline interface

* Separate tests instead of parameterizing them by framework, as both frameworks will not be active at the same time
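The failure described above is ordinary Python argument binding. The sketch uses a hypothetical `HypotheticalPipeline` class (not the transformers implementation) whose `__call__` takes its input as the first positional parameter, like the pipelines in this commit: calling `self(X=X)` puts `X` into `**kwargs` and leaves the required `inputs` unbound.

```python
class HypotheticalPipeline:
    """Hypothetical stand-in for a pipeline class; not the transformers implementation."""

    def __call__(self, inputs, *args, **kwargs):
        # The input is the first positional parameter, as in the pipelines described above.
        if isinstance(inputs, list):
            return [self._run_single(x) for x in inputs]
        return self._run_single(inputs)

    def _run_single(self, text):
        # Toy "model": report the text length.
        return {"length": len(text)}

    def predict_broken(self, X):
        # Keyword form: X lands in **kwargs and `inputs` is missing -> TypeError.
        return self(X=X)

    def predict(self, X):
        # Fixed form: forward the input positionally.
        return self(X)

    def transform(self, X):
        return self(X)
```

For example, `HypotheticalPipeline().predict(["ab", "abc"])` returns one result per input, while `predict_broken` raises `TypeError` before any work is done.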


11 Oct, 2022 1 commit

Fix whisper for `pipeline` (#19482) · b722a6be

Arthur authored Oct 11, 2022

* update feature extractor params

* update attention mask handling

* fix doc and pipeline test

* add warning when skipping test

* add whisper translation and transcription test

* fix build doc test


07 Oct, 2022 1 commit

Rework pipeline tests (#19366) · 9ac586b3

Sylvain Gugger authored Oct 07, 2022

* Rework pipeline tests

* Try to fix Flax tests

* Try to put it before

* Use a new decorator instead

* Remove ignore marker since it doesn't work

* Filter pipeline tests

* Woopsie

* Use the filtered list

* Clean up and fake modif

* Remove init

* Revert fake modif


05 Oct, 2022 1 commit
- Fix pipeline tests for Roberta-like tokenizers (#19365) · 7e7f62bf
  Sylvain Gugger authored Oct 05, 2022
```
* Fix pipeline tests for Roberta-like tokenizers

* Fix fix
```
06 Sep, 2022 1 commit

Further reduce the number of calls to head for cached objects (#18871) · 71ff88fa

Sylvain Gugger authored Sep 06, 2022

* Further reduce the number of calls to head for cached models/tokenizers/pipelines

* Fix tests

* Address review comments


10 Aug, 2022 1 commit

Use commit hash to look in cache instead of calling head (#18534) · 0d0aada5

Sylvain Gugger authored Aug 10, 2022



* Use commit hash to look in cache instead of calling head

* Add tests

* Add attr for local configs too

* Stupid typos

* Fix tests

* Update src/transformers/utils/hub.py
Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Address Julien's comments
Co-authored-by: Julien Chaumond <julien@huggingface.co>
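Once a commit hash is known, a cached file can be located without a HEAD request to the Hub. The sketch below assumes the huggingface_hub-style cache layout `models--<org>--<name>/refs/<revision>` (a file holding the commit hash) and `models--<org>--<name>/snapshots/<commit_hash>/<filename>`; the helper names `resolve_ref` and `cached_file_for_commit` are hypothetical, not the `transformers.utils.hub` functions.

```python
import os
from typing import Optional


def _repo_folder(cache_dir: str, repo_id: str) -> str:
    # Assumed layout: "org/name" is stored as "models--org--name".
    return os.path.join(cache_dir, "models--" + repo_id.replace("/", "--"))


def resolve_ref(cache_dir: str, repo_id: str, revision: str = "main") -> Optional[str]:
    """Read the commit hash a branch name points to from refs/<revision>; no network call."""
    ref_path = os.path.join(_repo_folder(cache_dir, repo_id), "refs", revision)
    if not os.path.isfile(ref_path):
        return None
    with open(ref_path) as f:
        return f.read().strip()


def cached_file_for_commit(cache_dir: str, repo_id: str, filename: str, commit_hash: str) -> Optional[str]:
    """Return the cached path of `filename` at `commit_hash`, or None if it is not cached."""
    candidate = os.path.join(_repo_folder(cache_dir, repo_id), "snapshots", commit_hash, filename)
    return candidate if os.path.isfile(candidate) else None
```

Because a commit hash is immutable, a file found under its snapshot folder can be trusted without revalidation, which is the reason a hash avoids the HEAD call that a branch name like `main` would need.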


05 Aug, 2022 1 commit
- Fix pipeline tests (#18487) · 70fa1a8d
  Sylvain Gugger authored Aug 05, 2022
```
* Fix pipeline tests

* Make sure all pipelines tests run with init changes
```
19 Jul, 2022 1 commit

Custom pipeline (#18079) · dc9147ff

Sylvain Gugger authored Jul 19, 2022



* Initial work

* More work

* Add tests for custom pipelines on the Hub

* Protect import

* Make the test work for TF as well

* Last PyTorch specific bit

* Add documentation

* Style

* Title in toc

* Bad names!

* Update docs/source/en/add_new_pipeline.mdx
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

* Auto stash before merge of "custom_pipeline" and "origin/custom_pipeline"

* Address review comments

* Address more review comments

* Update src/transformers/pipelines/__init__.py
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>


01 Jul, 2022 1 commit
- Restore original task in test_warning_logs (#17985) · 6f0723a9
  Yih-Dar authored Jul 01, 2022
```
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
30 Jun, 2022 2 commits

feat: add pipeline registry abstraction (#17905) · 49cd736a

Aaron Pham authored Jun 30, 2022



* feat: add pipeline registry abstraction

- added `PipelineRegistry` abstraction
- updates `add_new_pipeline.mdx` (english docs) to reflect the api addition
- migrate `check_task` and `get_supported_tasks` from
  transformers/pipelines/__init__.py to
  transformers/pipelines/base.py#PipelineRegistry.{check_task,get_supported_tasks}
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* fix: update with upstream/main

chore: Apply suggestions from sgugger's code review
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* chore: PR updates

- revert src/transformers/dependency_versions_table.py from upstream/main
- updates pipeline registry to use global variables
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* tests: add tests for pipeline registry
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* tests: add test for output warning.
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* chore: fmt and cleanup unused imports
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>

* fix: change imports to top of the file and address comments
Signed-off-by: Aaron Pham <29749331+aarnphm@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
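The registry idea above moves task lookup (`check_task`, `get_supported_tasks`) behind one object so that new tasks can be registered without editing a hard-coded table. The class below is a hypothetical, simplified sketch of that pattern; its constructor, return shapes, and `register_pipeline` signature are assumptions and do not reproduce the transformers `PipelineRegistry` API.

```python
from typing import Any, Dict, List, Tuple


class PipelineRegistrySketch:
    """Minimal task registry, assumed for illustration; not the transformers PipelineRegistry."""

    def __init__(self, supported_tasks: Dict[str, Dict[str, Any]], task_aliases: Dict[str, str]):
        self.supported_tasks = supported_tasks
        self.task_aliases = task_aliases

    def get_supported_tasks(self) -> List[str]:
        # Canonical task names plus their aliases, sorted for stable output.
        return sorted(list(self.supported_tasks) + list(self.task_aliases))

    def check_task(self, task: str) -> Tuple[str, Dict[str, Any]]:
        # Resolve an alias to its canonical name, then look the task up.
        task = self.task_aliases.get(task, task)
        if task in self.supported_tasks:
            return task, self.supported_tasks[task]
        raise KeyError(f"Unknown task {task}, available tasks are {self.get_supported_tasks()}")

    def register_pipeline(self, task: str, pipeline_class: type, **extra: Any) -> None:
        # Adding a task only touches the registry, not a table elsewhere in the code base.
        self.supported_tasks[task] = {"impl": pipeline_class, **extra}
```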


[Pipelines] Add revision tag to all default pipelines (#17667) · e4d25885

Patrick von Platen authored Jun 30, 2022



* trigger test failure

* upload revision poc

* Update src/transformers/pipelines/base.py
Co-authored-by: Julien Chaumond <julien@huggingface.co>

* up

* add test

* correct some stuff

* Update src/transformers/pipelines/__init__.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* correct require flag
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>


12 May, 2022 1 commit

Black preview (#17217) · afe5d42d

Sylvain Gugger authored May 12, 2022

* Black preview

* Fixup too!

* Fix check copies

* Use the same version as the CI

* Bump black


05 May, 2022 1 commit
- fix missing "models" in pipeline test module (#17090) · a59eb349
  Yih-Dar authored May 05, 2022
```
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
04 Mar, 2022 1 commit
- Re-enabling all fast pipeline tests. (#15924) · a6e3b179
  Nicolas Patry authored Mar 04, 2022
  
23 Feb, 2022 2 commits

[Test refactor 1/5] Per-folder tests reorganization (#15725) · 29c10a41

Lysandre Debut authored Feb 23, 2022



* Per-folder tests reorganization
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
Co-authored-by: Stas Bekman <stas@stason.org>


Enable `image-segmentation` on `AutoModelForSemanticSegmentation` (#15647) · 9e71d464

Nicolas Patry authored Feb 23, 2022

* Enabling Beit SegFormer to `image-segmentation`.

* Fixing the score.

* Fix import ?

* Missing in type hint.

* Multiple test fixes:

- Add `raw_image` support. It should be the default IMHO since in the Python
  world it doesn't make any sense to base64-encode the image (Sorry
  @mishig, didn't catch that in my review). I really think we should
  consider breaking BC here.
- Add support for Segformer tiny test (needed
  `SegformerModelTester.get_config` to enable TinyConfig
  @NielsRogge)
- Add the check that `batch_size` works correctly on that pipeline.
  This uncovered that it doesn't work for Detr, which IMO is OK since images
  after `feature_extractor` don't have the same size. A comment should
  explain this.

* Type hint as a string.

* Make fixup + update black.

* torch+vision protections.

* Don't use torchvision, use F.interpolate instead (no new dep).

* Last fixes for Segformer.

* Update test to reflect new image (which was broken)

* Update tests.

* Major BC modification:

- Removed the compressed PNG string; that's a job for users, since
  `transformers` stays in Python land.
- Removed the `score` for semantic segmentation. It has hardly any meaning
  on its own in this context.
- Don't include the grayscale with logits for now (which could enable
  users to get a sense of confidence). Might be done later.
- Don't include the surface of the mask (users could use it for sorting
  or to filter out small masks). It's already calculable, and
  it's easier to add later than to add now and break later if we need to.

* `make fixup`.

* Small changes.

* Rebase + doc fixup.
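The mask surface that was left out of the output is, as noted above, easy to compute on the user side. A minimal sketch, assuming each segment is a dict with a `"label"` and a `"mask"` given as nested lists of 0/1 values; the pipeline's real mask type may differ (for example an image object that would first need converting to an array), and `mask_area` and `drop_small_masks` are hypothetical helpers.

```python
from typing import Any, Dict, List

Mask = List[List[int]]


def mask_area(mask: Mask) -> int:
    """Number of pixels set in a binary mask."""
    return sum(sum(1 for pixel in row if pixel) for row in mask)


def drop_small_masks(segments: List[Dict[str, Any]], min_area: int) -> List[Dict[str, Any]]:
    """Keep segments whose mask covers at least min_area pixels, largest first."""
    kept = [s for s in segments if mask_area(s["mask"]) >= min_area]
    return sorted(kept, key=lambda s: mask_area(s["mask"]), reverse=True)
```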


05 Jan, 2022 1 commit
- Adding QoL for `batch_size` arg (like others enabled everywhere). (#15027) · 65cb94ff
  Nicolas Patry authored Jan 05, 2022
```
* Adding QoL for `batch_size` arg (like others enabled everywhere).

* Typo.
```
04 Jan, 2022 1 commit

Hotfix `chunk_length_s` instead of `_ms`. (#15029) · 19d37c2d

Nicolas Patry authored Jan 04, 2022

* Hotfix `chunk_length_s` instead of `_ms`.

* Adding fix of `pad_token`, which should be the last/previous token for proper CTC decoding

* Fixing ChunkPipeline unwrapping.

* Adding a PackIterator specific test.
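Greedy CTC decoding merges consecutive repeated ids and then drops blanks, so extra frames that repeat the previous id are absorbed, while a different id placed between frames can change the transcript. The sketch below illustrates that general property only; `pad_with_previous` is a hypothetical helper, and this log does not show exactly where the ASR pipeline applies its padding.

```python
from itertools import groupby
from typing import List


def ctc_greedy_collapse(token_ids: List[int], blank_id: int) -> List[int]:
    """Greedy CTC decoding of per-frame argmax ids: merge consecutive repeats, then drop blanks."""
    collapsed = [token for token, _ in groupby(token_ids)]
    return [token for token in collapsed if token != blank_id]


def pad_with_previous(token_ids: List[int], target_len: int) -> List[int]:
    """Pad a frame sequence to target_len by repeating its last id (hypothetical helper)."""
    if not token_ids:
        raise ValueError("cannot pad an empty sequence with its previous token")
    return token_ids + [token_ids[-1]] * (target_len - len(token_ids))
```

For instance, with blank id `0`, the frames `[5, 5, 5]` decode to `[5]`, whereas `[5, 0, 5]` decode to `[5, 5]`: a filler inserted between two identical frames splits one character into two.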


27 Dec, 2021 1 commit

ChunkPipeline (batch_size enabled on `zero-cls` and `qa` pipelines) (#14225) · b058490c

Nicolas Patry authored Dec 27, 2021



* Pipeline chunks.

* Batching for Chunking pipelines ?

* Batching for `question-answering` and `zero-shot-cls`.

* Fixing for FNet.

* Making ASR a chunk pipeline.

* Chunking ASR API.

* doc style.

* Fixing ASR test.

* Fixing QA error (p_mask, padding is 1, not 0).

* Enable both vad and simple chunking.

* Max length for vad.

* remove inference mode, crashing on s2t.

* Revert ChunkPipeline for ASRpipeline.

Too many knobs for simple integration within the pipeline, better stick
to external convenience functions instead, more control to be had,
simpler pipeline and also easier to replace with other things later.

* Drop necessity for PT for these.

* Enabling generators.

* Add mic + cleanup.

* Typo.

* Typo2.

* Remove ASR work, it does not belong in this PR anymore.

* Update src/transformers/pipelines/pt_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/pipelines/zero_shot_classification.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Adding many comments.

* Doc quality.

* `hidden_states` handling.

* Adding doc.

* Bad rebase.

* Autofixing docs.

* Fixing CRITICAL bug in the new Zerocls pipeline.
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
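The chunking idea above lets one input expand into several model inputs, batches them regardless of which input they came from, and regroups the outputs afterwards. A plain-Python sketch of that flow, assuming each chunk carries an `is_last` flag marking the final chunk of its input; `preprocess_chunks`, `pack`, and `run` are hypothetical names, and the uppercase "model" is a toy.

```python
from typing import Any, Dict, Iterable, Iterator, List


def preprocess_chunks(example: str, chunk_size: int) -> Iterator[Dict[str, Any]]:
    """Split one input into several chunks, flagging the final one."""
    pieces = [example[i:i + chunk_size] for i in range(0, len(example), chunk_size)] or [""]
    for index, piece in enumerate(pieces):
        yield {"chunk": piece, "is_last": index == len(pieces) - 1}


def pack(model_outputs: Iterable[Dict[str, Any]]) -> Iterator[List[Dict[str, Any]]]:
    """Regroup a flat stream of per-chunk outputs into one list per original input."""
    accumulator: List[Dict[str, Any]] = []
    for output in model_outputs:
        accumulator.append(output)
        if output["is_last"]:
            yield accumulator
            accumulator = []


def run(inputs: List[str], chunk_size: int, batch_size: int) -> List[str]:
    flat = [c for text in inputs for c in preprocess_chunks(text, chunk_size)]
    outputs: List[Dict[str, Any]] = []
    for start in range(0, len(flat), batch_size):
        batch = flat[start:start + batch_size]
        # Toy "model" applied per batch; a batch may span several original inputs.
        outputs.extend({"chunk": c["chunk"].upper(), "is_last": c["is_last"]} for c in batch)
    # Postprocess: one result per original input.
    return ["".join(o["chunk"] for o in group) for group in pack(outputs)]
```

Because batching happens on the flat chunk stream, `batch_size` stays meaningful even when inputs produce different numbers of chunks.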


14 Dec, 2021 1 commit

Fixing tests for Perceiver (#14739) · 546a91ab

Nicolas Patry authored Dec 14, 2021

* Adding some slow tests to check Perceiver, at least from a high level.

* Re-enabling fast tests for Perceiver ImageClassification.

* Perceiver might try to run some text-only pipelines without a tokenizer (the
fast one doesn't exist) and with a FeatureExtractor.

* Oops.

* Adding a comment for `update_config_with_model_class`.

* Remove `model_architecture` to get `tiny_config`.

* Finalize rebase.

* Smarter way to handle undefined FastTokenizer.

* Remove old code.

* Addressing some nits.

* Don't instantiate `None`.


13 Dec, 2021 1 commit

Fixing tests for Perceiver (#14745) · 3d66146a

Lysandre Debut authored Dec 13, 2021



- Do not run the image-classification pipeline (_CHECKPOINT_FOR_DOC uses the checkpoint for
language, which cannot load a FeatureExtractor, so the current logic fails).
- Add a safeguard to not run tests when `tokenizer_class` or
`feature_extractor_class` **are** defined but cannot be loaded.
This happens for Perceiver for the "FastTokenizer" (which doesn't exist,
so it is None) and the FeatureExtractor (which does exist but cannot be loaded
because the checkpoint doesn't define one, which is reasonable for
that checkpoint).
- Added `get_vocab` function to `PerceiverTokenizer` since it is used by
`fill-mask` pipeline when the argument `targets` is used to narrow a
subset of possible values.
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
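Narrowing `fill-mask` predictions with `targets` needs a token-to-id mapping, which is what `get_vocab` supplies. The sketch assumes a byte-level tokenizer whose special tokens come first and whose byte tokens follow at an offset; `ByteTokenizerSketch` and `restrict_to_targets` are hypothetical and do not reproduce `PerceiverTokenizer` or the pipeline's code.

```python
from typing import Dict, List, Sequence, Tuple


class ByteTokenizerSketch:
    """Hypothetical byte-level tokenizer: special tokens first, then one id per byte value."""

    def __init__(self, special_tokens: Sequence[str]):
        self.special_tokens = list(special_tokens)
        self.offset = len(self.special_tokens)

    def get_vocab(self) -> Dict[str, int]:
        vocab = {token: i for i, token in enumerate(self.special_tokens)}
        for byte in range(256):
            vocab[chr(byte)] = byte + self.offset
        return vocab


def restrict_to_targets(logits: Sequence[float], vocab: Dict[str, int], targets: Sequence[str]) -> List[Tuple[str, float]]:
    """Score only the requested target tokens, best first."""
    missing = [t for t in targets if t not in vocab]
    if missing:
        raise KeyError(f"targets not in vocabulary: {missing}")
    scored = [(t, logits[vocab[t]]) for t in targets]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```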


08 Dec, 2021 1 commit

Fixing Dataset for TQA + token-classification. (#14658) · 2e12d90b

Nicolas Patry authored Dec 08, 2021

* Fixing Dataset for TQA + token-classification.

* Fixing the tests.

* Making sure `offset_mappings` is a valid argument.


22 Nov, 2021 1 commit
- Moving pipeline tests from `Narsil` to `hf-internal-testing`. (#14463) · a4553e6c
  Nicolas Patry authored Nov 22, 2021
```
* Moving everything to `hf-internal-testing`.

* Fixing test values.

* Moving to other repo.

* Last touch?
```
19 Nov, 2021 1 commit
- Adding support for `hidden_states` and `attentions` in unbatching (#14420) · 81fe8afa
  Nicolas Patry authored Nov 19, 2021
```
support.
```
12 Nov, 2021 1 commit

Adding support for raw python `generator` in addition to `Dataset` for pipelines (#14352) · ed5d1551

Nicolas Patry authored Nov 12, 2021

* Adding support for raw python `generator` in addition to `Dataset`

The main goal is to ease the creation of streaming data for the pipe.

`Dataset` is more involved and PyTorch-specific.

This PR provides a way to use a Python iterator too.
This enabled #14250 but can be proposed as a standalone PR.

```python
from transformers import pipeline


def read_data(filename):
    # Stream the file lazily, one line at a time.
    with open(filename, "r") as f:
        for line in f:
            yield line  # yield each line, not the file object


pipe = pipeline("text-classification")
for classified in pipe(read_data("large_file.txt")):
    print("Success ! ", classified)
```

The main caveat is the interaction with `DataLoader` when
`num_workers>1`. With multiple workers, each one receives a copy
of the generator (like `IterableDataset`). That means the naive iterator
will fail, since every worker iterates over all items of the generator.

There are ways to do clever "skipping", but it could still be costly,
because every worker still has to pass through all items of the
generator (it just ignores the items it doesn't handle); depending on
the case, that might be bad.

Using `num_workers=1` is the simplest fix, and if the cost of loading
your data is small enough it should be good enough. In the above example,
for instance, trying to do smart tricks to skip some lines is unlikely to be a net
positive.

If there are better ways to do "jumps" on some data, then using
`Dataset` is advised (since different workers can then jump
on their own).

* Adding iterator support for `tf` too.
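The "skipping" strategy mentioned above can be sketched as index-based sharding: every worker walks the full generator but keeps only the items it owns. This is a plain-Python illustration; in a PyTorch `IterableDataset` the worker id and count would typically come from `torch.utils.data.get_worker_info()`, and `shard_for_worker` and `read_lines` are hypothetical helpers.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def shard_for_worker(items: Iterable[T], worker_id: int, num_workers: int) -> Iterator[T]:
    """Keep every num_workers-th item starting at worker_id.

    Each worker still iterates over the whole generator and only skips the items it
    does not own, which is the cost described above.
    """
    for index, item in enumerate(items):
        if index % num_workers == worker_id:
            yield item


def read_lines(lines: List[str]) -> Iterator[str]:
    # Stand-in for a file-reading generator such as read_data() above.
    for line in lines:
        yield line
```

Each worker gets its own fresh generator, mirroring the copy each `DataLoader` worker receives; together the shards cover every item exactly once.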


10 Nov, 2021 1 commit

Adding some quality of life for `pipeline` function. (#14322) · 5c153079

Nicolas Patry authored Nov 10, 2021



* Adding some quality of life for `pipeline` function.

* Update docs/source/main_classes/pipelines.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/pipelines/__init__.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Improve the tests.
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>


03 Nov, 2021 1 commit

Adding support for `truncation` parameter on `feature-extraction` pipeline. (#14193) · dec759e7

Nicolas Patry authored Nov 03, 2021

* Adding support for `truncation` parameter on `feature-extraction`
pipeline.

Fixes #14183

* Fixing tests on ibert, longformer, and roberta.

* Rebase fix.
