Commits · efa889d2e4ba0605baa77d3c364735d0caaa9463 · chenpangpang / transformers

08 Nov, 2022 3 commits

Add RocBert (#20013) · efa889d2

Weiwe Shi authored Nov 08, 2022



* add roc_bert

* update roc_bert readme

* code style

* change name and delete unused file

* update model file

* delete unused log file

* delete tokenizer fast

* reformat code and change model file path

* add RocBertForPreTraining

* update docs

* delete wrong notes

* fix copies

* fix make repo-consistency error

* fix files are not present in the table of contents error

* change RocBert -> RoCBert

* add doc, add detail test
Co-authored-by: weiweishi <weiweishi@tencent.com>

efa889d2

Add CLIPSeg (#20066) · 25896306

NielsRogge authored Nov 08, 2022



* Add first draft

* Update conversion script

* Improve conversion script

* Improve conversion script some more

* Add conditional embeddings

* Add initial decoder

* Fix activation function of decoder

* Make decoder outputs match original implementation

* Make decoder outputs match original implementation

* Add more copied from statements

* Improve model outputs

* Fix auto tokenizer file

* Fix more tests

* Add test

* Improve README and docs, improve conditional embeddings

* Fix more tests

* Remove print statements

* Remove initial embeddings

* Improve conversion script

* Add interpolation of position embeddings

* Finish addition of interpolation of position embeddings

* Add support for refined checkpoint

* Fix refined checkpoint

* Remove unused parameter

* Improve conversion script

* Add support for training

* Fix conversion script

* Add CLIPSegFeatureExtractor

* Fix processor

* Fix CLIPSegProcessor

* Fix conversion script

* Fix most tests

* Fix equivalence test

* Fix README

* Add model to doc tests

* Use better variable name

* Convert other checkpoint as well

* Update config, add link to paper

* Add docs

* Update organization

* Replace base_model_prefix with clip

* Fix base_model_prefix

* Fix checkpoint of config

* Fix config checkpoint

* Remove file

* Use logits for output

* Fix tests
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>

25896306

[Audio Processor] Only pass sr to feat extractor (#20022) · 3e39fd09
Sanchit Gandhi authored Nov 08, 2022
```
* [Audio Processor] Only pass sr to feat extractor

* move out of if/else

* copy to other processors
```
3e39fd09

07 Nov, 2022 7 commits
- Fix AutoTokenizer with subfolder passed (#20110) · fb1c8db7
  Sylvain Gugger authored Nov 07, 2022
  
  fb1c8db7
- Fix `generate_dummy_inputs` for `ImageGPTOnnxConfig` (#20103) · 2bdd9fa2
  Yih-Dar authored Nov 07, 2022
```
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
```
  2bdd9fa2
- use huggingface_hub.model_info() to get pipeline_tag (#20077) · cfaeb153
  TAGAMI Yukihiro authored Nov 08, 2022
  
  cfaeb153
- docs: Resolve many typos in the English docs (#20088) · 3222fc64
  Tom Aarsen authored Nov 07, 2022
```
* docs: Fix typo in ONNX parser help: 'tolerence' => 'tolerance'

* docs: Resolve many typos in the English docs

Typos found via 'codespell ./docs/source/en'
```
  3222fc64
- Removing RobertaConfig inheritance from CamembertConfig (#20059) · b77406bc
  Saad Mahmud authored Nov 07, 2022
```
* swap RobertaConfig with PretrainedConfig

* Add camembert specific attributes

* Add PretrainedConfig docstring

* Add arguments docstring

* Change CamembertConfig docstring definition

* Fix typo CamembertConfig -> CamembertModel

* Fix typo BertModel -> CamembertModel

* Fix style of CamembertConfig
```
  b77406bc
- [Doctest] Add configuration_dpr.py (#20080) · 9617b130
  Saad Mahmud authored Nov 07, 2022
```
* Add example docstring for DPRConfig

* Add DPRConfig to documentation_tests
```
  9617b130
- Generate: TF contrastive search with XLA support (#20050) · a0f86743
  Joao Gante authored Nov 07, 2022
```
* Add contrastive search
```
  a0f86743
04 Nov, 2022 8 commits
- Update hub.py (#20075) · 504db92e
  Christopher Akiki authored Nov 04, 2022
  
  504db92e
- Update modeling_tf_utils.py (#20076) · 4b86e446
  Christopher Akiki authored Nov 04, 2022
  
  4b86e446
- Update defaults and logic to match old FE (#20065) · d68c4602
  amyeroberts authored Nov 04, 2022
```
* Update defaults and logic to match old FE

* Use docker run rest values
```
  d68c4602
- change constant torch.tensor to torch.full (#20061) · 707b12a3
  H. Jhoo authored Nov 04, 2022
  
  707b12a3
- [Swin] Add Swin SimMIM checkpoints (#20034) · 787620e2
  NielsRogge authored Nov 04, 2022
```
* Fix Swin

* Remove file

* Update code snippet

* Add copied from to maskformer

* Fix docstring

* Add whole name to replace
Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
```
  787620e2
- PoolformerImageProcessor defaults to match previous FE (#20048) · 3936411b
  amyeroberts authored Nov 04, 2022
```
* Poolformer image processor defaults to previous FE

* Remove unnecessary math.floor
```
  3936411b
- [Trainer] Fix model name in push_to_hub (#20064) · 94e17c45
  Sanchit Gandhi authored Nov 04, 2022
  
  94e17c45
- fix `tokenizer_type` to avoid error when loading checkpoint back (#20062) · 19067711
  Sourab Mangrulkar authored Nov 04, 2022
  
  19067711
03 Nov, 2022 10 commits

Speed up TF token classification postprocessing by converting complete tensors to numpy (#19976) · d447c460

Patrick Deutschmann authored Nov 03, 2022



* Speed up TF postprocessing by converting to numpy before

* Fix bug that was triggered when offset_mapping was None
Co-authored-by: Patrick Deutschmann <patrick.deutschmann@dedalus.com>

d447c460

Fixed torch.finfo issue with torch.fx (#20040) · 9080607b
Michael Benayoun authored Nov 03, 2022

9080607b

Update esmfold conversion script (#20028) · 6f257bb3

Matt authored Nov 03, 2022

* Update ESM conversion script for ESMfold

* Fix bug in ESMFold example

* make fixup and move restypes to one line

6f257bb3

fix jit trace error for model forward sequence is not aligned with jit.trace... · 2564f0c2

Wang, Yi authored Nov 03, 2022


fix jit trace error for model forward sequence is not aligned with jit.trace tuple input sequence, update related doc (#19891)

* fix jit trace error for classification usecase, update related doc
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* add implementation in torch 1.14.0
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* update_doc
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* update_doc
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

2564f0c2

[FutureWarning] Add future warning for LEDForSequenceClassification (#19066) · 737bff6a

Arthur authored Nov 03, 2022



* fix led eos_mask

* add FutureWarning

* revert useless changes

* Update src/transformers/models/led/modeling_led.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

737bff6a

[Whisper Tokenizer] Make more user-friendly (#19921) · 06d48806

Sanchit Gandhi authored Nov 03, 2022



* [Whisper Tokenizer] Make more user-friendly

* use property

* make indexing rigorous

* small clean-up

* tests

* skip seq2seq tests

* remove multilingual arg

* reorder args

* collapse to one function
Co-authored-by: ArthurZucker <arthur@huggingface.co>

* option to override attributes
Co-authored-by: ArthurZucker <arthur@huggingface.co>

* add to docs

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* make comment more clear
Co-authored-by: sgugger <sylvain@huggingface.co>

* don't add special tokens in get_decoder_prompt_ids

* add test for set_prefix_tokens
Co-authored-by: ArthurZucker <arthur@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: sgugger <sylvain@huggingface.co>

06d48806

[Doctest] Add configuration_camembert.py (#20039) · 790ff254
Saad Mahmud authored Nov 03, 2022
```
* Add example docstring for CamembertConfig

* Add configuration_camembert to documentation_tests
```
790ff254

Fix some doctests after PR 15775 (#20036) · 9ccea7ac

Yih-Dar authored Nov 03, 2022



* Add skip_special_tokens=True in some doctest

* For T5

* Fix for speech_to_text.mdx
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

9ccea7ac

Add **kwargs (#20037) · a639ea9e
amyeroberts authored Nov 03, 2022

a639ea9e

Now supporting pathlike in pipelines too. (#20030) · ec6878f6
Nicolas Patry authored Nov 03, 2022

ec6878f6
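
As an illustration of what accepting path-like objects means in practice, the sketch below normalizes either a plain string or an `os.PathLike` object into a string. The helper name `resolve_model_id` is hypothetical and is not taken from the pipelines code; only the standard-library calls are real.

```python
import os
from pathlib import Path


def resolve_model_id(model):
    """Hypothetical helper: accept a str or an os.PathLike model location."""
    if isinstance(model, (str, os.PathLike)):
        # os.fspath returns str unchanged and calls __fspath__ on path-like objects.
        return os.fspath(model)
    raise TypeError(f"Expected str or os.PathLike, got {type(model).__name__}")


# Both spellings resolve to the same plain string.
as_path = resolve_model_id(Path("checkpoints") / "my-model")
as_str = resolve_model_id(os.path.join("checkpoints", "my-model"))
```

With this kind of normalization, callers can pass `pathlib.Path` objects wherever a model directory string was previously required.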

02 Nov, 2022 7 commits

Fix Issue 15003: SentencePiece Tokenizers Not Adding Special Tokens in... · 9f9ddcc2

Ben Eyal authored Nov 02, 2022

🚨 🚨 🚨 Fix Issue 15003: SentencePiece Tokenizers Not Adding Special Tokens in `convert_tokens_to_string` (#15775)

* Add test for SentencePiece not adding special tokens to strings

* Add SentencePieceStringConversionMixin to fix issue 15003

* Fix conversion from tokens to string for most SentencePiece tokenizers

Tokenizers fixed:
- AlbertTokenizer
- BarthezTokenizer
- CamembertTokenizer
- FNetTokenizer
- M2M100Tokenizer
- MBart50Tokenizer
- PegasusTokenizer
- Speech2TextTokenizer

* Fix MarianTokenizer, adjust SentencePiece test to accommodate vocab

* Fix DebertaV2Tokenizer

* Ignore LayoutXLMTokenizer in SentencePiece string conversion test

* Run 'make style' and 'make quality'

* Clean convert_tokens_to_string test

Instead of explicitly ignoring LayoutXLMTokenizer in the test,
override the test in LayoutLMTokenizationTest and do nothing in it.

* Remove commented out code

* Improve robustness of convert_tokens_to_string test

Instead of comparing lengths of re-tokenized text and input_ids,
check that converting all special tokens to string yields a string
with all special tokens.

* Inline and remove SentencePieceStringConversionMixin

The convert_tokens_to_string method is now implemented
in each relevant SentencePiece tokenizer.

* Run 'make style' and 'make quality'

* Revert removal of space in convert_tokens_to_string

* Remove redundant import

* Revert test text to original

* Uncomment the lowercasing of the reverse_text variable

* Mimic Rust tokenizer behavior for tokenizers

- Albert
- Barthez
- Camembert
- MBart50
- T5

* Fix accidentally skipping test in wrong tokenizer

* Add test for equivalent Rust and slow tokenizer behavior

* Override _decode in BigBirdTokenizer to mimic Rust behavior

* Override _decode in FNetTokenizer to mimic Rust behavior

* Override _decode in XLNetTokenizer to mimic Rust behavior

* Remove unused 're' import

* Update DebertaV2Tokenizer to mimic Rust tokenizer

* Deberta tokenizer now behaves like Albert and its `convert_tokens_to_string` is not tested.

* Ignore problematic tests in Deberta V2

* Add comment on why the Deberta V2 tests are skipped

9f9ddcc2
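
The more robust check described in this commit (converting all special tokens to a string should yield a string that contains all of them) might be sketched as follows. `ToyTokenizer`, its special tokens, and `check_special_tokens_survive` are hypothetical stand-ins for illustration; they are not the actual tokenizer classes or test code from the repository.

```python
class ToyTokenizer:
    """Hypothetical stand-in for a slow SentencePiece-style tokenizer."""

    all_special_tokens = ["<s>", "</s>", "<pad>", "<unk>"]

    def convert_tokens_to_string(self, tokens):
        # Decode runs of ordinary pieces together, but emit special tokens verbatim
        # so they are not swallowed by the sub-word decoder.
        out, pieces = [], []
        for tok in tokens:
            if tok in self.all_special_tokens:
                if pieces:
                    out.append(self._decode_pieces(pieces))
                    pieces = []
                out.append(tok)
            else:
                pieces.append(tok)
        if pieces:
            out.append(self._decode_pieces(pieces))
        return " ".join(out)

    @staticmethod
    def _decode_pieces(pieces):
        # "▁" marks a word boundary in SentencePiece vocabularies.
        return "".join(pieces).replace("▁", " ").strip()


def check_special_tokens_survive(tokenizer):
    """Return True if every special token appears in the converted string."""
    text = tokenizer.convert_tokens_to_string(tokenizer.all_special_tokens)
    return all(tok in text for tok in tokenizer.all_special_tokens)
```

Checking for the presence of each special token avoids depending on how many pieces a re-tokenized string happens to produce, which is the fragility the commit message mentions.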

[Doctest] Add configuration_deberta_v2.py (#19995) · 74877437

Saad Mahmud authored Nov 02, 2022

* Add example docstring for DebertaV2Config

* Add DebertaV2Config to documentation_tests

* Fix mistake with directory name

74877437

Quality (#20002) · 49b77b89
Sylvain Gugger authored Nov 02, 2022

49b77b89

Add Image Processors (#19796) · a6b77598

amyeroberts authored Nov 02, 2022



* Add CLIP image processor

* Crop size as dict too

* Update warning

* Actually use logger this time

* Normalize doesn't change dtype of input

* Add perceiver image processor

* Tidy up

* Add DPT image processor

* Add Vilt image processor

* Tidy up

* Add poolformer image processor

* Tidy up

* Add LayoutLM v2 and v3 image processors

* Tidy up

* Add Flava image processor

* Tidy up

* Add deit image processor

* Tidy up

* Add ConvNext image processor

* Tidy up

* Add levit image processor

* Add segformer image processor

* Add in post processing

* Fix up

* Add ImageGPT image processor

* Fixup

* Add mobilevit image processor

* Tidy up

* Add postprocessing

* Fixup

* Add VideoMAE image processor

* Tidy up

* Add ImageGPT image processor

* Fixup

* Add ViT image processor

* Tidy up

* Add beit image processor

* Add mobilevit image processor

* Tidy up

* Add postprocessing

* Fixup

* Fix up

* Fix flava and remove tree module

* Fix image classification pipeline failing tests

* Update feature extractor in trainer scripts

* Update pad_if_smaller to accept tuple and int size

* Update for image segmentation pipeline

* Update src/transformers/models/perceiver/image_processing_perceiver.py
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>

* Update src/transformers/image_processing_utils.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/beit/image_processing_beit.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* PR comments - docstrings; remove accidentally added resize; var names

* Update docstrings

* Add exception if size is not in the right format

* Fix exception check

* Fix up

* Use shortest_edge in tuple in script
Co-authored-by: Alara Dirik <8944735+alaradirik@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

a6b77598
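
Several steps in this commit concern the accepted forms of a `size` argument (an int, a tuple, or a dict, with an exception for anything else). A minimal sketch of that kind of normalization is shown below. The function name `normalize_size`, the `default_to_square` flag as used here, and the exact key sets are assumptions for illustration, not the library's implementation.

```python
def normalize_size(size, default_to_square=True):
    """Hypothetical helper: coerce int / tuple / dict sizes into a dict."""
    if isinstance(size, dict):
        # Only two key layouts are accepted in this sketch.
        allowed = ({"height", "width"}, {"shortest_edge"})
        if set(size) not in allowed:
            raise ValueError(f"Unsupported size keys: {sorted(size)}")
        return dict(size)
    if isinstance(size, bool):
        # bool is a subclass of int, so reject it explicitly.
        raise ValueError("size must be an int, a pair, or a dict, not a bool")
    if isinstance(size, int):
        if default_to_square:
            return {"height": size, "width": size}
        return {"shortest_edge": size}
    if isinstance(size, (tuple, list)) and len(size) == 2:
        height, width = size
        return {"height": height, "width": width}
    raise ValueError(f"Unsupported size format: {size!r}")
```

Normalizing once at the boundary lets the rest of the processing code assume a single dict format instead of branching on the input type everywhere.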

make sentencepiece import conditional in bertjapanesetokenizer (#20012) · 2e3452af
Ripose authored Nov 02, 2022

2e3452af

clean up vision/text config dict arguments (#19954) · 8827e1b2

Yih-Dar authored Nov 02, 2022



* clean up

* For backward compatibility

* clean up

* Same changes for more models
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

8827e1b2

Update object detection pipeline to use post_process_object_detection methods (#20004) · cb630ffa
Alara Dirik authored Nov 02, 2022

cb630ffa

01 Nov, 2022 5 commits

Generate: contrastive search with full optional outputs (#19963) · 831590f6

Joao Gante authored Nov 01, 2022

* Use beam search functionality; Add extra outputs and test

* Add full tests for contrastive search

* Add error message on unconventional cache format

831590f6

Add ESMFold code sample (#20000) · 4f1e5e4e

Matt authored Nov 01, 2022

* Add ESMFold code sample

* sorry sylvain

* make fixup

* sorry sylvain again

4f1e5e4e

typo (#20001) · 4f90fc1d
Wang Ran (汪然) authored Nov 01, 2022

4f90fc1d

Added onnx config whisper (#19525) · c796b6de

Mohit Sharma authored Nov 01, 2022

* Added onnx config whisper

* added whisper support onnx

* add audio input data

* added whisper support onnx

* fixed the seqlength value

* Updated the whisper onnx config

* restore files to old version

* removed attention mask from inputs

* Updated get_dummy_input_onnxruntime docstring

* Updated relative imports and token generation

* update docstring

c796b6de

v4.25.0.dev0 · c3a93d8d
Sylvain Gugger authored Oct 31, 2022

c3a93d8d