- 07 Nov, 2022 11 commits
-
-
Tom Aarsen authored
-
Steven Liu authored
* add new terms
* apply review
-
Tom Aarsen authored
* docs: Fixed variables in f-strings
* Replace unknown `block` with known `block_type` in ValueError
* Add missing torch import in docs code block

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
TAGAMI Yukihiro authored
-
Tom Aarsen authored
* docs: Fix typo in ONNX parser help: 'tolerence' => 'tolerance'
* docs: Resolve many typos in the English docs (typos found via 'codespell ./docs/source/en')
-
Tom Aarsen authored
With https://github.com/TimDettmers/bitsandbytes, which is by the same author and is still being updated
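As context for this pointer, a minimal sketch of the 8-bit loading that bitsandbytes enables in transformers; the checkpoint name is illustrative, and `bitsandbytes`, `accelerate`, and a CUDA GPU are assumed to be available:

```python
# Illustrative only: load a causal LM with bitsandbytes 8-bit quantization.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-560m",  # example checkpoint, any causal LM works
    device_map="auto",        # let accelerate place weights on the GPU
    load_in_8bit=True,        # quantize linear layers via bitsandbytes
)
```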
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Saad Mahmud authored
* swap RobertaConfig with PretrainedConfig
* Add camembert specific attributes
* Add PretrainedConfig docstring
* Add arguments docstring
* Change CamembertConfig docstring definition
* Fix typo CamembertConfig -> CamembertModel
* Fix typo BertModel -> CamembertModel
* Fix style of CamembertConfig
-
Saad Mahmud authored
* Add example docstring for DPRConfig
* Add DPRConfig to documentation_tests
-
Joao Gante authored
* Add contrastive search
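A short sketch of how contrastive search is reached through `generate`: passing `penalty_alpha > 0` together with `top_k > 1` selects it. The checkpoint and hyperparameter values below are illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("DeepMind Company is", return_tensors="pt")
# penalty_alpha trades model confidence against a degeneration penalty
outputs = model.generate(**inputs, penalty_alpha=0.6, top_k=4, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```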
-
- 04 Nov, 2022 14 commits
-
-
Christopher Akiki authored
-
Christopher Akiki authored
-
amyeroberts authored
* Update defaults and logic to match old FE
* Use docker run rest values
-
Yih-Dar authored
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Yih-Dar authored
* POC
* For more CLIP-like models

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Jordan Clive authored
Update documentation on seq2seq models with absolute positional embeddings, to be in line with the Tips section for BERT and GPT2 (#20068)

Co-authored-by: jordiclive <jordiclive19@imperial.ac.uk>
-
Matt authored
* Update READMEs for ESMFold and add notebooks
* Fix PyCharm formatting
* make fix-copies
-
H. Jhoo authored
-
NielsRogge authored
* Fix Swin
* Remove file
* Update code snippet
* Add copied from to maskformer
* Fix docstring
* Add whole name to replace

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>
-
amyeroberts authored
* Poolformer image processor defaults to previous FE
* Remove unnecessary math.floor
-
Sanchit Gandhi authored
-
Sourab Mangrulkar authored
-
bhuang authored
-
Matt authored
* Fix esm lm head test
* make fixup
-
- 03 Nov, 2022 11 commits
-
-
Patrick Deutschmann authored
* Speed up TF postprocessing by converting to numpy beforehand
* Fix bug that was triggered when offset_mapping was None

Co-authored-by: Patrick Deutschmann <patrick.deutschmann@dedalus.com>
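The gist of the speed-up, as a hedged sketch with hypothetical names (the real change lives in the pipeline's TF postprocessing code):

```python
import numpy as np

def postprocess(logits_tf):
    # One bulk EagerTensor -> ndarray conversion up front is much cheaper
    # than indexing into a TF tensor element by element in a Python loop.
    logits = logits_tf.numpy()
    return np.argmax(logits, axis=-1)
```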
-
Sylvain Gugger authored
* Only resize embeddings when necessary
* Add comment
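The guard described above, sketched against the public API (this mirrors the pattern used in the example scripts; the checkpoint is illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Only reallocate the embedding matrix when the sizes actually differ.
embedding_size = model.get_input_embeddings().weight.shape[0]
if len(tokenizer) > embedding_size:
    model.resize_token_embeddings(len(tokenizer))
```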
-
Michael Benayoun authored
-
Matt authored
* Update ESM conversion script for ESMFold
* Fix bug in ESMFold example
* make fixup and move restypes to one line
-
Wang, Yi authored
Fix jit trace error when the model forward input sequence is not aligned with the jit.trace tuple input sequence; update related doc (#19891)
* fix jit trace error for classification use case, update related doc
* add implementation in torch 1.14.0
* update doc

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
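A hedged sketch of the tracing pattern this fix concerns: `torch.jit.trace` takes positional example inputs, so they must line up with the order of the model's `forward()` arguments. The checkpoint is illustrative:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, torchscript=True)
model.eval()

enc = tokenizer("Traced models need positional inputs", return_tensors="pt")
# Tuple order must match forward(input_ids, attention_mask, ...) exactly.
traced = torch.jit.trace(model, (enc["input_ids"], enc["attention_mask"]))
```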
-
Arthur authored
* fix led eos_mask
* add FutureWarning
* revert useless changes
* Update src/transformers/models/led/modeling_led.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
-
Sanchit Gandhi authored
* [Whisper Tokenizer] Make more user-friendly
* use property
* make indexing rigorous
* small clean-up
* tests
* skip seq2seq tests
* remove multilingual arg
* reorder args
* collapse to one function
* option to override attributes
* add to docs
* Apply suggestions from code review
* make comment more clear
* don't add special tokens in get_decoder_prompt_ids
* add test for set_prefix_tokens

Co-authored-by: ArthurZucker <arthur@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: sgugger <sylvain@huggingface.co>
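A sketch of the user-facing pieces named above; the checkpoint and language/task values are illustrative:

```python
from transformers import WhisperTokenizer

tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-tiny")

# Override the language/task prefix tokens after loading.
tokenizer.set_prefix_tokens(language="french", task="transcribe")

# Forced decoder ids for generation; no special tokens are added here.
prompt_ids = tokenizer.get_decoder_prompt_ids(language="french", task="transcribe")
```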
-
Saad Mahmud authored
* Add example docstring for CamembertConfig
* Add configuration_camembert to documentation_tests
-
Yih-Dar authored
* Add skip_special_tokens=True in some doctest
* For T5
* Fix for speech_to_text.mdx

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
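A minimal illustration of the doctest change (checkpoint illustrative): decoding with `skip_special_tokens=True` keeps markers like `</s>` and `<pad>` out of the expected output string.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
ids = tokenizer("translate English to German: Hello", return_tensors="pt").input_ids
# Without skip_special_tokens=True the decoded string would end in '</s>'.
print(tokenizer.decode(ids[0], skip_special_tokens=True))
```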
-
amyeroberts authored
-
Nicolas Patry authored
-
- 02 Nov, 2022 4 commits
-
-
Steven Liu authored
-
Yih-Dar authored
* Show versions
* check
* store outputs
* revert

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-
Ben Eyal authored
🚨 🚨 🚨 Fix Issue 15003: SentencePiece Tokenizers Not Adding Special Tokens in `convert_tokens_to_string` (#15775)
* Add test for SentencePiece not adding special tokens to strings
* Add SentencePieceStringConversionMixin to fix issue 15003
* Fix conversion from tokens to string for most SentencePiece tokenizers. Tokenizers fixed: AlbertTokenizer, BarthezTokenizer, CamembertTokenizer, FNetTokenizer, M2M100Tokenizer, MBart50Tokenizer, PegasusTokenizer, Speech2TextTokenizer
* Fix MarianTokenizer, adjust SentencePiece test to accommodate vocab
* Fix DebertaV2Tokenizer
* Ignore LayoutXLMTokenizer in SentencePiece string conversion test
* Run 'make style' and 'make quality'
* Clean convert_tokens_to_string test: instead of explicitly ignoring LayoutXLMTokenizer in the test, override the test in LayoutLMTokenizationTest and do nothing in it
* Remove commented out code
* Improve robustness of convert_tokens_to_string test: instead of comparing lengths of re-tokenized text and input_ids, check that converting all special tokens to string yields a string with all special tokens
* Inline and remove SentencePieceStringConversionMixin; the convert_tokens_to_string method is now implemented in each relevant SentencePiece tokenizer
* Run 'make style' and 'make quality'
* Revert removal of space in convert_tokens_to_string
* Remove redundant import
* Revert test text to original
* Uncomment the lowercasing of the reverse_text variable
* Mimic Rust tokenizer behavior for the Albert, Barthez, Camembert, MBart50 and T5 tokenizers
* Fix accidentally skipping test in wrong tokenizer
* Add test for equivalent Rust and slow tokenizer behavior
* Override _decode in BigBirdTokenizer to mimic Rust behavior
* Override _decode in FNetTokenizer to mimic Rust behavior
* Override _decode in XLNetTokenizer to mimic Rust behavior
* Remove unused 're' import
* Update DebertaV2Tokenizer to mimic Rust tokenizer; the Deberta tokenizer now behaves like Albert and its `convert_tokens_to_string` is not tested
* Ignore problematic tests in Deberta V2
* Add comment on why the Deberta V2 tests are skipped
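A hedged sketch of the behavior the new test pins down, using AlbertTokenizer as one of the fixed tokenizers (checkpoint illustrative): a special token fed through `convert_tokens_to_string` should survive in the output.

```python
from transformers import AlbertTokenizer

tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")

# Mix regular tokens with a special token; before the fix, SentencePiece
# decoding dropped the special token from the reconstructed string.
tokens = tokenizer.tokenize("Hello world") + [tokenizer.eos_token]
print(tokenizer.convert_tokens_to_string(tokens))
```
-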
Yih-Dar authored
* Fix doctest

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
-