Commits · b7bb2b59f72504fbabe3de24c84b5e282c4870e8 · chenpangpang / transformers

05 Oct, 2022 1 commit

Add WhisperModel to transformers (#19166) · 45e14038

Arthur authored Oct 05, 2022



* simplify loop

* add feature extractor

* add model

* start conversion

* add dropout

* initial commit of test files

* conversion for all models

* update processor for correct padding

* update feature extraction

* update integration test logits match

* fmt: off for the logits

* on the fly mel bank

* small nit

* update test

* update tokenizer

* nit feature extraction

* update

* update tokenizer test

* adds logit processor and update tokenizer to get suppress tokens

* style

* clean convert

* revert to original modeling tf utils

* Update

* update

* nit

* clean convert file

* update tests and nits

* quality

* slow generation test

* ffn_dim to allow customization

* update readme

* add to toctree

* start fixing integration tests

* update tests and code

* fix feature extractor

* fix config tests common

* update code to fix tests

* fix feature extractor

* nit feature extraction

* update test for new feature extractor

* style

* add abstract

* large logits with custom decoder input ids

* wrap around is torch available

* fix feature extractor

* correct logits for whisper small.en

* nit

* fix encoder_attention_mask

* some fixes

* remove unnecessary inputs

* nits

* add normalizer file

* update test tokenization

* fix attention mask not defined

* Add model to README

* Fix doc tests

* fix generate

* remove useless encoder attention mask

* update test modeling whisper

* update config to add second non suppress tokens

* nits on feature extractor

* nit for test tokenizers

* update tests

* update tests

* update tokenization test

* fixup

* invalidated hf token. Clean convert openai to whisper

* fix logit tests

* fixup

* clean merge

* revert toc_tree changes

* remove useless LogitProcessor

* Update whisper .mdx

* update config file doc

* update configuration docstring

* update test tokenization

* update test tokenization

* update tokenization whisper
Added copied from where needed

* update feature extraction

* nit test name

* style

* quality

* remove get suppress tokens and update non_speech tokens global variables

* Update src/transformers/models/whisper/feature_extraction_whisper.py
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* clean modeling whisper and test
Removed the attention mask arguments that are deprecated

* fix large test

* Add multilingual audio test, and translate test

* style

* fix large multilingual test

* nits

* Update docs/source/en/model_doc/whisper.mdx
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* add copied from for attention layer

* remove attention masks in doc

* add english normalizer

* update tokenization test

* remove copied from in whisper attention : no bias in k_proj only

* wrap around dependencies in english normalizer

* style

* correct import generation logits

* for now, wrap feature extractor with torch

* Update src/transformers/models/whisper/convert_openai_whisper_to_tfms.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/whisper/configuration_whisper.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update docs/source/en/model_doc/whisper.mdx
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* remove torch dependencies for feature extraction and style

* fixup

* nit

* update logits

* style

* nit

* nits and fix final tests

* add `is_more_itertools_available` to utils

* quality

* add begin suppress tokens, suppress tokens to generate args and config

* clean supressTokensLogitProcessor in generation logits

* Nit naming

* add supressTokensAtBegin

* update tests, suppress tokens to None or correct values

* nit and style

* update RAG to fit test and generate_logit

* add copy pasted statement on english normalizer

* add arguments to config_common_kwargs

* Update src/transformers/generation_utils.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/generation_logits_process.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/whisper/configuration_whisper.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* revert changes based on reviews

* update doc and nits

* more nits

* last nits

* update test configuration common

* add BART name in decoder attention mask documentation

* Update src/transformers/models/whisper/modeling_whisper.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* style

* nit

* nit

* add english.json file to git

* nits on documentation

* nit

* nits

* last styling

* add main toctree file

* remove sentence piece dependency

* clean init file

* fix tokenizer that has no dependencies on sentencepiece

* update whisper init file, nit

* remove english.json file

* add get decoder prompt id

* revert changes and add forced logit processor

* nit

* clean normalizer

* remove protected

* update

* Update src/transformers/models/whisper/configuration_whisper.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* update based on review

* Update src/transformers/models/whisper/configuration_whisper.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* add batched tests
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: NielsRogge <niels.rogge1@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

16 Sep, 2022 1 commit
- Add tests for legacy load by url and fix bugs (#19078) · ca485e56
  Sylvain Gugger authored Sep 16, 2022
  
29 Aug, 2022 1 commit
- Fix mock in `test_cached_files_are_used_when_internet_is_down` (#18804) · 169b8cde
  Lucain authored Aug 29, 2022
  
10 Aug, 2022 1 commit

Use commit hash to look in cache instead of calling head (#18534) · 0d0aada5

Sylvain Gugger authored Aug 10, 2022



* Use commit hash to look in cache instead of calling head

* Add tests

* Add attr for local configs too

* Stupid typos

* Fix tests

* Update src/transformers/utils/hub.py
Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Address Julien's comments
Co-authored-by: Julien Chaumond <julien@huggingface.co>

05 Aug, 2022 1 commit

Use new huggingface_hub tools for download models (#18438) · 5cd40323

Sylvain Gugger authored Aug 05, 2022

* Draft new cached_file

* Initial draft for config and model

* Small fixes

* Fix first batch of tests

* Look in cache when internet is down

* Fix last tests

* Bad black, not fixing all quality errors

* Make diff less

* Implement change for TF and Flax models

* Add tokenizer and feature extractor

* For compatibility with main

* Add utils to move the cache and auto-do it at first use.

* Quality

* Deal with empty commit shas

* Deal with empty etag

* Address review comments

01 Aug, 2022 1 commit

Rewrite push_to_hub to use upload_files (#18366) · 01db72ab

Sylvain Gugger authored Aug 01, 2022

* Rewrite push_to_hub to use upload_files

* Adapt the doc a bit

* Address review comments and clean doc

19 Jul, 2022 1 commit

[From pretrained] Allow download from subfolder inside model repo (#18184) · 3bb6356d

Patrick von Platen authored Jul 19, 2022



* add first generation tutorial

* [from_pretrained] Allow loading models from subfolders

* remove gen file

* add doc strings

* allow download from subfolder

* add tests

* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* apply comments

* correct doc string
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

01 Jul, 2022 1 commit

XLA train step fixes (#17973) · d6cec458

Matt authored Jul 01, 2022

* Copy inputs to train and test step before modifying them, as this breaks things

* Add XLA tests, fix our loss functions to be XLA-compatible

* make fixup

* Update loss computation test to expect vector of per-sample losses

* Patch loss for TFLED

* Patch loss for TFAlbert

* Add a tf_legacy_loss config flag that enables old loss functions

* Stop using config.get() because it's not a dict

* Skip loss computation test for RAG because its loss is very strange and I'm afraid to rewrite it

* make fixup

* Add XLA-compatible RAG loss

* Fix dtype of loss mask for TFAlbert

* Fix test for XLNet too because it overrides the default one

* make fixup

* Fix config test

* No more depending on GPU NaN behaviour

* Add test, avoid potential zero division

* Fix test item assignment

* Fix loss computation masking test

* make fixup

* Fix dtype bugs

21 Jun, 2022 1 commit

Prepare transformers for v0.8.0 huggingface-hub release (#17716) · 6a5272b2

Lysandre Debut authored Jun 21, 2022



* Prepare CI for v0.8.0

* pin hfh (revert before merge)

* Revert "pin hfh (revert before merge)"

This reverts commit a0103140e1c77b810ffcb735192968bc03be3e1f.

* Test rc3

* Test latest rc

* Unpin to the RC
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>

12 May, 2022 1 commit

Black preview (#17217) · afe5d42d

Sylvain Gugger authored May 12, 2022

* Black preview

* Fixup too!

* Fix check copies

* Use the same version as the CI

* Bump black

04 May, 2022 1 commit
- Make sure telemetry arguments are not returned as unused kwargs (#17063) · d76d2a2a
  Sylvain Gugger authored May 04, 2022
```
* Make sure telemetry arguments are not returned as unused kwargs

* Fix test
```
23 Mar, 2022 1 commit

Make Transformers use cache files when hf.co is down (#16362) · c595b6e6

Sylvain Gugger authored Mar 23, 2022

* Make Transformers use cache files when hf.co is down

* Fix tests

* Was there a random circleCI failure?

* Isolate patches

* Style

* Comment out the failure since it doesn't fail anymore

* Better comment

11 Mar, 2022 1 commit

Add soft length regulation for sequence generation (#15245) · 9442b3ce

Kevin Bondzio authored Mar 11, 2022



* add possibility to softly regulate length when using sampling method in model.generate() function

* fix test config, fix formatting

* fix rag integration, fix docstyling

* fix wrong docstring

* change param to tuple, add test

* fix old param in rag_model, remove unused import

* change test according to new param

* fix formatting

* fix test case

* fix doc style

* move start_length calculation to Logitprocessor

* remove unused import

* fix small errors

* fix test

* Update src/transformers/generation_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/generation_utils.py

* Update src/transformers/generation_utils.py

* fix docstring, add type in model rag

* fix docstrings

* introduce seq_length variable for cleaner code

* fix black formatting

* add input_ids_seq_length to modeling_rag

* add input_ids_seq_length to test

* retrigger checks

* retrigger checks
Co-authored-by: Kevin Bondzio <kev@AIM-LAP-02.local>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Kevin Bondzio <kev@AIM-LAP-02.fritz.box>

11 Feb, 2022 1 commit
- Fix _configuration_file argument getting passed to model (#15629) · 2dce350b
  Sylvain Gugger authored Feb 11, 2022
  
09 Feb, 2022 1 commit

Add implementation of typical sampling (#15504) · 0113aae5

Clara Meister authored Feb 09, 2022

* typical decoding

* changing arg name

* add test config params

* forgotten arg rename

* fix edge case where scores are same

* test for typical logits warper

* code quality fixes

02 Feb, 2022 1 commit

Save code of registered custom models (#15379) · 44b21f11

Sylvain Gugger authored Feb 02, 2022



* Allow dynamic modules to use relative imports

* Work for configs

* Fix last merge conflict

* Save code of registered custom objects

* Map strings to strings

* Fix test

* Add tokenizer

* Rework tests

* Tests

* Ignore fixtures py files for tests

* Tokenizer test + fix collection

* With full path

* Rework integration

* Fix typo

* Remove changes in conftest

* Test for tokenizers

* Add documentation

* Update docs/source/custom_models.mdx
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Add file structure and file content

* Add more doc

* Style

* Update docs/source/custom_models.mdx
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Address review comments
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>

25 Jan, 2022 1 commit

Avoid using get_list_of_files (#15287) · e6954707

Sylvain Gugger authored Jan 25, 2022

* Avoid using get_list_of_files in config

* Wip, change tokenizer file getter

* Remove call in tokenizer files

* Remove last call to get_list_model_files

* Better tests

* Unit tests for new function

* Document bad API

14 Jan, 2022 1 commit
- Update test_configuration_common.py (#15160) · 735d2bb6
  novice authored Jan 14, 2022
  
15 Nov, 2021 1 commit

Allow per-version configurations (#14344) · 1cc453d3

Lysandre Debut authored Nov 15, 2021



* Allow per-version configurations

* Update tests/test_configuration_common.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update tests/test_configuration_common.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

08 Nov, 2021 1 commit
- Expand dynamic supported objects to configs and tokenizers (#14296) · dfb00bf6
  Sylvain Gugger authored Nov 08, 2021
```
* Dynamic configs

* Add config test

* Better tests

* Add tokenizer and test

* Add to from_config

* With save
```
02 Nov, 2021 1 commit
- Update Transformers to huggingface_hub >= 0.1.0 (#14251) · 558f8543
  Sylvain Gugger authored Nov 02, 2021
```
* Update Transformers to huggingface_hub >= 0.1.0

* Forgot to save...

* Style

* Fix test
```
14 Oct, 2021 1 commit

Add strong test for configuration attributes (#14000) · f2002fea

Sylvain Gugger authored Oct 14, 2021

* Add strong test for configuration attributes

* Add fake modif to trigger all tests

* Add a better fake modif

* Ignore is_encoder_decoder

* Fix faulty configs

* Remove fake modif

06 Sep, 2021 1 commit

Update model configs - Allow setters for common properties (#13026) · c8be8a9a

Nils Reimers authored Sep 06, 2021

* refactor GPT Config to allow dyn. properties

* make attribute_map a class attribute

* remove old code

* update unit test to test config: Add test for common properties setter

* update unit test to test config: Add test for common properties passed as parameters to __init__

* update to black code format

* Allow that setters are not defined for certain config classes

* update config classes to implement attribute_map

* bugfix lxmert config - id2labels was not defined when num_labels was set

* update broken configs - add attribute_maps

* update bart config

* update black codestyle

* update documentation on common config attributes

* update GPTJ config to new attribute map

* update docs on common attributes

* gptj config: add max_position_embeddings

* gptj config: format with black

* update speech to text 2 config

* format doc file to max_len 119

* update config template

23 Jun, 2021 1 commit

Clean push to hub API (#12187) · 53c60bab

Sylvain Gugger authored Jun 23, 2021



* Clean push to hub API

* Create working dir if it does not exist

* Different tweak

* New API + all models + test Flax

* Adds the Trainer clean up

* Update src/transformers/file_utils.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Address review comments

* (nit) output types

* No need to set clone_from when folder exists

* Update src/transformers/trainer.py
Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Add generated_from_trainer tag

* Update to new version

* Fixes
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>

25 May, 2021 1 commit

[Examples] create model with custom config on the fly (#11798) · 1b653010

Stas Bekman authored May 25, 2021



* create custom model on the flight

* better wording

* add update_from_string

* cleanup

* cleanup

* Update src/transformers/configuration_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* more bool options

* style

* fix logger

* add test

* add the doc

* assert on conflict of options
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

26 Apr, 2021 1 commit
- Give each test a different repo name (#11453) · 7959d835
  Sylvain Gugger authored Apr 26, 2021
  
23 Apr, 2021 1 commit

Trainer push to hub (#11328) · bf2e0cf7

Sylvain Gugger authored Apr 23, 2021



* Initial support for upload to hub

* push -> upload

* Fixes + examples

* Fix torchhub test

* Torchhub test I hate you

* push_model_to_hub -> push_to_hub

* Apply mixin to other pretrained models

* Remove ABC inheritance

* Add tests

* Typo

* Run tests

* Install git-lfs

* Change approach

* Add push_to_hub to all

* Staging test suite

* Typo

* Maybe like this?

* More deps

* Cache

* Adapt name

* Quality

* MOAR tests

* Put it in testing_utils

* Docs + torchhub last hope

* Styling

* Wrong method

* Typos

* Update src/transformers/file_utils.py
Co-authored-by: Julien Chaumond <julien@huggingface.co>

* Address review comments

* Apply suggestions from code review
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

12 Apr, 2021 1 commit

Add DeiT (PyTorch) (#11056) · 9f126097

NielsRogge authored Apr 13, 2021

* First draft of deit

* More improvements

* Remove DeiTTokenizerFast from init

* Conversion script works

* Add DeiT to ViT conversion script

* Add tests, add head model, add support for deit in vit conversion script

* Update model checkpoint names

* Update image_mean and image_std, set resample to bicubic

* Improve docs

* Docs improvements

* Add DeiTForImageClassificationWithTeacher to init

* Address comments by @sgugger

* Improve feature extractors

* Make fix-copies

* Minor fixes

* Address comments by @patil-suraj

* All models uploaded

* Fix tests

* Remove labels argument from DeiTForImageClassificationWithTeacher

* Fix-copies, style and quality

* Fix tests

* Fix typo

* Multiple docs improvements

* More docs fixes

22 Oct, 2020 1 commit

[PretrainedConfig] Fix save pretrained config for edge case (#7943) · f34372a9

Patrick von Platen authored Oct 22, 2020



* fix config save

* add test

* add config class variable and another test

* line break

* fix fsmt and typo

* god am I making many errors today :-/

* Update src/transformers/configuration_utils.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

05 Mar, 2020 1 commit
- Pass kwargs to configuration (#3147) · b623ddc0
  Lysandre Debut authored Mar 05, 2020
```
* Pass kwargs to configuration

* Setter

* test
```
06 Jan, 2020 2 commits
- GPU text generation: Moved the encoded_prompt to correct device · 81d6841b
  alberduris authored Dec 31, 2019
  
- Moved the encoded_prompts to correct device · dd4df80f
  alberduris authored Dec 31, 2019
  
22 Dec, 2019 8 commits
- Remove sys.version_info[0] == 2 or 3. · 798b3b38
  Aymeric Augustin authored Dec 22, 2019
  
- Remove __future__ imports. · c824d15a
  Aymeric Augustin authored Dec 22, 2019
  
- Replace CommonTestCases for tokenizers with a mixin. · 00204f2b
  Aymeric Augustin authored Dec 22, 2019
```
This is the same change as for (TF)CommonTestCases for modeling.
```
- Rename file for consistency. · a3c5883f
  Aymeric Augustin authored Dec 22, 2019
  
- Remove unittest.main() in test modules. · 7e98e211
  Aymeric Augustin authored Dec 22, 2019
```
This construct isn't used anymore these days.

Running python tests/test_foo.py puts the tests/ directory on
PYTHONPATH, which isn't representative of how we run tests.

Use python -m unittest tests/test_foo.py instead.
```
- Switch test files to the standard test_*.py scheme. · ced0a942
  Aymeric Augustin authored Dec 22, 2019
  
- Move tests outside of library. · 067395d5
  Aymeric Augustin authored Dec 22, 2019
  
- Fix F401 flake8 warning (x152 / 268). · 80327a13
  Aymeric Augustin authored Dec 21, 2019
```
This change is mostly autogenerated with:

    $ python -m autoflake --in-place --recursive examples templates transformers utils hubconf.py setup.py

I made minor changes in the generated diff.
```