Commits · 9234caefb0241939f7b2b0ee3d73ed5ebf842ae9 · chenpangpang / transformers

31 Oct, 2023 1 commit

[docstring] Fix docstring for AltCLIPTextConfig, AltCLIPVisionConfig and AltCLIPConfig (#27128) · 9234caef

Akshar Goyal authored Oct 31, 2023

* [docstring] Fix docstring for AltCLIPVisionConfig, AltCLIPTextConfig + cleaned some docstring

* Removed entries from check_docstring.py

* Removed entries from check_docstring.py

* Removed entry from check_docstring.py

* [docstring] Fix docstring for AltCLIPTextConfig, AltCLIPVisionConfig and AltCLIPConfig

9234caef

25 Oct, 2023 1 commit

[`docs`] Add `MaskGenerationPipeline` in docs (#27063) · c34c50cd

Younes Belkada authored Oct 25, 2023

* add `MaskGenerationPipeline` in docs

* Update __init__.py

* fix repo consistency and clarify docstring

* add on check docstrings

* actually we do have a tf sam

* oops

c34c50cd

23 Oct, 2023 1 commit

Add Seamless M4T model (#25693) · cb45f71c

Yoach Lacombe authored Oct 23, 2023



* first raw commit

* still POC

* tentative convert script

* almost working speech encoder conversion scripts

* intermediate code for encoder/decoders

* add modeling code

* first version of speech encoder

* make style

* add new adapter layer architecture

* add adapter block

* add first tentative config

* add working speech encoder conversion

* base model convert works now

* make style

* remove unnecessary classes

* remove unnecessary functions

* add modeling code speech encoder

* rework logics

* forward pass of sub components work

* add modeling codes

* some config modifs and modeling code modifs

* save WIP

* new edits

* same output speech encoder

* correct attention mask

* correct attention mask

* fix generation

* new generation logics

* erase comments

* make style

* fix typo

* add some descriptions

* new state

* clean imports

* add tests

* make style

* make beam search and num_return_sequences>1 work

* correct edge case issue

* correct SeamlessM4TConformerSamePadLayer copied from

* replace ACT2FN relu by nn.relu

* remove unnecessary return variable

* move back a class

* change name conformer_attention_mask -> conv_attention_mask

* better nit code

* add some Copied from statements

* small nits

* small nit in dict.get

* rename t2u model -> conditionalgeneration

* ongoing refactoring of structure

* update models architecture

* remove SeamlessM4TMultiModal classes

* add tests

* adapt tests

* some non-working code for vocoder

* add seamlessM4T vocoder

* remove buggy line

* fix some hifigan related bugs

* remove hifigan specific config

* change

* add WIP tokenization

* add seamlessM4T working tokenizer

* update tokenization

* add tentative feature extractor

* Update converting script

* update working FE

* refactor input_values -> input_features

* update FE

* changes in generation, tokenizer and modeling

* make style and add t2u_decoder_input_ids

* add intermediate outputs for ToSpeech models

* add vocoder to speech models

* update valueerror

* update FE with languages

* add vocoder convert

* update config docstrings and names

* update generation code and configuration

* remove todos and update config.pad_token_id to generation_config.pad_token_id

* move block vocoder

* remove unnecessary code and uniformize tospeech code

* add feature extractor import

* make style and fix some copies from

* correct consistency + make fix-copies

* add processor code

* remove comments

* add fast tokenizer support

* correct pad_token_id in M4TModel

* correct config

* update tests and codes  + make style

* make some suggested corrections - correct comments and change naming

* rename some attributes

* rename some attributes

* remove unnecessary sequential

* remove option to use dur predictor

* nit

* refactor hifigan

* replace normalize_mean and normalize_var with do_normalize + save lang ids to generation config

* add tests

* change tgt_lang logic

* update generation ToSpeech

* add support import SeamlessM4TProcessor

* fix generate

* make tests

* update integration tests, add option to only return text and update tokenizer fast

* fix wrong function call

* update import and convert script

* update integration tests + update repo id

* correct paths and add first test

* update how new attention masks are computed

* update tests

* take first care of batching in vocoder code

* add batching with the vocoder

* add waveform lengths to model outputs

* make style

* add generate kwargs + forward kwargs of M4TModel

* add docstrings forward methods

* reformat docstrings

* add docstrings t2u model

* add another round of modeling docstrings + reformat speaker_id -> spkr_id

* make style

* fix check_repo

* make style

* add seamlessm4t to toctree

* correct check_config_attributes

* write config docstrings + some modifs

* make style

* add docstrings tokenizer

* add docstrings to processor, fe and tokenizers

* make style

* write first version of model docs

* fix FE + correct FE test

* fix tokenizer + add correct integration tests

* fix most tokenization tests

* make style

* correct most processor test

* add generation tests and fix num_return_sequences > 1

* correct integration tests -still one left

* make style

* correct position embedding

* change numbeams to 1

* refactor some modeling code and correct one test

* make style

* correct typo

* refactor intermediate fnn

* refactor feedforward conformer

* make style

* remove comments

* make style

* fix tokenizer tests

* make style

* correct processor tests

* make style

* correct S2TT integration

* Apply suggestions from Sanchit code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* correct typo

* replace torch.nn->nn + make style

* change Output naming (waveforms -> waveform) and ordering

* nit renaming and formatting

* remove return None when not necessary

* refactor SeamlessM4TConformerFeedForward

* nit typo

* remove almost copied from comments

* add a copied from comment and remove an unnecessary dropout

* remove inputs_embeds from speechencoder

* remove backward compatibility function

* reformat class docstrings for a few components

* remove unnecessary methods

* split something hard to read over 2 lines

* make style

* replace two steps offset by one step as suggested

* nice typo

* move warnings

* remove useless lines from processor

* make non-standard generation test more robust

* remove torch.inference_mode from tests

* split integration tests

* enrich md

* rename control_symbol_vocoder_offset->vocoder_offset

* clean convert file

* remove tgt_lang and src_lang from FE

* change generate docstring of ToText models

* update generate docstring of tospeech models

* unify how to deal with text_decoder_input_ids

* add default spkr_id

* unify tgt_lang for t2u_model

* simplify tgt_lang verification

* remove a todo

* change config docstring

* make style

* simplify t2u_tgt_lang_id

* make style

* enrich/correct comments

* enrich .md

* correct typo in docstrings

* add torchaudio dependency

* update tokenizer

* make style and fix copies

* modify SeamlessM4TConverter with new tokenizer behaviour

* make style

* correct small typo docs

* fix import

* update docs and add requirement to tests

* add convert_fairseq2_to_hf in utils/not_doctested.txt

* update FE

* fix imports and make style

* remove torchaudio in FE test

* add seamless_m4t.md to utils/not_doctested.txt

* nits and change the way docstring dataset is loaded

* move checkpoints from ylacombe/ to facebook/ orga

* refactor warning/error to be in the 119 line width limit

* round overly precise floats

* add stereo audio behaviour

* refactor .md and make style

* enrich docs with more precise architecture description

* readd undocumented models

* make fix-copies

* apply some suggestions

* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* correct bug from previous commit

* refactor a parameter allowing to clean the code + some small nits

* clean tokenizer

* make style and fix

* make style

* clean tokenizers arguments

* add precisions for some tests

* move docs from not_tested to slow

* modify tokenizer according to last comments

* add copied from statements in tests

* correct convert script

* correct parameter docstring style

* correct tokenization

* correct multi gpus

* make style

* clean modeling code

* make style

* add copied from statements

* add copied statements

* add support with ASR pipeline

* remove file added inadvertently

* fix docstrings seamlessM4TModel

* add seamlessM4TConfig to OBJECTS_TO_IGNORE due to unconventional markdown

* add seamlessm4t to assisted generation ignored models

---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

cb45f71c

20 Oct, 2023 1 commit

[docstring] Fix docstring for speech-to-text config (#26883) · 929134bf

Adam Ross authored Oct 20, 2023

* Fix docstring for speech-to-text config

* Refactor doc line len <= 119 char

* Remove Speech2TextConfig from OBJECTS_TO_IGNORE

* Fix Speech2TextConfig doc str

* Fix Speech2TextConfig doc using doc-builder

* Refactor Speech2TextConfig doc

929134bf

19 Oct, 2023 2 commits

[docstring] Fix docstrings for `CodeGen` (#26821) · ad08137e

Daniil authored Oct 19, 2023



* remove docstrings CodeGen from objects_to_ignore

* autofix codegen docstrings

* fill in the missing types and docstrings

* fixup

* change descriptions to be in a separate line

* apply docstring suggestions from code review
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* update n_ctx description in CodeGenConfig

---------
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

ad08137e

[docstring] Fix docstring for `ChineseCLIP` (#26880) · 816c2237

Sparty authored Oct 19, 2023



* Remove ChineseCLIPImageProcessor, ChineseCLIPTextConfig, ChineseCLIPVisionConfig from check_docstrings

* Run fix_and_overwrite for ChineseCLIPImageProcessor, ChineseCLIPTextConfig, ChineseCLIPVisionConfig

* Replace <fill_type> and <fill_docstring> in configuration_chinese_clip.py, image_processing_chinese_clip.py with type and docstring values

---------
Co-authored-by: vignesh-raghunathan <vignesh_raghunathan@intuit.com>

816c2237

17 Oct, 2023 1 commit

[docstring] Fix docstring for LukeConfig (#26858) · 51042ae8

louietouie authored Oct 17, 2023



* Deleted LukeConfig and ran check_docstrings.py

* Filled docstring information

---------
Co-authored-by: louie <louisparizeau@Chicken.local>

51042ae8

16 Oct, 2023 3 commits

[docstring] Fix bert generation tokenizer (#26820) · 5c6b83cb

przemL authored Oct 16, 2023

* Remove BertGenerationTokenizer from objects to ignore

The file BertGenerationTokenizer is removed from
objects to ignore as a first step to fix docstring.

* Docstrings fix for BertGenerationTokenizer

Docstring fix is generated for BertGenerationTokenizer
by using check_docstrings.py.

* Fix docstring for BertGenerationTokenizer

Added sep_token type and docstring in BertGenerationTokenizer.

5c6b83cb

[docstring] Fix docstring for `CodeLlamaTokenizerFast` (#26666) · 5c081e29

Bojun-Feng authored Oct 16, 2023

* remove from OBJECTS_TO_IGNORE

* run check_docstrings.py

* fill in information

* ignore CodeLlamaTokenizer

5c081e29

[docstring] Fix docstring for `CanineConfig` (#26771) · 0e52af4d

Sparty authored Oct 16, 2023



* Remove CanineConfig from check_docstrings

* Run fix_and_overwrite for CanineConfig

* Replace <fill_type> and <fill_docstring> in configuration_canine.py with type and docstring values

---------
Co-authored-by: vignesh-raghunathan <vignesh_raghunathan@intuit.com>

0e52af4d
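
Several of the commits above describe the same last step: replacing the `<fill_type>` and `<fill_docstring>` placeholders left behind by the automatic fix. A minimal sketch of how leftover placeholders could be located is given below. It is not the code in `check_docstrings.py`; the function name `find_unfilled_placeholders` and the sample docstring are hypothetical and chosen only for illustration.

```python
import re

# Placeholder tokens named in the commit messages above.
PLACEHOLDER_RE = re.compile(r"<fill_type>|<fill_docstring>")


def find_unfilled_placeholders(docstring: str) -> list[tuple[int, str]]:
    """Return (line_number, placeholder) pairs for every unfilled token."""
    hits = []
    for lineno, line in enumerate(docstring.splitlines(), start=1):
        for match in PLACEHOLDER_RE.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


# Hypothetical docstring fragment, as an auto-fix pass might leave it.
draft = """Args:
    width (`int`, *optional*, defaults to 8):
        Width of the toy layer.
    new_option (`<fill_type>`, *optional*, defaults to 16):
        <fill_docstring>
"""

for lineno, token in find_unfilled_placeholders(draft):
    print(f"line {lineno}: {token} still needs to be filled in")
```

A check of this kind only reports positions; choosing the right type and description remains a manual step, which matches how the commits above describe it.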

13 Oct, 2023 3 commits

Add OWLv2, bis (#26668) · 762af3e3

NielsRogge authored Oct 13, 2023

* First draft

* Update conversion script

* Update copied from statements

* Fix style

* Add copied from to config

* Add copied from to processor

* Run make fixup

* Add docstring

* Update docstrings

* Add method

* Improve docstrings

* Fix docstrings

* Improve docstrings

* Remove onnx

* Add flag

* Address comments

* Add copied from to model tests

* Add flag to conversion script

* Add code snippet

* Address more comments

* Address comment

* Improve conversion script

* More improvements

* Add expected objectness logits

* Skip test

* Improve conversion script

* Extend conversion script

* Convert large checkpoint

* Fix doc tests

* Convert all checkpoints, update integration tests

* Add checkpoint_path arg

* Fix repo_id

762af3e3

[docstring] fix docstring `DPRConfig` (#26674) · 5bfda28d

dekomori_sanae09 authored Oct 13, 2023



* fix docstring dpr config

* fix style

* Update descp
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

---------
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

5bfda28d

[docstring] Fix docstring for `RwkvConfig` (#26782) · d085662c

Bojun-Feng authored Oct 13, 2023

* update check_docstrings

* update docstring

d085662c

12 Oct, 2023 3 commits

[docstring] Fix docstring for 'BertGenerationConfig' (#26661) · 33df09e7

Adwait authored Oct 12, 2023

* [docstring] Remove 'BertGenerationConfig' from OBJECTS_TO_IGNORE

* [docstring] Fix docstring for 'BertGenerationConfig' (#26638)

33df09e7

[docstring] Update `GPT2` and `Whisper` (#26642) · b4199c2d

Joseph McDonnell authored Oct 12, 2023



* [DOCS] Update docstrings for `GPT2` and `Whisper` tokenizer

* [DOCS] add pad_token argument to whisper tokenizer docstring

* [FIX] Reword pad_token description

* [CHORE] Apply style formatting

---------
Co-authored-by: jmcdonnell <jmcdonnell@fieldbox.ai>

b4199c2d

[docstring] Fix `UniSpeech`, `UniSpeechSat`, `Wav2Vec2ForCTC` (#26664) · eb734e51

Gizem authored Oct 12, 2023



* Remove UniSpeechConfig

* Remove , at the end otherwise check_docstring changes order

* Auto add new docstring

* Update docstring for UniSpeechConfig

* Remove from check_docstrings

* Remove UniSpeechSatConfig and UniSpeechSatForCTC from check_docstrings

* Remove , at the end

* Fix docstring

* Update docstring for Wav2Vec2ForCTC

* Update Wav2Vec2ForCTC docstring
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* fix style

---------
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

eb734e51

11 Oct, 2023 3 commits

[docstring] Fix docstring for `CodeLlamaTokenizer` (#26709) · 797a1bab

Bojun-Feng authored Oct 11, 2023

* update check_docstrings

* update docstring

797a1bab

[docstring] Fix docstring for `LlamaTokenizer` and `LlamaTokenizerFast` (#26669) · aaccf184

Minho Ryang authored Oct 12, 2023

* [docstring] Fix docstring for `LlamaTokenizer` and `LlamaTokenizerFast`

* [docstring] Fix docstring typo at `LlamaTokenizer` and `LlamaTokenizerFast`

aaccf184

[docstring] `SwinModel` docstring fix (#26679) · cc44ca80

Shivanand authored Oct 11, 2023



* remove from utils

* updated doc string

* only in the model

* Update src/transformers/models/swin/modeling_swin.py
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* Update src/transformers/models/swin/modeling_swin.py
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

---------
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

cc44ca80

10 Oct, 2023 1 commit

[docstring] Fix docstring for `LlamaConfig` (#26685) · e8fdd787

Pavarissy authored Oct 10, 2023

* Your commit message here

* fix LlamaConfig docstring

* run make fixup

* fix formatting after review

reformat of the file to prevent script issues

* rerun make fixup after reformat

e8fdd787

09 Oct, 2023 4 commits

[docstring] Fix docstrings for `CLIP` (#26691) · a5e6df82

Isaac Chung authored Oct 09, 2023

fix docstrings for vanilla clip

a5e6df82

[docstring] Fix docstring for DonutImageProcessor (#26641) · 3257946f

Alex Bzdel authored Oct 09, 2023

* removed donutimageprocessor from objects_to_ignore

* added docstring for donutimageprocessor

* readding donut file

* moved docstring to correct location

3257946f

[docstring] Fix docstring for `CLIPImageProcessor` (#26676) · d2f06dff

Isaac Chung authored Oct 09, 2023

fix docstring for CLIPImageProcessor

d2f06dff

[docstring] Fix docstring CLIP configs (#26677) · 3763101f

Isaac Chung authored Oct 09, 2023

* fix docstrings for CLIP configs

* black formatted

3763101f

06 Oct, 2023 1 commit

[docstring] Fix docstring for `AlbertConfig` (#26636) · 360ea8fc

Yih-Dar authored Oct 06, 2023

example fix docstring
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

360ea8fc

04 Oct, 2023 1 commit

Docstring check (#26052) · 03af4c42

Sylvain Gugger authored Oct 04, 2023



* Fix number of minimal calls to the Hub with peft integration

* Alternate design

* And this way?

* Revert

* Nits to fix

* Add util

* Print when changes are made

* Add list to ignore

* Add more rules

* Manual fixes

* deal with kwargs

* deal with enum defaults

* avoid many digits for floats

* Manual fixes

* Fix regex

* Fix regex

* Auto fix

* Style

* Apply script

* Add ignored list

* Add check that templates are filled

* Adding to CI checks

* Add back semi-fix

* Ignore more objects

* More auto-fixes

* Ignore missing objects

* Remove temp semi-fix

* Fixes

* Update src/transformers/models/pvt/configuration_pvt.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update utils/check_docstrings.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/utils/quantization_config.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Deal with float defaults

* Fix small defaults

* Address review comment

* Treat

* Post-rebase cleanup

* Address review comment

* Update src/transformers/models/deprecated/mctct/configuration_mctct.py
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

* Address review comment

---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr>

03af4c42
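
The bullets of this commit ("deal with enum defaults", "Deal with float defaults", "Add list to ignore") suggest a check that compares the defaults written in a docstring with the defaults in the signature. The sketch below illustrates that general idea with the standard library only. It is an assumption about the approach, not the contents of `utils/check_docstrings.py`; `documented_defaults`, `mismatched_defaults`, and `toy_config` are hypothetical names.

```python
import ast
import inspect
import re

# Matches argument lines such as: name (`int`, *optional*, defaults to 12):
ARG_RE = re.compile(r"^\s*(\w+) \(.*defaults to `?([^`)]+?)`?\):\s*$")


def documented_defaults(docstring: str) -> dict:
    """Map each documented argument name to its default, as written in the text."""
    found = {}
    for line in docstring.splitlines():
        match = ARG_RE.match(line)
        if match:
            found[match.group(1)] = match.group(2)
    return found


def mismatched_defaults(func) -> dict:
    """Return {name: (documented, actual)} where docstring and signature disagree."""
    documented = documented_defaults(inspect.getdoc(func) or "")
    mismatches = {}
    for name, param in inspect.signature(func).parameters.items():
        if param.default is inspect.Parameter.empty or name not in documented:
            continue
        try:
            # Parse literals so that "0.1" compares equal to 0.1, and so on.
            doc_value = ast.literal_eval(documented[name])
        except (ValueError, SyntaxError):
            doc_value = documented[name]  # fall back to the raw text
        if doc_value != param.default:
            mismatches[name] = (doc_value, param.default)
    return mismatches


def toy_config(hidden_size=768, dropout=0.1, act="gelu"):
    """
    Args:
        hidden_size (`int`, *optional*, defaults to 512):
            Width of the hidden layers.
        dropout (`float`, *optional*, defaults to 0.1):
            Dropout probability.
        act (`str`, *optional*, defaults to `"gelu"`):
            Activation name.
    """


print(mismatched_defaults(toy_config))
```

Here only `hidden_size` is reported, because its documented default (512) differs from the signature (768). A real checker would also need an ignore list and special handling for defaults that are not plain literals, which this sketch leaves out.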