• Speech2TextTransformer (#10175) · d26b37e7
    Suraj Patil authored
    
    
    * s2t
    
    * fix config
    
    * conversion script
    
    * fix import
    
    * add tokenizer
    
    * fix tok init
    
    * fix tokenizer
    
    * first version working
    
    * fix embeds
    
    * fix lm head
    
    * remove extra heads
    
    * fix convert script
    
    * handle encoder attn mask
    
    * style
    
    * better enc attn mask
    
    * override _prepare_attention_mask_for_generation
    
    * handle attn_mask in encoder and decoder
    
    * input_ids => input_features
    
    * enable use_cache
    
    * remove old code
    
    * expand embeddings if needed
    
    * remove logits bias
    
    * masked_lm_loss => loss
    
    * hack tokenizer to support feature processing
    
    * fix model_input_names
    
    * style
    
    * fix error message
    
    * doc
    
    * remove inputs_embeds
    
    * remove input_embeds
    
    * remove unnecessary docstring
    
    * quality
    
    * SpeechToText => Speech2Text
    
    * style
    
    * remove shared_embeds
    
    * subsample => conv
    
    * remove Speech2TextTransformerDecoderWrapper
    
    * update output_lengths formula
    
    * fix table
    
    * remove max_position_embeddings
    
    * update conversion scripts
    
    * add possibility to do upper case for now
    
    * add FeatureExtractor and Processor
    
    * add tests for extractor
    
    * require_torch_audio => require_torchaudio
    
    * add processor test
    
    * update import
    
    * remove classification head
    
    * attention mask is now 1D
    
    * update docstrings
    
    * attention mask should be of type long
    
    * handle attention mask from generate
    
    * always return attention_mask
    
    * fix test
    
    * style
    
    * doc
    
    * Speech2TextTransformer => Speech2Text
    
    * Speech2TextTransformerConfig => Speech2TextConfig
    
    * remove dummy_inputs
    
    * nit
    
    * style
    
    * multilingual tok
    
    * fix tokenizer
    
    * add tgt_lang setter
    
    * save lang_codes
    
    * fix tokenizer
    
    * add forced_bos_token_id to tokenizer
    
    * apply review suggestions
    
    * add torchaudio to extra deps
    
    * add speech deps to CI
    
    * fix dep
    
    * add libsndfile to ci
    
    * libsndfile1
    
    * add speech to extras all
    
    * libsndfile1 -> libsndfile1
    
    * libsndfile
    
    * libsndfile1-dev
    
    * apt update
    
    * add sudo to install
    
    * update deps table
    
    * install libsndfile1-dev on CI
    
    * tuple to list
    
    * init conv layer
    
    * add model tests
    
    * quality
    
    * add integration tests
    
    * skip_special_tokens
    
    * add speech_to_text_transformer in toctree
    
    * fix tokenizer
    
    * fix fp16 tests
    
    * add tokenizer tests
    
    * fix copyright
    
    * input_values => input_features
    
    * doc
    
    * add model in readme
    
    * doc
    
    * change checkpoint names
    
    * fix copyright
    
    * fix code example
    
    * add max_model_input_sizes in tokenizer
    
    * fix integration tests
    
    * add do_lower_case to tokenizer
    
    * remove clamp trick
    
    * fix "Add modeling imports here"
    
    * fix copyrights
    
    * fix tests
    
    * SpeechToTextTransformer => SpeechToText
    
    * fix naming
    
    * fix table formatting
    
    * fix typo
    
    * style
    
    * fix typos
    
    * remove speech dep from extras[testing]
    
    * fix copies
    
    * rename doc file
    
    * put imports under is_torch_available
    
    * run feat extract tests when torch is available
    
    * dummy objects for processor and extractor
    
    * fix imports in tests
    
    * fix import in modeling test
    
    * fix imports
    
    * fix torch import
    
    * fix imports again
    
    * fix positional embeddings
    
    * fix typo in import
    
    * adapt new extractor refactor
    
    * style
    
    * fix torchscript test
    
    * doc
    
    * doc
    
    * Apply suggestions from code review
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
    Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
    
    * fix docs, copied from, style
    
    * fix docstring
    
    * handle imports
    
    * remove speech from all extra deps
    
    * remove s2t from seq2seq lm mapping
    
    * better names
    
    * skip training tests
    
    * add install instructions
    
    * List => Tuple
    
    * doc
    
    * fix conversion script
    
    * fix urls
    
    * add instruction for libsndfile
    
    * fix fp16 test
    Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
    Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
test_processor_speech_to_text.py 5.81 KB