  • add VITS model (#24085) · 4ece3b94
    Matthijs Hollemans authored
    
    
    * add VITS model
    
    * let's vits
    
    * finish TextEncoder (mostly)
    
    * rename VITS to Vits
    
    * add StochasticDurationPredictor
    
    * add flow model
    
    * add generator
    
    * correctly set vocab size
    
    * add tokenizer
    
    * remove processor & feature extractor
    
    * add PosteriorEncoder
    
    * add missing weights to SDP
    
    * also convert LJSpeech and VCTK checkpoints
    
    * add training stuff in forward
    
    * add placeholder tests for tokenizer
    
    * add placeholder tests for model
    
    * starting cleanup
    
    * let the great renaming begin!
    
    * use config
    
    * global_conditioning
    
    * more cleaning
    
    * renaming variables
    
    * more renaming
    
    * more renaming
    
    * it never ends
    
    * reticulating the splines
    
    * more renaming
    
    * HiFi-GAN
    
    * doc strings for main model
    
    * fixup
    
    * fix-copies
    
    * don't make it a PreTrainedModel
    
    * fixup
    
    * rename config options
    
    * remove training logic from forward pass
    
    * simplify relative position
    
    * use actual checkpoint
    
    * style
    
    * PR review fixes
    
    * more review changes
    
    * fixup
    
    * more unit tests
    
    * fixup
    
    * fix doc test
    
    * add integration test
    
    * improve tokenizer tests
    
    * add tokenizer integration test
    
    * fix tests on GPU (gave OOM)
    
    * conversion script can handle repos from hub
    
    * add conversion script for all MMS-TTS checkpoints
    
    * automatically create a README for the converted checkpoint
    
    * small changes to config
    
    * push README to hub
    
    * only show uroman note for checkpoints that need it
    
    * remove conversion script because code formatting breaks the readme
    
    * make WaveNet layers configurable
    
    * rename variables
    
    * simplifying the math
    
    * output attentions and hidden states
    
    * remove VitsFlip in flow model
    
    * also got rid of the other flip
    
    * fix tests
    
    * rename more variables
    
    * rename tokenizer, add phonemization
    
    * raise error when phonemizer missing
    
    * re-order config docstrings to match method
    
    * change config naming
    
    * remove redundant str -> list
    
    * fix copyright: vits authors -> kakao enterprise
    
    * (mean, log_variances) -> (prior_mean, prior_log_variances)
    
    * if return dict -> if not return dict
    
    * speed -> speaking rate
    
    * Apply suggestions from code review
    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
    
    * update fused tanh sigmoid
    
    * reduce dims in tester
    
    * audio -> output_values
    
    * audio -> output_values in tuple out
    
    * fix return type
    
    * fix return type
    
    * make _unconstrained_rational_quadratic_spline a function
    
    * all nn's to accept a config
    
    * add spectro to output
    
    * move {speaking rate, noise scale, noise scale duration} to config
    
    * path -> attn_path
    
    * idxs -> valid idxs -> padded idxs
    
    * output values -> waveform
    
    * use config for attention
    
    * make generation work
    
    * harden integration test
    
    * add spectrogram to dict output
    
    * tokenizer refactor
    
    * make style
    
    * remove 'fake' padding token
    
    * harden tokenizer tests
    
    * ron norm test
    
    * fprop / save tests deterministic
    
    * move uroman to tokenizer as much as possible
    
    * better logger message
    
    * fix vivit imports
    
    * add uroman integration test
    
    * make style
    
    * up
    
    * matthijs -> sanchit-gandhi
    
    * fix tokenizer test
    
    * make fix-copies
    
    * fix dict comprehension
    
    * fix config tests
    
    * fix model tests
    
    * make outputs consistent with reverse/not reverse
    
    * fix key concat
    
    * more model details
    
    * add author
    
    * return dict
    
    * speaker error
    
    * labels error
    
    * Apply suggestions from code review
    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
    
    * Update src/transformers/models/vits/convert_original_checkpoint.py
    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
    
    * remove uromanize
    
    * add docstrings
    
    * add docstrings for tokenizer
    
    * upper-case skip messages
    
    * fix return dict
    
    * style
    
    * finish tests
    
    * update checkpoints
    
    * make style
    
    * remove doctest file
    
    * revert
    
    * fix docstring
    
    * fix tokenizer
    
    * remove uroman integration test
    
    * add sampling rate
    
    * fix docs / docstrings
    
    * style
    
    * add sr to model output
    
    * fix outputs
    
    * style / copies
    
    * fix docstring
    
    * fix copies
    
    * remove sr from model outputs
    
    * Update utils/documentation_tests.txt
    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
    
    * add sr as allowed attr
    
    ---------
    Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
    Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
    Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
README.md 95.5 KB