1. 01 Sep, 2023 1 commit
    • add VITS model (#24085) · 4ece3b94
      Matthijs Hollemans authored
      
      
      * add VITS model
      
      * let's vits
      
      * finish TextEncoder (mostly)
      
      * rename VITS to Vits
      
      * add StochasticDurationPredictor
      
      * add flow model
      
      * add generator
      
      * correctly set vocab size
      
      * add tokenizer
      
      * remove processor & feature extractor
      
      * add PosteriorEncoder
      
      * add missing weights to SDP
      
      * also convert LJSpeech and VCTK checkpoints
      
      * add training stuff in forward
      
      * add placeholder tests for tokenizer
      
      * add placeholder tests for model
      
      * starting cleanup
      
      * let the great renaming begin!
      
      * use config
      
      * global_conditioning
      
      * more cleaning
      
      * renaming variables
      
      * more renaming
      
      * more renaming
      
      * it never ends
      
      * reticulating the splines
      
      * more renaming
      
      * HiFi-GAN
      
      * doc strings for main model
      
      * fixup
      
      * fix-copies
      
      * don't make it a PreTrainedModel
      
      * fixup
      
      * rename config options
      
      * remove training logic from forward pass
      
      * simplify relative position
      
      * use actual checkpoint
      
      * style
      
      * PR review fixes
      
      * more review changes
      
      * fixup
      
      * more unit tests
      
      * fixup
      
      * fix doc test
      
      * add integration test
      
      * improve tokenizer tests
      
      * add tokenizer integration test
      
      * fix tests on GPU (gave OOM)
      
      * conversion script can handle repos from hub
      
      * add conversion script for all MMS-TTS checkpoints
      
      * automatically create a README for the converted checkpoint
      
      * small changes to config
      
      * push README to hub
      
      * only show uroman note for checkpoints that need it
      
      * remove conversion script because code formatting breaks the readme
      
      * make WaveNet layers configurable
      
      * rename variables
      
      * simplifying the math
      
      * output attentions and hidden states
      
      * remove VitsFlip in flow model
      
      * also got rid of the other flip
      
      * fix tests
      
      * rename more variables
      
      * rename tokenizer, add phonemization
      
      * raise error when phonemizer missing
      
      * re-order config docstrings to match method
      
      * change config naming
      
      * remove redundant str -> list
      
      * fix copyright: vits authors -> kakao enterprise
      
      * (mean, log_variances) -> (prior_mean, prior_log_variances)
      
      * if return dict -> if not return dict
      
      * speed -> speaking rate
      
      * Apply suggestions from code review
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * update fused tanh sigmoid
      
      * reduce dims in tester
      
      * audio -> output_values
      
      * audio -> output_values in tuple out
      
      * fix return type
      
      * fix return type
      
      * make _unconstrained_rational_quadratic_spline a function
      
      * all nn's to accept a config
      
      * add spectro to output
      
      * move {speaking rate, noise scale, noise scale duration} to config
      
      * path -> attn_path
      
      * idxs -> valid idxs -> padded idxs
      
      * output values -> waveform
      
      * use config for attention
      
      * make generation work
      
      * harden integration test
      
      * add spectrogram to dict output
      
      * tokenizer refactor
      
      * make style
      
      * remove 'fake' padding token
      
      * harden tokenizer tests
      
      * ron norm test
      
      * fprop / save tests deterministic
      
      * move uroman to tokenizer as much as possible
      
      * better logger message
      
      * fix vivit imports
      
      * add uroman integration test
      
      * make style
      
      * up
      
      * matthijs -> sanchit-gandhi
      
      * fix tokenizer test
      
      * make fix-copies
      
      * fix dict comprehension
      
      * fix config tests
      
      * fix model tests
      
      * make outputs consistent with reverse/not reverse
      
      * fix key concat
      
      * more model details
      
      * add author
      
      * return dict
      
      * speaker error
      
      * labels error
      
      * Apply suggestions from code review
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * Update src/transformers/models/vits/convert_original_checkpoint.py
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * remove uromanize
      
      * add docstrings
      
      * add docstrings for tokenizer
      
      * upper-case skip messages
      
      * fix return dict
      
      * style
      
      * finish tests
      
      * update checkpoints
      
      * make style
      
      * remove doctest file
      
      * revert
      
      * fix docstring
      
      * fix tokenizer
      
      * remove uroman integration test
      
      * add sampling rate
      
      * fix docs / docstrings
      
      * style
      
      * add sr to model output
      
      * fix outputs
      
      * style / copies
      
      * fix docstring
      
      * fix copies
      
      * remove sr from model outputs
      
      * Update utils/documentation_tests.txt
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
      
      * add sr as allowed attr
      
      ---------
      Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
      Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
      Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
  2. 31 Aug, 2023 1 commit
  3. 30 Aug, 2023 3 commits
  4. 29 Aug, 2023 12 commits
  5. 28 Aug, 2023 1 commit
  6. 25 Aug, 2023 5 commits
  7. 24 Aug, 2023 2 commits
  8. 23 Aug, 2023 2 commits
  9. 22 Aug, 2023 4 commits
  10. 21 Aug, 2023 2 commits
    • Add Pop2Piano (#21785) · 450a181d
      Susnato Dhar authored
      
      
      * init commit
      
      * config updated also some modeling
      
      * Processor and Model config combined
      
      * extraction pipeline (up to before spectrogram & mel_conditioner) added but not properly tested
      
      * model loading successful!
      
      * feature extractor done!
      
      * FE can now be called from HF
      
      * postprocessing added in fe file
      
      * same as prev commit
      
      * Pop2PianoConfig doc done
      
      * cfg docs slightly changed
      
      * fe docs done
      
      * batched
      
      * batched working!
      
      * temp
      
      * v1
      
      * checking
      
      * trying to go with generate
      
      * with generate and model tests passed
      
      * before rebasing
      
      * .
      
      * tests done docs done remaining others & nits
      
      * nits
      
      * LogMelSpectogram shifted to FeatureExtractor
      
      * is_tf removed from pop2piano/init
      
      * import solved
      
      * tokenization tests added
      
      * minor fixes regarding modeling_pop2piano
      
      * tokenizer changed to only return midi_object and other changes
      
      * Updated paper abstract(Camera-ready version) (#2)
      
      * more comments and nits
      
      * ruff changes
      
      * code quality fix
      
      * sg comments
      
      * t5 change added and rebased
      
      * comments except batching
      
      * batching done
      
      * comments
      
      * small doc fix
      
      * example removed from modeling
      
      * ckpt
      
      * forward is compatible with fe and generation done
      
      * comments
      
      * comments
      
      * code-quality fix(maybe)
      
      * ckpts changed
      
      * doc file changed from mdx to md
      
      * test fixes
      
      * tokenizer test fix
      
      * changes
      
      * nits done main changes remaining
      
      * code modified
      
      * Pop2PianoProcessor added with tests
      
      * other comments
      
      * added Pop2PianoProcessor to dummy_objects
      
      * added require_onnx to modeling file
      
      * changes
      
      * update .md file
      
      * remove extra line in index.md
      
      * back to the main index
      
      * added pop2piano to index
      
      * Added tokenizer.__call__ with valid args and batch_decode and aligned the processor part too
      
      * changes
      
      * added return types to 2 tokenizer methods
      
      * the PR build test might work now
      
      * added backends
      
      * PR build fix
      
      * vocab added
      
      * comments
      
      * refactored vocab into 1 file
      
      * added conversion script
      
      * comments
      
      * essentia version changed in .md
      
      * comments
      
      * more tokenizer tests added
      
      * minor fix
      
      * tests extended for outputs acc check
      
      * small fix
      
      ---------
      Co-authored-by: Jongho Choi <sweetcocoa@snu.ac.kr>
    • fix documentation for CustomTrainer (#25635) · 6f041fcb
      mchau authored
      fix doc
  11. 18 Aug, 2023 7 commits