1. 15 Aug, 2024 2 commits
    • Update training guide colab (#108) · 8e465f1b
      Yoach Lacombe authored
      * Update README.md
      
      * Update README.md
      
      * Update README.md
      
      * update configs and readme
      
      * fix training and eval single gpus and long audios errors
      
      * fix error transcriptions none
      
      * fix transcription null WER
      
      * Update README.md
      
      * Update README.md
      
      ---------
      
      Co-authored-by: Yoach Lacombe <yoach@huggingface.co>
    • Update training guide (#102) · 8f5ef3a2
      Yoach Lacombe authored
      * Update README.md
      
      * Update README.md
      
      * Update README.md
      
      * update configs and readme
      
      * fix training and eval single gpus and long audios errors
      
      * fix error transcriptions none
      
      * fix transcription null WER
      
      ---------
      
      Co-authored-by: Yoach Lacombe <yoach@huggingface.co>
  2. 13 Aug, 2024 1 commit
  3. 08 Aug, 2024 1 commit
  4. 07 Aug, 2024 2 commits
  5. 31 Jul, 2024 1 commit
    • Architecture improvements (#65) · 11b209e1
      Yoach Lacombe authored
      * add RoPe
      
      * don't include padding in rope
      
      * possibly use cross-attn for prompt
      
      * fix rope
      
      * fix cross-attn
      
      * fix self-attn
      
      * fix dummy model
      
      * clean-up rope
      
      * first gqa implementation
      
      * fix wer eval
      
      * feat: add flash attention and SDPA
      
      * chore: add README for flash attention
      
      * chore: add benchmark script
      
      * chore: add benchmark attention approach
      
      * multi node and fix wer and fix compile
      
      * Update modeling_parler_tts.py
      
      * fix FA2, SDPA and add cross-attn MHA and attention type forcing
      
      * better cross_attention key values number of heads default + add training arguments for attn implementation
      
      * fix audio padding when torch compile or pad_to_max_length=True
      
      * correct multi node
      
      * make rope faster
      
      * fix encoder sdpa
      
      * fix training with cross attention + with FA2
      
      * use fp32 as default model dtype + fix generation when using FA2 with autocast
      
      * remove redundant passes in generate + clean and fix attentions
      
      * fix edge case in WER evaluation when longform generation
      
      * better multi-node mapping and saving / add eval dataloader num workers
      
      * remove old benchmarks
      
      * faster audio encoding + checkpointing + fix generation step
      
      * better eval + add right padding + fix eval loss compute
      
      * correct README
      
      * correct config docstrings
      
      * remove comment
      
      * make style
      
      ---------
      Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
      Co-authored-by: sang-nguyen-ts <sang.nguyen@trustingsocial.com>
      Co-authored-by: Yoach Lacombe <yoach@huggingface.co>
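Several commits in the entry above ("add RoPe", "fix rope", "make rope faster") concern rotary position embeddings. As a rough illustration of the technique those commits refer to — not the repository's actual implementation — a minimal RoPE application in PyTorch might look like:

```python
import torch

def rotary_embed(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply rotary position embeddings to x of shape (batch, seq_len, dim).

    Illustrative sketch only: dim is assumed even, and pairs of channels
    (2i, 2i+1) are rotated by a position-dependent angle.
    """
    _, seq_len, dim = x.shape
    # One frequency per channel pair, decaying geometrically with channel index.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    positions = torch.arange(seq_len).float()
    angles = torch.outer(positions, inv_freq)        # (seq_len, dim // 2)
    cos, sin = angles.cos(), angles.sin()
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    # 2-D rotation of each (even, odd) channel pair by its angle.
    out[..., 0::2] = x_even * cos - x_odd * sin
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out
```

At position 0 every angle is zero, so the first token's embedding passes through unchanged; later commits in the list ("don't include padding in rope") suggest the real code additionally offsets positions to skip padding tokens.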
  6. 30 May, 2024 2 commits
  7. 23 May, 2024 2 commits
  8. 22 May, 2024 9 commits
  9. 18 May, 2024 1 commit
  10. 14 May, 2024 7 commits
  11. 09 May, 2024 4 commits
  12. 30 Apr, 2024 4 commits
  13. 25 Apr, 2024 2 commits
  14. 24 Apr, 2024 2 commits