    Add Number Normalisation for SpeechT5 (#25447) · 182b8374
    Tanay Mehta authored
    * add: NumberNormalizer works for integers, floats, common currencies, negative numbers and percentages
    
    * fix: renamed number normalizer class and added normalization to SpeechT5Processor
    
    * fix: restyled with black and ruff, should pass code quality tests
    
    * fix: moved normalization to tokenizer and other small changes to normalizer
    
    * add: test for normalization and changed the existing full tokenizer test
    
    * fix: tokenization tests now pass, made changes to existing tokenization where normalization is covered; added normalize arg to func signature
    
    * fix: changed default normalize setting to False, modified the tests a bit
    
    * fix: added support for comma-separated numbers, on-the-fly tokenization via kwargs, and normalizer getter/setter functions
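
To make the cases listed in the commit message concrete (integers, floats, common currencies, negative numbers, percentages, comma-separated numbers), here is a minimal, self-contained sketch of number normalization. It is not the normalizer added in this commit: `int_to_words`, `number_to_words`, `normalize_numbers`, and the chosen output wording (for example "minus", "point", always reading decimals digit by digit) are hypothetical names and conventions assumed for illustration only.

```python
import re

# Hypothetical word tables for this sketch; not taken from the commit.
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
SCALES = [(1_000_000_000, "billion"), (1_000_000, "million"),
          (1_000, "thousand"), (100, "hundred")]
# Assumed (singular, plural) spoken forms for a few currency symbols.
CURRENCIES = {"$": ("dollar", "dollars"), "€": ("euro", "euros"),
              "£": ("pound", "pounds")}

# Optional sign, optional currency symbol, digits with optional
# comma-separated thousands groups and decimal part, optional percent sign.
NUMBER = re.compile(
    r"(?<!\w)(?P<sign>-)?(?P<cur>[$€£])?"
    r"(?P<num>\d+(?:,\d{3})*(?:\.\d+)?)(?P<pct>%)?"
)


def int_to_words(n: int) -> str:
    """Spell out a non-negative integer in English words."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] if rest == 0 else f"{TENS[tens]} {ONES[rest]}"
    for value, name in SCALES:
        if n >= value:
            head, rest = divmod(n, value)
            words = f"{int_to_words(head)} {name}"
            return words if rest == 0 else f"{words} {int_to_words(rest)}"
    raise ValueError("expected a non-negative integer")


def number_to_words(token: str) -> str:
    """Spell out an unsigned number such as '1,200' or '3.14'."""
    whole, _, frac = token.replace(",", "").partition(".")
    words = int_to_words(int(whole))
    if frac:
        # Decimal digits are read one at a time in this sketch.
        words += " point " + " ".join(ONES[int(d)] for d in frac)
    return words


def normalize_numbers(text: str) -> str:
    """Replace every number found in `text` with its spoken form."""
    def _replace(match: re.Match) -> str:
        words = number_to_words(match.group("num"))
        if match.group("cur"):
            singular, plural = CURRENCIES[match.group("cur")]
            words += " " + (singular if match.group("num") == "1" else plural)
        if match.group("pct"):
            words += " percent"
        if match.group("sign"):
            words = "minus " + words
        return words

    return NUMBER.sub(_replace, text)


print(normalize_numbers("It costs $1,200 and fell -3.5% today."))
```

The sketch keeps normalization as a plain text-to-text step that runs before tokenization, which is consistent with the commit's note that normalization was moved into the tokenizer and is off by default. Cents, ordinals, and other formats are deliberately left out.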
test_tokenization_speecht5.py 16.8 KB