    Albert pretrain datasets/ datacollator (#6168) · 762cba3b
    Yu Liu authored
    
    
    * add dataset for albert pretrain
    
    * datacollator for albert pretrain
    
    * naming, comprehension, file reading change
    
    * data cleaning is not needed after this modification
    
    * delete prints
    
    * fix a bug
    
    * file structure change
    
    * add tests for albert datacollator
    
    * remove random seed
    
    * add back len and get item function
    
    * sample file for testing and test code added
    
    * format change for black
    
    * more format change
    
    * Style
    
    * resolve var assignment issue
    
    * add back wrongly deleted DataCollatorWithPadding in init file
    
    * Style
    Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
    Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
test_data_collator.py 9.01 KB