    Add CodeParrot 🦜 codebase (#14536) · 43f953cc
    Leandro von Werra authored
    
    
    * add readme skeleton
    
    * update readme
    
    * add initialization script
    
    * add deduplication script
    
    * add codeparrot training script
    
    * add code generation evaluation
    
    * add validation loss script
    
    * add requirements
    
    * update readme
    
    * tweak readme
    
    * make style
    
    * add highlights to readme
    
    * add CLIs to scripts
    
    * add tokenizer training script
    
    * add docstring to constant length dataset
    
    * fix defaults in arguments
    
    * update readme with cli
    
    * move image to hub
    
    * tweaks of readme
    
    * fix cli commands
    
    * add author
    
    * explain env variables
    
    * fix formatting
    
    * Update examples/research_projects/codeparrot/README.md
    Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
    
    * Apply suggestions from code review
    Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
    
    * replace generic with gpt2 tokenizer
    Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
validation_loss.py 3.41 KB