- 16 May, 2022 1 commit
Loubna Ben Allal authored
* add new preprocessing arguments
* add new filters
* add new filters to readme
* fix config and test count, update function names and docstrings
* reformat code
* update readme
* Update readme
* rename config_test filter
  Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* rename few_assignments filter
  Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* rename tokenizer in arguments
  Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
* rename functions and add limit_line argument for config_test filter
* update threshold for config_test filter

Co-authored-by: Leandro von Werra <lvwerra@users.noreply.github.com>
Co-authored-by: Loubna ben allal <loubnabenallal@gmail.com>
- 04 May, 2022 1 commit
Thomas Wang authored
- 02 Dec, 2021 1 commit
Leandro von Werra authored
* add readme skeleton
* update readme
* add initialization script
* add deduplication script
* add codeparrot training script
* add code generation evaluation
* add validation loss script
* add requirements
* update readme
* tweak readme
* make style
* add highlights to readme
* add CLIs to scripts
* add tokenizer training script
* add docstring to constant length dataset
* fix defaults in arguments
* update readme with cli
* move image to hub
* tweaks of readme
* fix cli commands
* add author
* explain env variables
* fix formatting
* Update examples/research_projects/codeparrot/README.md
  Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Apply suggestions from code review
  Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* replace generic with gpt2 tokenizer

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>