19 Aug, 2024 · 1 commit

Yen-Ting Lin authored
* add taiwan truthful qa
* add tmlu
* Add .gitignore entries for evals/ and harness_eval_main_log.txt, and add harness_eval.slurm script
* add pega eval and legal eval
* add ccp eval
* Update .gitignore and harness_eval.slurm
* Add trust_remote_code and wandb_args to harness_eval.slurm, and add run_all.sh script
* Add Pega MMLU task and configuration files
* Add new models and update parameters in run_all.sh
* Add UMTCEval tasks and configurations
* Update dataset paths and output path
* Update .gitignore and harness_eval.slurm, and modify _generate_configs.py
* Update SLURM script and add new models
* clean for pr
* Update lm_eval/tasks/tmlu/default/tmlu.yaml

  Co-authored-by: Lintang Sutawika <lintang@sutawika.com>
* adjust tag name
* removed group alias from tasks
* format

---------

Co-authored-by: Lintang Sutawika <lintang@sutawika.com>
Co-authored-by: lintangsutawika <lintang@eleuther.ai>
Co-authored-by: Yen-Ting Adam, Lin <r08944064@csie.ntu.edu.tw>

05 Apr, 2024 · 1 commit

ZoneTwelve authored
* implementation of TMMLU+
* implemented: TMMLU+

  **TMMLU+: Large-scale Traditional Chinese Massive Multitask Language Understanding**

  - 4 categories
    - STEM
    - Social Science
    - Humanities
    - Other

  The TMMLU+ dataset covers over 67 subjects and 20160 tasks. It is six times larger and more balanced than its predecessor, TMMLU. It also includes benchmark results from closed-source models and from 20 open-weight Chinese large language models ranging from 1.8B to 72B parameters. Traditional Chinese variants still underperform the major Simplified Chinese models.

  ```markdown
  Total number of tasks in the 'test' sets: 20160
  Total number of tasks in the 'validation' sets: 2247
  Total number of tasks in the 'train' sets: 335
  ```
* Remove print from __init__.py

  I forgot to remove a debug print from the code.
* update: move TMMLU+ config generation program into default
* fix: use the training set for few-shot examples
* update: README for TMMLU+
* update: small changes to the TMMLU+ README file
* pre-commit run through
* Add README for TMMLU+ dataset
* run precommit
* trigger precommit again
* trigger precommit again
* isort is fussy
* isort is fussy
* format, again
* oops
* oops

---------

Co-authored-by: lintang <lintang@eleuther.ai>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>