1. 19 Aug, 2024 1 commit
    • Add TMLU Benchmark Dataset (#2093) · ca3d86d6
      Yen-Ting Lin authored
      
      
      * add taiwan truthful qa
      
      * add tmlu
      
      * Add .gitignore entries for evals/ and harness_eval_main_log.txt, and add harness_eval.slurm script
      
      * add pega eval and legal eval
      
      * add ccp eval
      
      * Update .gitignore and harness_eval.slurm
      
      * Add trust_remote_code and wandb_args to harness_eval.slurm, and add run_all.sh script
      
      * Add Pega MMLU task and configuration files
      
      * Add new models and update parameters in run_all.sh
      
      * Add UMTCEval tasks and configurations
      
      * Update dataset paths and output path
      
      * Update .gitignore and harness_eval.slurm, and modify _generate_configs.py
      
      * Update SLURM script and add new models
      
      * clean for pr
      
      * Update lm_eval/tasks/tmlu/default/tmlu.yaml
      Co-authored-by: Lintang Sutawika <lintang@sutawika.com>
      
      * adjust tag name
      
      * removed group alias from tasks
      
      * format
      
      ---------
      Co-authored-by: Lintang Sutawika <lintang@sutawika.com>
      Co-authored-by: lintangsutawika <lintang@eleuther.ai>
      Co-authored-by: Yen-Ting Adam, Lin <r08944064@csie.ntu.edu.tw>