    Add TMLU Benchmark Dataset (#2093) · ca3d86d6
    Yen-Ting Lin authored
    
    
    * add taiwan truthful qa
    
    * add tmlu
    
    * Add .gitignore entries for evals/ and harness_eval_main_log.txt, and add harness_eval.slurm script
    
    * add pega eval and legal eval
    
    * add ccp eval
    
    * Update .gitignore and harness_eval.slurm
    
    * Add trust_remote_code and wandb_args to harness_eval.slurm, and add run_all.sh script
    
    * Add Pega MMLU task and configuration files
    
    * Add new models and update parameters in run_all.sh
    
    * Add UMTCEval tasks and configurations
    
    * Update dataset paths and output path
    
    * Update .gitignore and harness_eval.slurm, and modify _generate_configs.py
    
    * Update SLURM script and add new models
    
    * clean for pr
    
    * Update lm_eval/tasks/tmlu/default/tmlu.yaml
    Co-authored-by: Lintang Sutawika <lintang@sutawika.com>
    
    * adjust tag name
    
    * removed group alias from tasks
    
    * format
    
    ---------
    Co-authored-by: Lintang Sutawika <lintang@sutawika.com>
    Co-authored-by: lintangsutawika <lintang@eleuther.ai>
    Co-authored-by: Yen-Ting Adam, Lin <r08944064@csie.ntu.edu.tw>
utils.py 750 Bytes