1. 05 Sep, 2024 1 commit
  2. 04 Sep, 2024 1 commit
  3. 30 Aug, 2024 2 commits
    • hotfix #2262 (#2264) · 928e8bb6
      Baber Abbasi authored
      * max_length - 1 (generation always >= 1)
      
      * vllm: fix rolling prefix_token
      
      * nit: add comment
      
      * fixup! max_length should be handled for logliklihoods
      
      * Revert "fixup! max_length should be handled for logliklihoods"
      
      This reverts commit 432d1a3b754c117c3a54ea2fe792ab3a1bd09ed3.
    • API: fix maxlen; vllm: prefix_token_id bug (#2262) · b31f92e8
      Baber Abbasi authored
      * max_length - 1 (generation always >= 1)
      
      * vllm: fix rolling prefix_token
      
      * nit: add comment
      
      * fixup! max_length should be handled for logliklihoods
  4. 28 Aug, 2024 3 commits
  5. 25 Aug, 2024 1 commit
  6. 23 Aug, 2024 3 commits
  7. 22 Aug, 2024 3 commits
  8. 20 Aug, 2024 6 commits
  9. 19 Aug, 2024 3 commits
    • Add TMLU Benchmark Dataset (#2093) · ca3d86d6
      Yen-Ting Lin authored
      * add taiwan truthful qa
      
      * add tmlu
      
      * Add .gitignore entries for evals/ and harness_eval_main_log.txt, and add harness_eval.slurm script
      
      * add pega eval and legal eval
      
      * add ccp eval
      
      * Update .gitignore and harness_eval.slurm
      
      * Add trust_remote_code and wandb_args to harness_eval.slurm, and add run_all.sh script
      
      * Add Pega MMLU task and configuration files
      
      * Add new models and update parameters in run_all.sh
      
      * Add UMTCEval tasks and configurations
      
      * Update dataset paths and output path
      
      * Update .gitignore and harness_eval.slurm, and modify _generate_configs.py
      
      * Update SLURM script and add new models
      
      * clean for pr
      
      * Update lm_eval/tasks/tmlu/default/tmlu.yaml
      Co-authored-by: Lintang Sutawika <lintang@sutawika.com>
      
      * adjust tag name
      
      * removed group alias from tasks
      
      * format
      
      ---------
      Co-authored-by: Lintang Sutawika <lintang@sutawika.com>
      Co-authored-by: lintangsutawika <lintang@eleuther.ai>
      Co-authored-by: Yen-Ting Adam, Lin <r08944064@csie.ntu.edu.tw>
    • Uminosachi's avatar
      86edeffa
    • Lingoly README update (#2228) · f81b62bf
      am-bean authored
      * Setting up lingoly task
      
      * Testing yaml changes to debug
      
      * Adding pre-commit hooks
      
      * Functional LingOly benchmark
      
      * Renaming files and adding grouping
      
      * Extending group aggregations to allow custom functions. Setting up custom lingoly aggregation using difference in scores.
      
      * Adding LingOly to the README file
  10. 16 Aug, 2024 1 commit
  11. 15 Aug, 2024 2 commits
    • New task: Lingoly (#2198) · 8b41f925
      am-bean authored
      * Setting up lingoly task
      
      * Testing yaml changes to debug
      
      * Adding pre-commit hooks
      
      * Functional LingOly benchmark
      
      * Renaming files and adding grouping
      
      * Extending group aggregations to allow custom functions. Setting up custom lingoly aggregation using difference in scores.
    • Update citation in README.md (#2083) · cbdc3539
      Anton Polishko authored
      Bumped citation to the v0.4.3
  12. 10 Aug, 2024 1 commit
  13. 09 Aug, 2024 1 commit
    • keep new line for task description (#2116) · 8ad598df
      Jungwhan Kim authored
      * add keep trailing newline
      
      * apply ruff-format
      
      * add prompt unit test
      
      * increment the version of tasks that have description with whitespace
      
      * remove white spaces of leaderboard bbh
      
      * update MMLU expected versions in output
      
      * CI run does display the expected version=1 for mmlu subtasks, fix expected test output again
      
      ---------
      Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
  14. 07 Aug, 2024 1 commit
  15. 05 Aug, 2024 8 commits
  16. 04 Aug, 2024 2 commits
  17. 01 Aug, 2024 1 commit