1. 26 Feb, 2024 8 commits
  2. 24 Feb, 2024 1 commit
    • Add environment and transformers version logging in results dump (#1464) · f78e2da4
      LSinev authored
      * Save git_hash to results even if git is not available to call as subprocess
      
      * Store more info about environment and transformers version in results to help researchers track inconsistencies
      
      * moved added logging to logging_utils
      
      * moved get_git_commit_hash to logging_utils.py
      
      * moved add_env_info inside evaluator
  3. 23 Feb, 2024 2 commits
  4. 22 Feb, 2024 5 commits
  5. 21 Feb, 2024 1 commit
    • Added KMMLU evaluation method and changed ReadMe (#1447) · c26a6ac7
      Hanwool Albert Lee authored
      
      
      * update kmmlu default formatting
      
      * Update _default_kmmlu_yaml
      
      * Delete lm_eval/tasks/kmmlu/utils.py
      
      * new tasks implemented
      
      * add direct tasks
      
      * update direct evaluate
      
      * update direct eval
      
      * add cot sample
      
      * update cot
      
      * add cot
      
      * Update _cot_kmmlu_yaml
      
      * add kmmlu90
      
      * Update and rename _cot_kmmlu.yaml to _cot_kmmlu_yaml
      
      * Create kmmlu90.yaml
      
      * Update _cot_kmmlu_yaml
      
      * add direct
      
      * Update _cot_kmmlu_yaml
      
      * Update and rename kmmlu90.yaml to kmmlu90_cot.yaml
      
      * Update kmmlu90_direct.yaml
      
      * add kmmlu hard
      
      * Update _cot_kmmlu_yaml
      
      * Update _cot_kmmlu_yaml
      
      * update cot
      
      * update cot
      
      * erase typo
      
      * Update _cot_kmmlu_yaml
      
      * update cot
      
      * Rename dataset to match k-mmlu-hard
      
      * removed kmmlu90
      
      * fixed name 'kmmlu_cot' to 'kmmlu_hard_cot' and revised README
      
      * applied pre-commit before pull requests
      
      * rename datasets and add notes
      
      * Remove DS_Store cache
      
      * Update lm_eval/tasks/kmmlu/README.md
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
      
      * Change citations and reflect reviews on version
      
      * Added kmmlu_hard and fixed other errors
      
      * fixing minor errors
      
      * remove duplicated
      
      * Rename files
      
      * try ".index"
      
      * minor fix
      
      * minor fix again
      
      * fix revert.
      
      * minor fix. thank for hailey
      
      ---------
      Co-authored-by: GUIJIN SON <spthsrbwls123@yonsei.ac.kr>
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
  6. 20 Feb, 2024 3 commits
  7. 19 Feb, 2024 2 commits
  8. 18 Feb, 2024 1 commit
  9. 15 Feb, 2024 1 commit
  10. 14 Feb, 2024 1 commit
  11. 13 Feb, 2024 1 commit
  12. 12 Feb, 2024 2 commits
  13. 11 Feb, 2024 3 commits
    • Add multilingual TruthfulQA task (#1420) · 7397b965
      Uanu authored
    • Add multilingual ARC task (#1419) · 0256c682
      Uanu authored
    • Evaluate (#1385) · 1ff84897
      Baber Abbasi authored
      * un-exclude `evaluate.py` from linting
      
      * readability
      
      * readability
      
      * add task name to build info message
      
      * fix link
      
      * nit
      
      * add functions for var and mean pooling
      
      * add functions for var and mean pooling
      
      * metadata compatibility with task
      
      * rename `override_config` to `set_config` and move to `Task`
      
      * add unit test
      
      * nit
      
      * nit
      
      * bugfix
      
      * nit
      
      * nit
      
      * nit
      
      * add docstrings
      
      * fix metadata-fewshot
      
      * revert metric refactor
      
      * nit
      
      * type checking
      
      * type hints
      
      * type hints
      
      * move `override_metric` to `Task`
      
      * change metadata
      
      * change name
      
      * pre-commit
      
      * rename
      
      * remove
      
      * remove
      
      * `override_metric` backwards compatible with `Task`
      
      * type hints
      
      * use generic
      
      * type hint
  14. 10 Feb, 2024 2 commits
  15. 09 Feb, 2024 1 commit
  16. 07 Feb, 2024 1 commit
  17. 06 Feb, 2024 4 commits
  18. 05 Feb, 2024 1 commit