1. 26 Feb, 2024 4 commits
  2. 23 Feb, 2024 1 commit
  3. 22 Feb, 2024 1 commit
  4. 21 Feb, 2024 1 commit
    • Added KMMLU evaluation method and changed ReadMe (#1447) · c26a6ac7
      Hanwool Albert Lee authored
      
      
      * update kmmlu default formatting
      
      * Update _default_kmmlu_yaml
      
      * Delete lm_eval/tasks/kmmlu/utils.py
      
      * new tasks implemented
      
      * add direct tasks
      
      * update direct evaluate
      
      * update direct eval
      
      * add cot sample
      
      * update cot
      
      * add cot
      
      * Update _cot_kmmlu_yaml
      
      * add kmmlu90
      
      * Update and rename _cot_kmmlu.yaml to _cot_kmmlu_yaml
      
      * Create kmmlu90.yaml
      
      * Update _cot_kmmlu_yaml
      
      * add direct
      
      * Update _cot_kmmlu_yaml
      
      * Update and rename kmmlu90.yaml to kmmlu90_cot.yaml
      
      * Update kmmlu90_direct.yaml
      
      * add kmmlu hard
      
      * Update _cot_kmmlu_yaml
      
      * Update _cot_kmmlu_yaml
      
      * update cot
      
      * update cot
      
      * erase typo
      
      * Update _cot_kmmlu_yaml
      
      * update cot
      
      * Rename dataset to match k-mmlu-hard
      
      * removed kmmlu90
      
      * fixed name 'kmmlu_cot' to 'kmmlu_hard_cot' and revised README
      
      * applied pre-commit before pull requests
      
      * rename datasets and add notes
      
      * Remove DS_Store cache
      
      * Update lm_eval/tasks/kmmlu/README.md
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
      
      * Change citations and reflect reviews on version
      
      * Added kmmlu_hard and fixed other errors
      
      * fixing minor errors
      
      * remove duplicated
      
      * Rename files
      
      * try ".index"
      
      * minor fix
      
      * minor fix again
      
      * fix revert.
      
      * minor fix. thanks to Hailey
      
      ---------
      Co-authored-by: GUIJIN SON <spthsrbwls123@yonsei.ac.kr>
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
  5. 20 Feb, 2024 3 commits
  6. 19 Feb, 2024 2 commits
  7. 15 Feb, 2024 1 commit
  8. 13 Feb, 2024 1 commit
  9. 12 Feb, 2024 1 commit
  10. 11 Feb, 2024 2 commits
  11. 02 Feb, 2024 1 commit
  12. 01 Feb, 2024 2 commits
    • Faster Task and Group Loading, Allow Recursive Groups (#1321) · d714fc95
      Lintang Sutawika authored
      
      
      * add trust_remote_code as default
      
      * task for testing recursive
      
      * changed source of ALL_TASKS
      
      * tasks should only accept TaskObjects
      
      * initialize_tasks returns list of tasks and groups
      
      * remove trust_remote_code for now
      
      * moved constructor process to inside load_yaml_config
      
      * more comprehensive way to index tasks and groups
      
      * pre-commit format
      
      * add exit after error
      
      * adjust how task objects are called
      
      * no need to use get_task_dict
      
      * load_task_or_group works but only for tasks
      
      * pre-commit format
      
      * half working for nested groups
      
      * changed variable names
      
      * allow groups and tasks to work
      
      * temp save
      
      * indexing and loading are part of a task_manager object
      
      * adapted initialize_tasks
      
      * iron out bugs
      
      * fixed typo
      
      * fixed typo
      
      * simplified code
      
      * further tidy up
      
      * remove lines for testing
      
      * removed test lines
      
      * removed unused code
      
      * remove unused import
      
      * fixed bug
      
      * removed comments
      
      * group in a list of group can accept parameter changes like `num_fewshot`
      
      * check if config is task update
      
      * add GroupConfig object
      
      * edit test yaml
      
      * remove args
      
      * testing returning to python task list
      
      * add weight_by_size config
      
      * describe weight_by_size in docs
      
      * fix weight by size potential error
      
      * can load individual custom python class task
      
      * moved import_function into the config loading file
      
      * remove print lines
      
      * add squadv2 yaml
      
      * temporary scroll implementation
      
      * revert back to use load_yaml_config but with modes
      
      * fix group being loaded with a None
      
      * reformat
      
      * can load unregistered tasks from a group
      
      * update scrolls
      
      * edit scrolls multiplechoice task
      
      * adjust class initialization
      
      * fix initialization
      
      * changed how to identify group and python tasks, fix logger
      
      * allow loading "include" that is nested in a group config
      
      * reworked flan benchmark
      
      * allow duplicate task in the same group to co-exist
      
      * process group_alias
      
      * removed group_alias
      
      * allow parameters set in group_config to apply to all tasks in tasklist
      
      * add function, but comment for now
      
      * reworked processing dict-base config
      
      * fixed how configs in group are processed
      
      * update to allow root group to have its alias used
      
      * remove unused classes
      
      * remove unused classes
      
      * revert some parts to original
      
      * forgot to change one variable
      
      * adapt the new process to use get_task_dict
      
      * fix for singular group call
      
      * fix variable names
      
      * add TaskManager into the evaluator
      
      * format
      
      * changed how dict tasks are loaded
      
      * add docs
      
      * Update docs/new_task_guide.md
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
      
      * Update evaluator.py
      
      * Update evaluator.py
      
      * remove groupconfig for now
      
      * changed _config to config
      
      * update interface.md to explain TaskManager
      
      * added property functions
      
      * adjusted logger
      
      * update write_out.py
      
      * updated tests
      
      * added documentation and some modifications
      
      * added docstring documentation
      
      * precommit format
      
      * updated task loading for tests
      
      * updates tests
      
      * changed arg order for load_yaml_config
      
      * update to handle scrolls and edit log message
      
      * remove unused lines
      
      * return a list of task classes and not a dict
      
      * Update __init__.py
      
      * Delete lm_eval/tasks/benchmarks/test.yaml
      
      * Update task.py
      
      * Update lm_eval/utils.py
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
      
      * Update lm_eval/utils.py
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
      
      * Update utils.py
      
      * re-added old functions with new log message
      
      * Update docs/new_task_guide.md
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
      
      * Update new_task_guide.md
      
      * added info regarding `get_task_dict` and documentation
      
      * add get_config for Task
      
      * pre-commit formatting
      
      ---------
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
    • Enable override of printed `n-shot` in table (#1379) · 17191063
      Hailey Schoelkopf authored
      * allow tasks to specify printed fewshot val
      
      * fix to belebele
      
      * update metadata field's documentation
  13. 31 Jan, 2024 1 commit
  14. 28 Jan, 2024 1 commit
  15. 25 Jan, 2024 1 commit
    • `Filter` docs not offset by `doc_id` (#1349) · a0f1cacd
      Baber Abbasi authored
      * get `doc` from instance
      
      * accelerate bugfix: get ground doc from instance
      
      * convert filter to `process_result`
      
      * get docs from instances in `FilterEnsemble`
      
      * rename
      
      * nit
      
      * better looping
      
      * fix typehint
  16. 23 Jan, 2024 2 commits
  17. 19 Jan, 2024 1 commit
  18. 18 Jan, 2024 3 commits
  19. 16 Jan, 2024 1 commit
  20. 15 Jan, 2024 2 commits
  21. 12 Jan, 2024 1 commit
    • add Kobest (#1263) · 653217a7
      jp authored
      * Add: kobest config file
      
      * Add: kobest utils
      
      * Add: README
      
      * Update utils.py
  22. 11 Jan, 2024 2 commits
  23. 10 Jan, 2024 1 commit
  24. 05 Jan, 2024 1 commit
  25. 02 Jan, 2024 1 commit
  26. 29 Dec, 2023 1 commit
    • Don't silence errors when loading tasks (#1148) · 34b563b1
      Paul McCann authored
      
      
      * Add example failing task
      
      This task includes an invalid import. This will cause an exception and
      the task will not be loaded. But this just results in a DEBUG level log
      message, so in normal usage you'll see no error, and will be told the
      task doesn't exist.
      
      Here's an example command line to run the task:
      
          python -m lm_eval --model hf --model_args pretrained=rinna/japanese-gpt-1b --tasks fail
      
      This task is based on a Japanese Winograd task, but that's not
      important, and was just used due to familiarity.
      
      * Do not ignore errors when loading tasks
      
      * Change how task errors are logged
      
      This makes the proposed changes from PR discussion.
      
      1. Exceptions not related to missing modules/imports are logged as
         warnings.
      
      2. Module- and import-related exceptions are still logged at debug level,
         but if any of them happen, a single warning is emitted with
         instructions on how to show those logs.
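
The two logging rules above can be sketched roughly as follows. This is only an illustration of the described policy, not the code from this commit: `load_tasks`, the `loaders` mapping, and the logger name are hypothetical names chosen for the example.

```python
import logging

# Hypothetical logger name; the real project may use a different one.
logger = logging.getLogger("task_loading_sketch")


def load_tasks(loaders):
    """Call each loader and sort failures by kind.

    `loaders` maps a task name to a zero-argument callable that builds the
    task (an assumption made for this sketch).
    Returns (loaded, import_failures, other_failures).
    """
    loaded = {}
    import_failures = []
    other_failures = []

    for name, loader in loaders.items():
        try:
            loaded[name] = loader()
        except ImportError as err:
            # Rule 2: import problems stay at DEBUG level.
            # ModuleNotFoundError is a subclass of ImportError.
            logger.debug("Import error while loading task %r: %s", name, err)
            import_failures.append(name)
        except Exception as err:
            # Rule 1: anything else is surfaced as a warning right away.
            logger.warning("Failed to load task %r: %s", name, err)
            other_failures.append(name)

    if import_failures:
        # Rule 2, second half: one visible warning pointing at the debug logs.
        logger.warning(
            "%d task(s) were skipped because of import errors; "
            "enable DEBUG logging to see the details.",
            len(import_failures),
        )

    return loaded, import_failures, other_failures
```

With this shape, a task that fails on a missing dependency no longer disappears silently: the user sees at least one warning and knows to raise the log level for details.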
      
      * Remove intentionally failing task
      
      ---------
      Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
  27. 27 Dec, 2023 1 commit
    • nits + fix siqa (#1216) · 6a1c19ed
      Baber Abbasi authored
      * fix group
      
      * siqa: default.yml -> default.yaml
      
      * max_gen_toks -> self.max_gen_toks
      
      * add ids to task tests
      
      * fix siqa
      
      * fix gen_kwargs for openai-chat