1. 15 Aug, 2024 2 commits
  2. 14 Jul, 2024 1 commit
  3. 12 Jul, 2024 5 commits
  4. 11 Jul, 2024 1 commit
    • anthony-dipofi's avatar
      Prettify lm_eval --tasks list (#1929) · a0243d54
      anthony-dipofi authored
      
      
      * add  and ; move task list newline logic to new TaskManager.list_all_tasks() method
      
      * format table list into markdown table; add config location column
      
      * add Output Type column
      
      * add logic for printing table of tags separately
      
      * merge with main and fix conflicts ; update docstrings
      
      ---------
      Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
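The bullets above describe formatting the task list as a markdown table with extra columns. A minimal sketch of that kind of rendering follows; `to_markdown_table` and the sample row are hypothetical and are not taken from the harness.

```python
def to_markdown_table(headers, rows):
    """Render rows as a markdown table: header line, separator line, one line per row."""
    lines = [
        "|" + "|".join(headers) + "|",
        "|" + "|".join("---" for _ in headers) + "|",
    ]
    for row in rows:
        lines.append("|" + "|".join(str(cell) for cell in row) + "|")
    return "\n".join(lines)


# Hypothetical entry: the task name, path, and output type are made up for illustration.
table = to_markdown_table(
    ["Task", "Config Location", "Output Type"],
    [["demo_task", "path/to/demo_task.yaml", "multiple_choice"]],
)
print(table)
```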
  5. 10 Jul, 2024 2 commits
    • meg's avatar
      batch_size may be str if 'auto' is specified (#2084) · 30273b47
      meg authored
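The title implies that `batch_size` can arrive either as an integer or as a string such as 'auto'. A minimal sketch of handling both cases, assuming numeric strings should become integers; `normalize_batch_size` is a hypothetical helper, not the harness's code.

```python
from typing import Union


def normalize_batch_size(batch_size: Union[int, str]) -> Union[int, str]:
    """Return an int for numeric input and leave 'auto'-style strings untouched."""
    if isinstance(batch_size, str):
        if batch_size.startswith("auto"):
            # Keep the string so the caller can decide how to pick a batch size.
            return batch_size
        # A numeric string, e.g. "8" read from a command line.
        return int(batch_size)
    return batch_size
```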
    • Lintang Sutawika's avatar
      Update utils.py (#2085) · 058cfd0e
      Lintang Sutawika authored
      Group configs with no aggregation print an empty space as the score in the results table.
      Example:
      ```
      |    Tasks     |Version|Filter|n-shot| Metric |   |Value |   |Stderr|
      |--------------|-------|------|-----:|--------|---|-----:|---|-----:|
      |group         |    N/A|      |      |        |   |      |   |      |
      | - task 0     |Yaml   |none  |     0|acc     |↑  |0.4000|±  |0.0910|
      | - task 1     |Yaml   |none  |     0|acc     |↑  |0.3333|±  |0.0875|
      | - task 2     |Yaml   |none  |     0|acc     |↑  |0.2667|±  |0.0821|
      | - task 3     |Yaml   |none  |     0|acc     |↑  |0.3333|±  |0.0875|
      ```
      
      So `make_table` needs to check whether the `v` variable is a float or a string.
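One way to read that check, as a minimal sketch: format floats with four decimals, as in the table above, and pass any other value through unchanged. `format_value` is a hypothetical helper for illustration, not the harness's actual code.

```python
def format_value(v):
    """Format a table cell: floats get four decimals, anything else is shown as-is."""
    if isinstance(v, float):
        return "%.4f" % v
    # Groups without aggregation carry a placeholder string instead of a score.
    return str(v)
```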
  6. 09 Jul, 2024 1 commit
  7. 08 Jul, 2024 6 commits
    • Pankaj Mathur's avatar
      Minor doc fix: leaderboard README.md missing mmlu-pro group and task (#2075) · be01651c
      Pankaj Mathur authored
      leaderboard README.md missing mmlu-pro group and task
    • Nathan Habib's avatar
      Allow gating EvaluationTracker HF Hub results; customizability (#2051) · 563f7971
      Nathan Habib authored
      * batch commit
      
      * Revert "batch commit"
      
      This reverts commit d859d1ca.
      
      * batch commit
      
      * checkout from main
      
      * checkout from main
      
      * checkout from main
      
      * checkout from main
      
      * checkout from main
      
      * cleanup
      
      * cleanup
      
      * cleanup
      
      * cleanup
      
      * cleanup
      
      * cleanup eval results
      
      * cleanup
      
      * add check for gated repo
      
      * fix jsonline issue
      
      * fix
      
      * add try catch when gating the details repo
      
      * add doc
      
      * adds back hub_repo_name
      
      * readds hub repo name
    • Elron Bandel's avatar
      Easier unitxt tasks loading and removal of unitxt library dependency (#1933) · ad80f555
      Elron Bandel authored
      
      
      * Updated unitxt loading
      Signed-off-by: Elron Bandel <elron.bandel@ibm.com>

      * Revert change to general Readme
      Signed-off-by: Elron Bandel <elron.bandel@ibm.com>

      * Adjust fda, squadv2, squad_completion and swde to accept config in the constructor
      Signed-off-by: Elron Bandel <elron.bandel@ibm.com>

      * Fix scrolls
      Signed-off-by: elronbandel <elron.bandel@ibm.com>

      * Update documentation
      Signed-off-by: elronbandel <elron.bandel@ibm.com>

      * Enforce backward compatibility
      Signed-off-by: elronbandel <elron.bandel@ibm.com>

      * Format unitxt class
      Signed-off-by: elronbandel <elron.bandel@ibm.com>

      ---------
      Signed-off-by: Elron Bandel <elron.bandel@ibm.com>
      Signed-off-by: elronbandel <elron.bandel@ibm.com>
      Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
    • Hailey Schoelkopf's avatar
    • Lintang Sutawika's avatar
      Group agg rework (#1741) · 517aadc4
      Lintang Sutawika authored
      
      
      * add group_config arg
      
      * add a group config that allows disabling table for group score and group aggregate in general
      
      * fixed size configuration
      
      * adjust config
      
      * add group config
      
      * adjust mmlu to use group_config
      
      * fixed args input in aggregate_subtask_metrics
      
      * fixed issues related to printing alias of group and updated yaml
      
      * update all mmlu variants to include group_config
      
      * edit format
      
      * modify mmlu tasks
      
      * adjust group to also be a configurable group
      
      * add configurable group
      
      * simplify get_task_list
      
      * adjust group scoring using ConfigurableGroup
      
      * adjust args
      
      * update mmlu
      
      * update mmlu
      
      * update to work with new group and task configuration
      
      * readd group_agg
      
      * readd files
      
      * move prepare_print_tasks to evaluator_utils
      
      * sort set to False by default, fix predict_only arg
      
      * add version for groups
      
      * reversed task list
      
      * update additional condition when loading a group in a group yaml
      
      * update truthfulqa
      
      * add description regarding tags replacing group
      
      * replace group to tag
      
      * fixed conditional statement
      
      * remove warning
      
      * update loading of task group and newly added tags
      
      * reformat with pre-commit
      
      * fixed info log
      
      * update
      
      * fix bug
      
      * fix bug
      
      * use task id to differentiate tasks
      
      * convert all groups to configurable groups
      
      * use task_id
      
      * reformat
      
      * add task_id for python tasks as well
      
      * add task_id for python tasks as well
      
      * add task_id for python tasks as well
      
      * revert truthfulqa
      
      * revert mmlu tasks
      
      * new mmlu config
      
      * new group config parameter `tag_to_task`
      
      * Update truthfulqa_mc2.yaml
      
      * reformat
      
      * add _process_group_config
      
      * adjust task_id
      
      * add get_subtask_list function to get proper subtask list
      
      * group config to_dict update
      
      * remove tag check
      
      * update mmlu
      
      * fix config passing issues
      
      * add test yaml
      
      * format fix
      
      * add documentation
      
      * corner case for single tag being called
      
      * fix indentation
      
      * formatting
      
      * update all mmlu variants
      
      * Update docs/task_guide.md
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
      
      * remove group_alias
      
      * Update docs/task_guide.md
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
      
      * remove version for metadata
      
      * Update docs/task_guide.md
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
      
      * update mmlu/
      
      * removed " " in make_table
      
      * change how aggregate_metric is loaded
      
      * change how aggregate_metric is loaded
      
      * update aggregate_metric arg
      
      * update format
      
      * update format
      
      * some docs fixes
      
      * add groups for agieval, aexams, aclue
      
      * add more explicit aggregation groups
      
      * add more groupings / tags distinctions
      
      * add more groupings
      
      * more groupings
      
      * add many explicit group configs
      
      * add many explicit group configs
      
      * add more explicit group configs
      
      * add more explicit group configs
      
      * add more error msgs, agg_metric -> agg_metric_list
      
      * some docs updates
      
      * update task_id to be updatable and use group:task format
      
      * make KMMLU a tag for now
      
      * update docs
      
      * don't duplicate task names
      
      * fix merge conflicts?
      
      * giving this a try
      
      * clean up diff
      
      * switch mmlu variants over to using
      
      * don't use to-be-deprecated group: config field in overview notebook
      
      * Python tasks which subclass ConfigurableTask now run
      
      * update mmlu
      
      * pre-commit format
      
      * fixed sorting for multi-level printing
      
      * move group api to separate file
      
      * fix bbh aggregation filter usage
      
      * track api/group.py
      
      * adjust group and tags loading
      
      * make explicit group configs for leaderboard and other newer tasks
      
      * fix arabicmmlu
      
      * update
      
      * change arabicmmlu template name???
      
      * update group alias
      
      * fix printing bugs
      
      * check table printing is correct ; update tests
      
      * use mmlu_stem to have a group included in print tests
      
      ---------
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
      Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
    • Choyunhui's avatar
      5a7ed3ee
  8. 03 Jul, 2024 3 commits
  9. 02 Jul, 2024 1 commit
  10. 01 Jul, 2024 4 commits
  11. 29 Jun, 2024 1 commit
  12. 28 Jun, 2024 3 commits
    • Baber Abbasi's avatar
      Add chat template to `vllm` (#2034) · cc2d3463
      Baber Abbasi authored
      
      
      * add chat template
      
      * refactor token padding
      
      * nit
      
      * nit
      
      * check on failing test
      
      * check transformers version
      
      * remove transformers pin
      
      * add ids to test
      
      * nit
      
      * fixup
      
      * fix bos bug
      
      * nit
      
      * fixup! fix bos bug
      
      * increase tolerance for table test
      
      * don't detokenize vllm logprobs
      
      * Update lm_eval/models/utils.py
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
      
      * pre-commit run --all-files
      
      ---------
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
    • Baber Abbasi's avatar
      fix cache (#2037) · e922cceb
      Baber Abbasi authored
    • Steven Basart's avatar
      Fixes scrolls task bug with few_shot examples (#2003) · 801322e0
      Steven Basart authored
      Bug:
      
      ```
      python -m scripts.write_out --task scrolls_quality --output_base_path ~/workspace/
      Traceback (most recent call last):
        File "<frozen runpy>", line 198, in _run_module_as_main
        File "<frozen runpy>", line 88, in _run_code
        File "/lm-evaluation-harness/scripts/write_out.py", line 92, in <module>
          main()
        File "/lm-evaluation-harness/scripts/write_out.py", line 51, in main
          task_dict = tasks.get_task_dict(task_names, task_manager)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 423, in get_task_dict
          task_name_from_string_dict = task_manager.load_task_or_group(
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 271, in load_task_or_group
          collections.ChainMap(*map(self._load_individual_task_or_group, task_list))
        File "/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 162, in _load_individual_task_or_group
          return load_task(task_config, task=name_or_config, group=parent_name)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/lm-evaluation-harness/lm_eval/tasks/__init__.py", line 148, in load_task
          task_object = config["class"]()
                        ^^^^^^^^^^^^^^^^^
        File "/lm-evaluation-harness/lm_eval/tasks/scrolls/task.py", line 120, in __init__
          super().__init__()
        File "/lm-evaluation-harness/lm_eval/api/task.py", line 703, in __init__
          self._config = TaskConfig(**config)
                         ^^^^^^^^^^^^^^^^^^^^
      TypeError: lm_eval.api.task.TaskConfig() argument after ** must be a mapping, not NoneType
      ```
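The last frame of the traceback can be reproduced in isolation: unpacking `None` with `**` raises this `TypeError`. A minimal sketch of one defensive pattern, using hypothetical stand-in classes (`DemoConfig`, `DemoTask`); this is an assumption about the shape of a fix, not the change made in the commit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DemoConfig:
    # Hypothetical fields, chosen only so the example has something to unpack into.
    task: str = "demo"
    num_fewshot: int = 0


class DemoTask:
    def __init__(self, config: Optional[dict] = None):
        # DemoConfig(**None) would raise "argument after ** must be a mapping, not NoneType",
        # so fall back to an empty mapping when no config is passed.
        self._config = DemoConfig(**(config or {}))
```

With this guard, `DemoTask()` builds a default config, and `DemoTask({"num_fewshot": 5})` overrides a single field.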
  13. 26 Jun, 2024 1 commit
  14. 25 Jun, 2024 5 commits
  15. 24 Jun, 2024 2 commits
  16. 20 Jun, 2024 1 commit
  17. 19 Jun, 2024 1 commit