1. 22 May, 2024 1 commit
  2. 21 May, 2024 1 commit
  3. 13 May, 2024 1 commit
  4. 09 May, 2024 1 commit
    • Copal task (#1803) · 1980a13c
      Edd authored
      * add copal
      
      * change name to copal id for clarity and the task name
      
      * remove `copal_id...` from the yaml to make it work
      
      * checkmark on README
      
      * change group name to `copal_id`
  5. 08 May, 2024 1 commit
  6. 07 May, 2024 3 commits
    • Initial integration of Unitxt into the LM Eval Harness (#1615) · 885f48d6
      Yoav Katz authored
      * Initial support for Unitxt datasets in LM Eval Harness
      
      See https://github.com/IBM/unitxt

      The script 'generate_yamls.py' creates LM Eval Harness yaml files corresponding to Unitxt datasets specified in the 'unitxt_datasets' file.
      
      The glue code required to register Unitxt metrics is in 'unitxt_wrapper.py'.
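
      As a rough illustration, a generator like the one described loops over dataset names and emits one task yaml each. This sketch is illustrative only: the real generate_yamls.py reads the 'unitxt_datasets' file and produces richer configs, and the function name and keys below are hypothetical.

      ```python
      from pathlib import Path

      # Illustrative sketch of the generation loop described above; the
      # actual generate_yamls.py in the PR differs in details.
      def generate_task_yamls(dataset_names, out_dir):
          """Write one minimal task config per Unitxt dataset name."""
          out = Path(out_dir)
          out.mkdir(parents=True, exist_ok=True)
          written = []
          for name in dataset_names:
              # one minimal task config per dataset (keys are hypothetical)
              text = f"task: {name}\ndataset_name: {name}\n"
              path = out / f"{name}.yaml"
              path.write_text(text)
              written.append(path)
          return written
      ```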
      
      * Added dataset loading check to generate_yaml
      
      Improved error messages.
      
      * Speed up generate_yaml
      
      Added printouts and improved error message
      
      * Added output printout
      
      * Simplified integration of unitxt datasets
      
      Store all the common yaml configuration in a yaml include shared by all datasets of the same task.
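
      The shared-include pattern described here might look roughly like the sketch below. File names and field values are illustrative, not the actual files from the PR; the keys follow lm-eval's task yaml format.

      ```yaml
      # unitxt_nlg_task -- shared settings for all datasets of one task type;
      # the filename deliberately does not end in 'yaml', so the harness
      # will not treat the include itself as a task
      output_type: generate_until
      doc_to_text: "{{source}}"
      doc_to_target: "{{target}}"

      # a per-dataset file (e.g. my_dataset.yaml) then only needs:
      # include: unitxt_nlg_task
      # task: my_dataset
      ```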
      
      * Post code review comments - part 1
      
      1. Made sure include files don't end with 'yaml' so they won't be marked as tasks
      2. Added more datasets and tasks (NER, GEC)
      3. Added README
      
      * Post code review comments - part 2
      
      1. Added a unitxt install option in pyproject.toml:
      pip install 'lm_eval[unitxt]'
      2. Added a check that unitxt is installed, printing a clear error message if not
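
      A dependency guard like the one described in item 2 could be sketched as follows; the function name and message wording are hypothetical, not the actual code from the PR.

      ```python
      import importlib.util

      # Hypothetical sketch of an optional-dependency check; the real check
      # in the PR may be structured differently.
      def require_extra(module: str, extra: str) -> None:
          """Fail fast with an actionable message when an optional dep is absent."""
          if importlib.util.find_spec(module) is None:
              raise ImportError(
                  f"Package '{module}' is not installed. "
                  f"Install it via: pip install 'lm_eval[{extra}]'"
              )
      ```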
      
      * Committed missing pyproject change
      
      * Added documentation on adding datasets
      
      * More doc changes
      
      * add unitxt extra to readme
      
      * run precommit
      
      ---------
      Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
    • Re-add Hendrycks MATH (no sympy checking, no Minerva hardcoded prompt) variant (#1793) · d42a3e44
      Hailey Schoelkopf authored
      * add Hendrycks MATH (no sympy checking) variant
      
      * add readmes for MATH tasks
  7. 01 May, 2024 4 commits
  8. 26 Apr, 2024 1 commit
  9. 25 Apr, 2024 2 commits
  10. 18 Apr, 2024 1 commit
  11. 05 Apr, 2024 1 commit
    • TMMLU+ implementation (#1394) · 9ae96cdf
      ZoneTwelve authored
      
      
      * implementation of TMMLU+
      
      * implemented: TMMLU+
      
      **TMMLU+: Large-scale Traditional Chinese Massive Multitask Language Understanding**
      
      - 4 categories
          - STEM
          - Social Science
          - Humanities
          - Other
      
      The TMMLU+ dataset, encompassing over 67 subjects and 20160 tasks, is six times larger and more balanced than its predecessor, TMMLU. It includes benchmark results from both closed-source models and 20 open-weight Chinese large language models with 1.8B to 72B parameters. However, Traditional Chinese variants continue to underperform compared to major Simplified Chinese models.
      
      ```text
      Total number of tasks in the 'test' sets: 20160
      Total number of tasks in the 'validation' sets: 2247
      Total number of tasks in the 'train' sets: 335
      ```
      
      * Remove print from __init__.py
      
      My mistake: I forgot to remove the debug print from the code.
      
      * update: move TMMLU+ config generation program into default
      
      * fix: use the training set as few-shot examples
      
      * update: README for TMMLU+
      
      * update: small changes to the TMMLU+ README file
      
      * pre-commit run-through
      
      * Add README for TMMLU+ dataset
      
      * run precommit
      
      * trigger precommit again
      
      * trigger precommit again
      
      * isort is fussy
      
      * isort is fussy
      
      * format, again
      
      * oops
      
      * oops
      
      ---------
      Co-authored-by: lintang <lintang@eleuther.ai>
      Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
  12. 04 Apr, 2024 1 commit
  13. 01 Apr, 2024 1 commit
  14. 28 Mar, 2024 1 commit
  15. 21 Mar, 2024 1 commit
  16. 18 Mar, 2024 2 commits
  17. 15 Mar, 2024 1 commit
  18. 13 Mar, 2024 1 commit
  19. 11 Mar, 2024 4 commits
  20. 09 Mar, 2024 1 commit
  21. 06 Mar, 2024 5 commits
  22. 05 Mar, 2024 2 commits
  23. 04 Mar, 2024 1 commit
  24. 03 Mar, 2024 1 commit
  25. 01 Mar, 2024 1 commit