1. 07 May, 2024 1 commit
  2. 06 May, 2024 1 commit
  3. 25 Apr, 2024 5 commits
  4. 24 Apr, 2024 1 commit
  5. 23 Apr, 2024 1 commit
  6. 18 Apr, 2024 1 commit
  7. 05 Apr, 2024 1 commit
    • TMMLU+ implementation (#1394) · 9ae96cdf
      ZoneTwelve authored
      
      
      * implementation of TMMLU+
      
      * implemented: TMMLU+
      
      **TMMLU+: Large-Scale Traditional Chinese Massive Multitask Language Understanding**
      
      - 4 categories
          - STEM
          - Social Science
          - Humanities
          - Other
      
      The TMMLU+ dataset spans over 67 subjects and 20160 tasks. It is six times larger and more balanced than its predecessor, TMMLU, and includes benchmark results for closed-source models and 20 open-weight Chinese large language models ranging from 1.8B to 72B parameters. However, Traditional Chinese variants still underperform the major Simplified Chinese models.
      
      ```markdown
      Total number of tasks in the 'test' sets: 20160
      Total number of tasks in the 'validation' sets: 2247
      Total number of tasks in the 'train' sets: 335
      ```
      
      * Remove print from __init__.py
      
      I mistakenly left a debug print in the code; this removes it.
      
      * update: move TMMLU+ config generation program into default
      
      * fix: use the training set for few-shot examples
      
      * update: README for TMMLU+
      
      * update: small changes to the TMMLU+ README file
      
      * pre-commit run through
      
      * Add README for TMMLU+ dataset
      
      * run precommit
      
      * trigger precommit again
      
      * trigger precommit again
      
      * isort is fussy
      
      * isort is fussy
      
      * format, again
      
      * oops
      
      * oops
      
      ---------
      Co-authored-by: lintang <lintang@eleuther.ai>
      Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
  8. 04 Apr, 2024 1 commit
  9. 01 Apr, 2024 1 commit
  10. 28 Mar, 2024 1 commit
  11. 21 Mar, 2024 1 commit
  12. 18 Mar, 2024 2 commits
  13. 15 Mar, 2024 1 commit
  14. 13 Mar, 2024 1 commit
  15. 11 Mar, 2024 4 commits
  16. 09 Mar, 2024 1 commit
  17. 06 Mar, 2024 5 commits
  18. 05 Mar, 2024 2 commits
  19. 04 Mar, 2024 1 commit
  20. 03 Mar, 2024 1 commit
  21. 01 Mar, 2024 1 commit
  22. 27 Feb, 2024 2 commits
  23. 26 Feb, 2024 4 commits
    • Cont metrics (#1475) · 96d185fa
      Lintang Sutawika authored
      
      
      * add brier_score
      
      * process brier_score
      
      * brier score is working for N-sized class
      
      * fixed brier score
      
      * add TED to BigBench and Brier score to MMLU
      
      * format
      
      * Update metrics.py
      
      * Update task.py
      
      * Update generate_until_template_yaml
      
      * Delete lm_eval/tasks/bigbench/aux_metric.py
      
      * Update generate_until_template_yaml
      
      * Update _default_template_yaml
      
      * Update _generate_configs.py
      
      * Update _generate_configs.py
      
      * Update _generate_configs.py
      
      * fix (format?)
      
      * format?
      
      * format, once more
      
      ---------
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
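For readers unfamiliar with the metric added in the commit above, the following is a minimal sketch of a multi-class Brier score. It is not the code from `metrics.py`: the function name `brier_score`, its signature, and the input layout are assumptions made for illustration, and it uses the standard definition (the mean over items of the squared distance between the predicted probability vector and the one-hot gold label).

```python
from typing import Sequence


def brier_score(probs: Sequence[Sequence[float]], gold: Sequence[int]) -> float:
    """Mean multi-class Brier score (hypothetical helper, not the harness API).

    probs[i][k] is the predicted probability of class k for item i,
    gold[i] is the index of the correct class for item i.
    """
    if len(probs) != len(gold):
        raise ValueError("probs and gold must have the same length")
    if not probs:
        raise ValueError("at least one item is required")

    total = 0.0
    for p, g in zip(probs, gold):
        # Squared error against the one-hot encoding of the gold class.
        total += sum((pk - (1.0 if k == g else 0.0)) ** 2 for k, pk in enumerate(p))
    return total / len(probs)


# A uniform guess over four classes, as in a four-option MMLU question.
print(brier_score([[0.25, 0.25, 0.25, 0.25]], [0]))  # 0.75
```

Under this definition a perfect prediction scores 0.0 and a fully confident wrong prediction scores 2.0, so lower is better.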
    • Create a means for caching task registration and request building. Ad… (#1372) · 1e6c9272
      Aaron V authored
      
      
      * Create a means for caching task registration and request building. Add the ability to specify an args dict for simple_evaluate().
      
      * Remove extra S in cache path in caching module
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
      
      * Rename requests cache args, make model_args polymorphic so that a dict can also be accepted.
      
      * Update docs to reflect new caching behavior, add CLI args for requests caching. Create a function for deleting items in the cache.
      
      * Update documentation, fix minor bug with arg parsing for requests caching where an undefined variable was used.
      
      * Remove line from gitignore, add to cli for caching datasets.
      
      * Add hashing suffix to .pickles. Update test script typo.
      
      * Favor isinstance() over type() in evaluator.py
      
      * Add tests for caching, gets tests working, remove unneeded arg from build_all_requests().
      
      * Update arg description to simple_evaluate.
      
      * Update pyproject.toml
      
      * Fix typehint
      
      * Remove the use of random() for creating default cache pickle hash.
      
      * Check that cache dir exists before clearing it in request cache tests.
      
      * Fix linting problems.
      
      * Fix additional formatting errors.
      
      * Remove trailing whitespace.
      
      * Add new line to the end of .gitignore.
      
      ---------
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
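The caching changes in the commit above mention a hash suffix on `.pickle` files, a deterministic default hash instead of `random()`, and a check that the cache directory exists before clearing it. The sketch below shows one way those three ideas can fit together. It is an illustration only: `cache_file`, `save_to_cache`, `load_from_cache`, `delete_cache`, and the `requests-` file prefix are hypothetical names, not the harness's actual caching module.

```python
import hashlib
import os
import pickle
from typing import Any, Optional


def cache_file(cache_dir: str, key: str) -> str:
    """Build a cache path whose suffix is a deterministic hash of the key."""
    # sha256 of the key is stable across runs, unlike a random() suffix.
    suffix = hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]
    return os.path.join(cache_dir, f"requests-{suffix}.pickle")


def save_to_cache(cache_dir: str, key: str, obj: Any) -> None:
    """Pickle obj under the path derived from key, creating the directory."""
    os.makedirs(cache_dir, exist_ok=True)
    with open(cache_file(cache_dir, key), "wb") as fh:
        pickle.dump(obj, fh)


def load_from_cache(cache_dir: str, key: str) -> Optional[Any]:
    """Return the cached object, or None when there is no entry for key."""
    path = cache_file(cache_dir, key)
    if not os.path.isfile(path):
        return None
    with open(path, "rb") as fh:
        return pickle.load(fh)


def delete_cache(cache_dir: str) -> None:
    """Remove all pickles, doing nothing if the directory does not exist."""
    if not os.path.isdir(cache_dir):
        return
    for name in os.listdir(cache_dir):
        if name.endswith(".pickle"):
            os.remove(os.path.join(cache_dir, name))
```

Because the suffix depends only on the key, a second run with the same key finds the file written by the first run, which is the property a random suffix would break.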
    • Revert "setting trust_remote_code (#1467)" (#1474) · f6befdb9
      Hailey Schoelkopf authored
      This reverts commit c1145dfd.
    • add arabic mmlu (#1402) · 7de7b27e
      khalil authored
      * add arabic mmlu
      
      * update the description
      
      * add readme file