1. 28 Jan, 2025 3 commits
    • fix multiple input chat template (#2576) · 96e499ba
      Baber Abbasi authored
      * feat: drop Python 3.8 support
      
      * feat: drop Python 3.8 tests
      
      * pre-commit
      
      * handle chat_template for multiple input
    • add TransformerLens example (#2651) · 42f79131
      Nicky Pochinkov authored
      * add TransformerLens example
      
      Many people use TransformerLens to do interpretability and interventions on models, and then need to test the model.
      
      Here is a simple script that allows one to pass in the TransformerLens model and run evaluations on it.
      
      * Ran pre-commit checks
    • Add Moral Stories (#2653) · a0466f01
      Irina Proskurina authored
      * Add moral stories task
      
      * Add moral stories task
      
      * Create README.md
      
      * Update README.md
      
      * Update line endings in moral_stories files
  2. 24 Jan, 2025 1 commit
  3. 21 Jan, 2025 3 commits
  4. 20 Jan, 2025 6 commits
  5. 19 Jan, 2025 1 commit
  6. 17 Jan, 2025 1 commit
  7. 15 Jan, 2025 4 commits
    • assistant prefill (#2615) · 703fbffd
      Baber Abbasi authored
      * add assistant prefix
      
      * add arc_challenge from llama
      
      * nit
      
      * nit
      
      * nit
      
      * add assistant prefix
      
      * add mmlu_llama
      
      * nit
      
      * nit
      
      * Revert "nit"
      
      This reverts commit 6a97f8356237305e375212b966b30e8de59dd4bc.
      
      * fix regex bug
      
      * add assistant_prefix to vllm
      
      * add `Question:`
      
      * add mmlu_pro
      
      * add fewshot assistant_prefix
      
      * use `assistant_prefill`
      
      * typehints
      
      * nits
      
      * nits
      
      * add to docs
      
      * add readme
    • Add MLQA (#2622) · e86cece6
      Shivansh Pachnanda authored
      * Add MLQA
      * add mlqa_common_yaml
      
      * add 49 tests of mlqa family
      
      * update tasks/README.md
      
      ---------
      
      * fix: mlqa ast error
      
      * nit: removed .yaml ext from template_yaml
      
      * nit changes: minor modifications generate_tasks.py
      
      * deleted lm_eval/tasks/mlqa/mlqa_common_yaml.yaml
      
      * tests updated
      
      * nit
    • Add MBPP (#2247) · 5db23e2c
      Hojin Lee authored
      
      
      * add mbpp
      
      * fix some bugs
      
      * add README for mbpp
      
      * update README
      
      * nits
      
      ---------
      Co-authored-by: Hojin Lee <19949034+hjlee1371@users.noreply.github.com>
      Co-authored-by: Baber <baber@hey.com>
    • Add HumanEval (#1992) · 4c11206b
      Hojin Lee authored
      
      
      * add custom filter
      
      * fix type casting of references
      
      * add humaneval
      
      * fix a bug in humaneval
      
      * add greedy version of humaneval
      
      * update tasks README
      
      * test humaneval
      
      * return multiple metrics
      
      * nit
      
      * add confirmation to run code tasks
      
      * nit
      
      * nit
      
      ---------
      Co-authored-by: Hojin Lee <19949034+hjlee1371@users.noreply.github.com>
      Co-authored-by: Baber <baber@hey.com>
  8. 07 Jan, 2025 3 commits
  9. 04 Jan, 2025 1 commit
    • some minor logging nits (#2609) · 888ac292
      Baber Abbasi authored
      * remove yaml extension from phrases_va_common
      
      * remove yaml extension from winogenerated
      
      * remove yaml extension from phrases_es
      
      * no cache debug logging when not used
  10. 02 Jan, 2025 1 commit
    • update scrolls (#2602) · 1044db95
      Baber Abbasi authored
      * update evaluate; update construct requests
      
      * update construct requests to handle `apply_chat_template` kwarg
  11. 30 Dec, 2024 1 commit
  12. 25 Dec, 2024 1 commit
  13. 24 Dec, 2024 1 commit
    • AraDICE task config file (#2507) · 932e8f9e
      Firoj Alam, Scientist, QCRI authored
      
      
      * added aradice
      
      * Added ArabicMMLU Lev Configs
      
      * added ArabicMMLU egy configs
      
      * Added boolq configs
      
      * Added cultural bench configs
      
      * added openbookqa configs
      
      * Added PiQA configs
      
      * added winogrande configs
      
      * Added truthfulQA configs
      
      * Added aradice group config
      
      * Remove deleted files from repository
      
      * modified ArabicMMLU configs
      
      * modified metadata versions
      
      * fixed formatting using ruff
      
      * added aradice tasks information
      
      * pre-commit
      
      * Updated openbookqa utils
      
      * fixed formatting on obqa
      
      ---------
      Co-authored-by: Basel Mousi <bmousi@hbku.edu.qa>
      Co-authored-by: Baber <baber@hey.com>
  14. 20 Dec, 2024 1 commit
  15. 19 Dec, 2024 2 commits
  16. 17 Dec, 2024 2 commits
  17. 16 Dec, 2024 3 commits
    • fix `DeprecationWarning: invalid escape sequence '\s'` for whitespace filter (#2560) · 8d2f64c1
      Baber Abbasi authored
      * fix `DeprecationWarning: invalid escape sequence '\s'`
      
      * add type hints
      
      * Revert "add type hints"
      
      This reverts commit 15d8abc626a84e97f8c238ddfbf9e243d6f6eb5c.
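The warning named in this commit appears when a regex escape such as `\s` is written in an ordinary string literal. A minimal sketch of the usual remedy, a raw string, follows. The pattern and the `strip_leading_whitespace` helper are illustrative assumptions, not the harness's actual filter code.

```python
import re

# In an ordinary string literal, "\s" is not a recognised escape sequence.
# Python flags it with a DeprecationWarning (a SyntaxWarning in newer versions).
# A raw string passes the backslash through to the regex engine unchanged.
LEADING_WS = re.compile(r"^\s+")  # hypothetical pattern, for illustration only


def strip_leading_whitespace(text: str) -> str:
    """Remove leading whitespace from a response (illustrative helper)."""
    return LEADING_WS.sub("", text)
```

Writing `"^\\s+"` with a doubled backslash would also work; the raw string is simply easier to read.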
    • batch `loglikelihood_rolling` across requests (#2559) · 0bfb0220
      Baber Abbasi authored
      * batch all rolling token windows
      
      * nit
      
      * copy to vllm
      
      * fix max_length for `get_rolling_token_windows`
      
      * bugfix
      
      * bugfix
      
      * add type hints
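The idea of batching rolling windows across requests can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, and the windows are plain disjoint chunks, whereas a real rolling log-likelihood also has to handle context tokens and `max_length` limits.

```python
from typing import Iterator, List, Tuple


def rolling_windows(tokens: List[int], max_len: int) -> List[List[int]]:
    """Split one document into consecutive windows of at most max_len tokens."""
    return [tokens[i : i + max_len] for i in range(0, len(tokens), max_len)]


def batch_across_requests(
    docs: List[List[int]], max_len: int, batch_size: int
) -> Iterator[List[Tuple[int, List[int]]]]:
    """Yield batches of (doc_index, window) pairs drawn from all documents."""
    # Flatten every document's windows into one list, remembering the owner,
    # so that a batch can mix windows from different requests.
    flat = [
        (doc_idx, window)
        for doc_idx, doc in enumerate(docs)
        for window in rolling_windows(doc, max_len)
    ]
    for start in range(0, len(flat), batch_size):
        yield flat[start : start + batch_size]
```

Because each window carries its document index, per-window scores can be summed back into one total per request after the model call.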
    • Adding new subtask to SCORE tasks: non greedy robustness (#2558) · 976d8a0b
      Rima Shahbazyan authored
      * score readme added
      
      * generate until task's "until" parameter's default value fixed.
      
      * score mmlu-pro and agieval added
      
      * changed macro accuracy to micro for agieval
      
      * Always E removed from agi eval
      
      * redundancies removed
      
      * MATH added
      
      * minor cosmetic changes for math
      
      * Licenses added Readme updated
      
      * changes for flake8 + license header on math
      
      * Score added to readme and precommit was run.
      
      * Score added to readme and precommit was run.
      
      * Import error fixed
      
      * math task bugfix
      postprocess minor fix
      
      * CR for math added
      
      * math CR
      
      * math task bugfix
      postprocess minor fix
      
      CR for math added
      
      * Math cr fixed
      
      * mmlu_pro non_greedy task added
      
      * non greedy summarizer added
      
      * Non greedy for all score tasks
      
      * Bugfixes for non-greedy
      
      * fixing the until argument
      
      * undoing the change to "until" arguments default behaviour
      
      * minor fix in summarizer
      
      * log naming changes for better readability
      
      * math subtasks naming fix
      
      * agieval subtask naming fix
      
      * logging added for debugging
      
      * path issue fixed
      
      * minor fix
      
      * path fix
      
      * path fix
      
      * non_greedy_math minor fix
      
      * final changes
      
      * changed readme for non-greedy
      added Nvidia header
      added example script for non_greedy
      changed prompts to match that of trt runs
      
      * non greedy summarizer bugfix
      
      * non_greedy summarizer fixed
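One item in this commit switches agieval from macro to micro accuracy. The difference between the two can be sketched as below. The helper names and the `(correct, total)` input format are assumptions made for illustration, not the SCORE task's actual aggregation code.

```python
from typing import List, Tuple

# Each subtask is summarised as (number correct, number of examples).
SubtaskCounts = List[Tuple[int, int]]


def macro_accuracy(counts: SubtaskCounts) -> float:
    """Average the per-subtask accuracies, weighting every subtask equally."""
    return sum(correct / total for correct, total in counts) / len(counts)


def micro_accuracy(counts: SubtaskCounts) -> float:
    """Pool all examples, so larger subtasks carry proportionally more weight."""
    total_correct = sum(correct for correct, _ in counts)
    total_examples = sum(total for _, total in counts)
    return total_correct / total_examples
```

For two subtasks with counts `(9, 10)` and `(50, 100)`, the macro average weights both subtasks equally, while the micro average is dominated by the larger one.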
  18. 14 Dec, 2024 1 commit
  19. 13 Dec, 2024 1 commit
  20. 09 Dec, 2024 2 commits
  21. 05 Dec, 2024 1 commit