  1. 17 Dec, 2024 1 commit
  2. 16 Dec, 2024 3 commits
    • fix `DeprecationWarning: invalid escape sequence '\s'` for whitespace filter (#2560) · 8d2f64c1
      Baber Abbasi authored
      * fix `DeprecationWarning: invalid escape sequence '\s'`
      
      * add type hints
      
      * Revert "add type hints"
      
      This reverts commit 15d8abc626a84e97f8c238ddfbf9e243d6f6eb5c.
    • batch `loglikelihood_rolling` across requests (#2559) · 0bfb0220
      Baber Abbasi authored
      * batch all rolling token windows
      
      * nit
      
      * copy to vllm
      
      * fix max_length for `get_rolling_token_windows`
      
      * bugfix
      
      * bugfix
      
      * add type hints
    • Adding new subtask to SCORE tasks: non greedy robustness (#2558) · 976d8a0b
      Rima Shahbazyan authored
      * score readme added
      
      * generate until task's "until" parameter's default value fixed.
      
      * score mmlu-pro and agieval added
      
      * changed macro accuracy to micro for agieval
      
      * Always E removed from agi eval
      
      * redundancies removed
      
      * MATH added
      
      * minor cosmetic changes for math
      
      * Licenses added Readme updated
      
      * changes for flake8 + license header on math
      
      * Score added to readme and precommit was run.
      
      * Import error fixed
      
      * math task bugfix
      postprocess minor fix
      
      * CR for math added
      
      * math CR
      
      * math task bugfix
      postprocess minor fix
      
      CR for math added
      
      * Math cr fixed
      
      * mmlu_pro non_greedy task added
      
      * non greedy summarizer added
      
      * Non greedy for all score tasks
      
      * Bugfixes for non-greedy
      
      * fixing the until argument
      
      * undoing the change to "until" arguments default behaviour
      
      * minor fix in summarizer
      
      * log naming changes for better readability
      
      * math subtasks naming fix
      
      * agieval subtask naming fix
      
      * logging added for debugging
      
      * path issue fixed
      
      * minor fix
      
      * path fix
      
      * path fix
      
      * non_greedy_math minor fix
      
      * final changes
      
      * changed readme for non-greedy
      added Nvidia header
      added example script for non_greedy
      changed prompts to match that of trt runs
      
      * non greedy summarizer bugfix
      
      * non_greedy summarizer fixed
  3. 14 Dec, 2024 1 commit
  4. 13 Dec, 2024 1 commit
  5. 09 Dec, 2024 2 commits
  6. 05 Dec, 2024 1 commit
  7. 04 Dec, 2024 3 commits
  8. 03 Dec, 2024 2 commits
  9. 01 Dec, 2024 1 commit
    • Update Unitxt task to use locally installed unitxt and not download Unitxt code from Huggingface (#2514) · 1170ef9e
      Yoav Katz authored
      
      * Moved to require unitxt installation and not download unitxt from HF hub.
      
      This has performance benefits and simplifies the code.
      Signed-off-by: Yoav Katz <katz@il.ibm.com>
      
      * Updated watsonx documentation
      
      * Updated installation instructions
      
      * Removed redundant comman
      
      * Allowed unitxt tasks to generate chat APIs
      
      Modified WatsonXI model to support chat apis
      
      * Removed print
      
      * Run precommit formatting
      
      ---------
      Signed-off-by: Yoav Katz <katz@il.ibm.com>
  10. 30 Nov, 2024 1 commit
  11. 29 Nov, 2024 1 commit
  12. 28 Nov, 2024 1 commit
  13. 26 Nov, 2024 1 commit
    • Score tasks (#2452) · 0ef7548d
      Rima Shahbazyan authored
      * score readme added
      
      * generate until task's "until" parameter's default value fixed.
      
      * score mmlu-pro and agieval added
      
      * changed macro accuracy to micro for agieval
      
      * Always E removed from agi eval
      
      * redundancies removed
      
      * MATH added
      
      * minor cosmetic changes for math
      
      * Licenses added Readme updated
      
      * changes for flake8 + license header on math
      
      * Score added to readme and precommit was run.
      
      * Import error fixed
      
      * math task bugfix
      postprocess minor fix
      
      * CR for math added
      
      * math CR
      
      * math task bugfix
      postprocess minor fix
      
      CR for math added
      
      * Math cr fixed
      
      * reverting the default "until" parameter change and adjusting score task configs
  14. 22 Nov, 2024 1 commit
  15. 20 Nov, 2024 1 commit
    • Nits (#2500) · 867413f8
      Baber Abbasi authored
      * fix test task
      
      * don't call lm.chat_template each time
  16. 18 Nov, 2024 3 commits
    • Add metabench task to LM Evaluation Harness (#2357) · 62b4364d
      Kozzy Voudouris authored
      * Add metabench (Kipnis et al. 2024)
      
      * Update metabench tasks for full replication of original benchmarks, using publicly available datasets
      
      * Remove unnecessary import
      
      * Add permute versions of each task, where the answer orders are randomly shuffled.
      
      * Add metabench group for easier evaluations
      
      * Fix mmlu counts after removing duplicate
      
      * Add secondary datasets
      
      * Fix f-string error
      
      * Fix f-string error for permute processing
      
      * Add original hash to outputs for easy matching to original results
      
      * Add line break at end of utils files
      
      * Remove extra line from winogrande
      
      * Reformat for linters
      
      * fix multiple input test
      
      * appease pre-commit
      
      * Add metabench to tasks README
      
      * fix multiple input `test_doc_to_text`
      
      ---------
      Co-authored-by: Baber <baber@hey.com>
    • remove duplicate `arc_ca` (#2499) · 8222ad0a
      Baber Abbasi authored
    • Add mamba hf to `mamba_ssm` (#2496) · 0f5dc265
      Baber Abbasi authored
      * add hf mamba to mamba_lm
      
      * fix _model_generate for hf
  17. 16 Nov, 2024 2 commits
    • kbl-v0.1.1 (#2493) · cbc31eb8
      Wonseok Hwang authored
      * release kbl-v0.1
      
      * fix linting
      
      * remove rag tasks as doc_to_text functions cause trouble
      
      * remove remaining rag tasks
      
      * remove unnecessary repeat in yaml files and rag dataset in hf-hub
      
      * remove unnecessary newline; introduce cfg files in lbox/kbl in hf
      
      * Make task yaml files consistent to hf-datasets-config
      
      * Make task yaml files consistent to hf-datasets-config
      
      * Remove trailing empty space in doc-to-text
      
      * Remove unnecessary yaml file
      
      * Fix task naming error
      
      * trailing space removed
    • update pre-commit hooks and git actions (#2497) · badf273a
      Baber Abbasi authored
      * pre-commit update
      
      * update github actions
      
      * make logging less verbose
      
      * fix artifacts
  18. 15 Nov, 2024 2 commits
  19. 12 Nov, 2024 1 commit
  20. 11 Nov, 2024 2 commits
  21. 09 Nov, 2024 2 commits
  22. 07 Nov, 2024 3 commits
  23. 06 Nov, 2024 1 commit
  24. 05 Nov, 2024 3 commits
    • Add Japanese Leaderboard (#2439) · 26f607f5
      mtkachenko authored
      * add jaqket_v2 and jcommonsenseqa
      
      * remove comments
      
      * remove num_beams as it is incompatible with vllm
      
      * add jnli + refactor
      
      * rename jnla -> jnli
      
      * add jsquad + replace colon chars with the Japanese unicode
      
      * ignore whitespaces in generation tasks
      
      * add marc_ja
      
      * add xwinograd + simplify other yamls
      
      * add mgsm and xlsum
      
      * refactor xlsum
      
      * add ja_leaderboard tag
      
      * edit README.md
      
      * update README.md
      
      * add credit + minor changes
      
      * run ruff format
      
      * address review comments + add group
      
      * remove aggregate_metric_list
      
      * remove tags
      
      * update tasks/README.md
    • Modify label errors in catcola and paws-x (#2434) · fb2e4b59
      zxcvuser authored
      * Modify label errors in catcola and paws
      
      * Update version to 1.0 in pawsx_template_yaml
      
      * add changelog
      
      ---------
      Co-authored-by: Baber <baber@hey.com>
    • Add real process_docs example (#2456) · 0b8358ec
      Sypherd authored