1. 15 Jul, 2024 1 commit
  2. 12 Jul, 2024 1 commit
    • Irokobench: Benchmark Dataset for African languages (#2042) · 383bbd54
      Jess authored
      
      
      * add afrixnli to task
      
      * add chat completion
      
      * remove chat completion -untested
      
      * afrimmlu added
      
      * afrimmlu folder update
      
      * afrimmlu folder update
      
      * updated prompt
      
      * remove print
      
      * add afrimgsm -direct
      
      * add squad metric
      
      * fix bash script
      
      * remove direct util, update common yaml
      
      * remove print
      
      * add few shot. metric fixes
      
      * fix direct path, add bash script for gpt models
      
      * added translate test
      
      * update afrixnli tasks
      
      * update afrixnli tasks
      
      * update metrics for afrixnli
      
      * prompt translations fix
      
      * prompt translations fix
      
      * filter and metric fix -mgsm
      
      * remove squad metric
      
      * remove squad metric
      
      * add f1 score to mgsm
      
      * add f1 score to mgsm
      
      * update native-direct with lin
      
      * change f1 function
      
      * add lin to utils
      
      * add utils
      
      * remove test limit
      
      * remove test configs
      
      * add swahili to mmlu
      
      * change eng to ewe in ewe yaml mmlu
      
      * add squad metric to mgsm, remove whitespace filter
      
      * added translate test
      
      * added afrixnli_translate
      
      * fix exact match valueError
      
      * fix exact match valueError
      
      * restructure mmlu folder
      
      * spacing
      
      * remove afrimmlu_translate folder
      
      * add utility
      
      * format task name, clean ups
      
      * modified mgsm
      
      * update on afrimgsm
      
      * update on afrimgsm
      
      * removed utils
      
      * other mgsm varieties
      
      * other mgsm varieties
      
      * adding translate direct
      
      * Update translate_direct_yaml
      
      * add manual xnli prompt, add multichoice for openai models, and adapt multichoice metric for openai model
      
      * edit for open models
      
      * Update translate_direct_yaml
      
      * add verbalizer for xnli
      
      * change xnli from multiple choice to generate
      
      * add manual accuracy scores
      
      * revert xnli to multiple choice
      
      * change afrimgsm utils
      
      * revert xnli to multiple_choice
      
      * cleanups and readmes
      
      * remove openai fixes and unused regex
      
      * pr review changes
      
      * revert metrics.py, task.py and extraction.py to main version
      
      ---------
      Co-authored-by: Israel Abebe Azime <azime@cg.uni-saarland.de>
      Co-authored-by: Israel Abebe Azime <se.israel.abebe@gmail.com>
  3. 01 Jul, 2024 1 commit
  4. 24 May, 2024 2 commits
  5. 26 Feb, 2024 1 commit
    • Cont metrics (#1475) · 96d185fa
      Lintang Sutawika authored
      
      
      * add brier_score
      
      * process brier_score
      
      * brier score is working for N-sized class
      
      * fixed brier score
      
      * add TED to BigBench and Brier score to MMLU
      
      * format
      
      * Update metrics.py
      
      * Update task.py
      
      * Update generate_until_template_yaml
      
      * Delete lm_eval/tasks/bigbench/aux_metric.py
      
      * Update generate_until_template_yaml
      
      * Update _default_template_yaml
      
      * Update _generate_configs.py
      
      * Update _generate_configs.py
      
      * Update _generate_configs.py
      
      * fix (format?)
      
      * format?
      
      * format, once more
      
      ---------
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
  6. 20 Feb, 2024 1 commit
  7. 13 Feb, 2024 1 commit
  8. 06 Feb, 2024 1 commit
  9. 31 Jan, 2024 1 commit
    • add bypass metric (#1156) · f8203de1
      Baber Abbasi authored
      * add bypass metric
      
      * fixed `bypass` metric.
      
      * add task attributes if predict_only
      
      * add `predict_only` checks
      
      * add docs
      
      * added `override_metric`, `override_config` to `Task`
      
      * nits
      
      * nit
      
      * changed --predict_only to generations; nits
      
      * nits
      
      * nits
      
      * change gen_kwargs warning
      
      * add note about `--predict_only` in README.md
      
      * added `predict_only`
      
      * move table to bottom
      
      * nit
      
      * change null aggregation to bypass (conflict)
      
      * bugfix; default `temp=0.0`
      
      * typo
  10. 20 Dec, 2023 1 commit
    • Switch Linting to `ruff` (#1166) · 65b8761d
      Baber Abbasi authored
      * add ruff and isort. remove black and flake8
      
      * remove unnecessary dependencies
      
      * remove dependency from table
      
      * change order
      
      * ran ruff
      
      * check 3.9
      
      * exclude evaluator
      
      * update CI workflow
      
      * use ruff config in pyproject.toml
      
      * test
      
      * add isort rules to ruff
      
      * sort imports
      
      * import `make_table`
      
      * try stages for no-commit-to-branch
      
      * turn on mypy for pre-commit
      
      * test
      
      * test
      
      * test
      
      * change no-commit-to-branch to default
      
      * nits
      
      * fixed dependency
  11. 02 Nov, 2023 2 commits
  12. 19 Oct, 2023 3 commits
  13. 18 Oct, 2023 1 commit
  14. 25 Aug, 2023 1 commit
  15. 14 Aug, 2023 5 commits
  16. 12 Aug, 2023 1 commit
  17. 11 Aug, 2023 1 commit
  18. 03 Aug, 2023 1 commit
  19. 02 Aug, 2023 4 commits
  20. 06 Jul, 2023 1 commit
  21. 15 Jun, 2023 1 commit
  22. 13 Jun, 2023 1 commit
  23. 12 Jun, 2023 1 commit
    • [Refactor] [WIP] New YAML advanced docs (#567) · 79b972d6
      Hailey Schoelkopf authored
      
      
      * add wip gsm8k yaml
      
      * cleanup tasks dir
      
      * push gsm8k yaml changes
      
      * rename gpt2.py
      
      * add updated gsm8k, triviaqa baseline
      
      * add new cot yaml
      
      * allow for multiple filter pipelines, new filter types
      
      * updated gsm8k + sampling gen configs
      
      * cleanup self-consistency yaml
      
      * push outline for advanced docs
      
      * push docs checklist
      
      * switch to inheritance for many tasks
      
      * acc_norm and acc_mutual_info fixed
      
      * fix missing newline in error msg
      
      * remove many .py tasks
      
      * updated GSM8k
      
      * added more doc
      
      * Update advanced_task_guide.md
      
      Added list of parameters
      
      * Update advanced_task_guide.md
      
      * Added details on listing metrics
      
      * Update advanced_task_guide.md
      
      * Added more explanation
      
      * modify current default filter name
      
      * add new tags to tasks
      
      * remove a lingering print()
      
      * add rest of param docs, cleanup deprecated fields
      
      * push docs update
      
      * move ALL_TASKS definition location
      
      * confirm write_out.py works if no description dict passed
      
      ---------
      Co-authored-by: lintangsutawika <lintang@sutawika.com>
  24. 07 Jun, 2023 2 commits
  25. 06 Jun, 2023 1 commit
  26. 19 May, 2023 1 commit
  27. 10 May, 2023 1 commit
  28. 08 May, 2023 1 commit