  1. 03 Jun, 2025 1 commit
    • add Mbpp instruct (#2995) · 60e85da5
      Baber Abbasi authored
      * feat: add mbpp_instruct
      
      * fix: update generation_kwargs to use an empty until list
      
      * fix: correct predictions formatting in pass_at_1 function
      
      * fix: improve code block extraction by checking first without opening backticks
      
      * fix mbpp `pass_at_1`
  2. 26 May, 2025 1 commit
  3. 21 May, 2025 1 commit
  4. 19 May, 2025 2 commits
  5. 15 May, 2025 4 commits
    • fix formatting (#2759) · 0126f6d1
      Baber Abbasi authored
    • Update utils.py (#2870) · 2bde99e4
      tawsif authored
    • Added C4 Support (#2889) · 86a3b270
      Yufeng Xu authored
      * added c4 dataset (working)
      
      * fixed bugs in c4
      
      * fixed loading bugs in c4 dataset; using partial loading
      
      * cleaned the code
      
      * added version number for c4
      
      * removed irrelevant files
    • AfroBench: How Good are Large Language Models on African Languages? (#2825) · 18297993
      Jess authored
      * add afrixnli to task
      
      * add chat completion
      
      * remove chat completion -untested
      
      * afrimmlu added
      
      * afrimmlu folder update
      
      * afrimmlu folder update
      
      * updated prompt
      
      * remove print
      
      * add afrimgsm -direct
      
      * add squad metric
      
      * fix bash script
      
      * remove direct util, update common yaml
      
      * remove print
      
      * add few shot, metric fixes
      
      * fix direct path, add bash script for gpt models
      
      * added translate test
      
      * update afrixnli tasks
      
      * update afrixnli tasks
      
      * update metrics for afrixnli
      
      * prompt translations fix
      
      * prompt translations fix
      
      * filter and metric fix -mgsm
      
      * remove squad metric
      
      * remove squad metric
      
      * add f1 score to mgsm
      
      * add f1 score to mgsm
      
      * update native-direct with lin
      
      * change f1 function
      
      * add lin to utils
      
      * add utils
      
      * remove test limit
      
      * remove test configs
      
      * add swahili to mmlu
      
      * change eng to ewe in ewe yaml mmlu
      
      * add squad metric to mgsm, remove whitespace filter
      
      * added translate test
      
      * added afrixnli_translate
      
      * fix exact match valueError
      
      * fix exact match valueError
      
      * restructure mmlu folder
      
      * spacing
      
      * remove afrimmlu_translate folder
      
      * add utility
      
      * format task name, clean ups
      
      * modified mgsm
      
      * update on afrimgsm
      
      * update on afrimgsm
      
      * removed utils
      
      * other mgsm varieties
      
      * other mgsm varieties
      
      * adding translate direct
      
      * Update translate_direct_yaml
      
      * add manual xnli prompt, add multichoice for openai models, and adapt multichoice metric for openai model
      
      * edit for open models
      
      * Update translate_direct_yaml
      
      * add verbalizer for xnli
      
      * change xnli from multiple choice to generate
      
      * add manual accuracy scores
      
      * revert xnli to multiple choice
      
      * change afrimgsm utils
      
      * revert xnli to multiple_choice
      
      * cleanups and readmes
      
      * remove openai fixes and unused regex
      
      * pr review changes
      
      * revert metrics.py, task.py and extraction.py to main version
      
      * add afrisenti
      
      * utilities
      
      * pulled from main
      
      * add afrixnli
      
      * add afrimmlu
      
      * update afrixnli prompts
      
      * missing senti language
      
      * fix afrisenti prompt 2
      
      * fix afrisenti prompts
      
      * fix afrisenti prompts
      
      * configure task grouping
      
      * add multiple prompts to afrixnli for irokobench
      
      * add multiple prompts to afrimmlu for irokobench
      
      * Update afrixnli_yaml
      
      * fixes and moves
      
      * fixes and moves
      
      * afrimmlu multiple prompts configs
      
      * remove validation set from afrimmlu
      
      * remove eng from afrimmlu translate test
      
      * correct dataset path
      
      * multiple prompts for mgsm
      
      * file restructure
      
      * afribench grouping
      
      * repo restructuring
      
      * repo restructuring
      
      * update exact match to hugging face exact match and add new mgsm language
      
      * remove decontamination
      
      * update generation kwargs
      
      * update generation kwargs for all mgsm prompts
      
      * remove lang
      
      * update generation kwargs for afrimgsm translatetest
      
      * add afrimgsm cot for direct and translate
      
      * remove eng from translate-cot
      
      * add masakhaPOS tasks
      
      * remove changes from task script
      
      * add masakhanews tasks
      
      * add uhura arc easy
      
      * add afriqa and belebele files
      
      * add tags for easier run. add naija rc
      
      * add new metrics and transformation scripts
      
      * fix afriqa swa fewshot split
      
      * add naijarc
      
      * add afrobench lite tasks
      
      * update afrobench
      
      * update afrobench
      
      * remove unverified files to avoid bugs
      
      * remove files not needed
      
      * add afrobench tasks
      
      * add afrobench tasks
      
      * change to version 1
      
      * change to version 1
      
      * update afrobench
      
      * update afrobench
      
      * restore metric to original script
      
      * update readme instructions
      
      * add individual dataset readmes
      
      * add link to collections
      
      * correct run script
      
      * align with main
      
      * align with main
      
      * align with main
      
      * align with main
      
      * align with main
      
      * align with main
      
      * align with main
      
      * align with main
      
      * failed run fixes
      
      * failed run fixes
      
      * add afrimgsm cot
      
      * Apply precommit fixes
      
      * update mafand dataset name
      
      * pull request fixes
      
      * remove afrihate due to availability
      
      ---------
      Co-authored-by: Israel Abebe Azime <azime@cg.uni-saarland.de>
      Co-authored-by: Israel Abebe Azime <se.israel.abebe@gmail.com>
      Co-authored-by: David Adelani <davlanade@gmail.com>
      Co-authored-by: theyorubayesian <akin.o.oladipo@gmail.com>
  6. 13 May, 2025 2 commits
  7. 06 May, 2025 2 commits
  8. 29 Apr, 2025 1 commit
  9. 16 Apr, 2025 3 commits
  10. 14 Apr, 2025 1 commit
  11. 04 Apr, 2025 2 commits
    • Add GSM8K Platinum (#2771) · 11ac352d
      Qubitium-ModelCloud authored
      * add gsm8k platinum
      
      * only test splits
      
      * wrong dataset
      
      * link to blog
      
      * format
    • Optimization for evalita-llm rouge computation (#2878) · 22bd2bcb
      Michele Resta authored
      * feat: initial commit with templates for evalita evaluation
      
      * fix: change rule for generate_until
      
      * feat: modified yaml to use reduced version of NER test datasets
      
      * feat: added templates to use reduced dataset for summarization (fanpage and ilpost)
      
      * Add Six Prompts for Each Multiple-Choice Task
      
      * fix: fastest eval for summarization
      
      * chore: linted with ruff
      
      * chore: linted with ruff
      
      ---------
      Co-authored-by: rzanoli <zanoli@fbk.eu>
  12. 02 Apr, 2025 2 commits
  13. 01 Apr, 2025 1 commit
  14. 30 Mar, 2025 1 commit
    • Adds MMLU CoT, gsm8k and arc_challenge for llama instruct (#2829) · 3816796e
      Alexandre Marques authored
      * llama-style MMLU CoT
      
      * Refactor MMLU CoT template YAML to simplify 'until' structure
      
      * Add GSM8K task configuration for LLaMA3 with few-shot examples
      
      * Fix missing newline at end of MMLU CoT YAML file
      
      * Add ARC-Challenge task configuration and processing utility
      
      * Add additional MMLU and ARC-Challenge task variants to README
      
      * Update README with notes on arc_challenge_llama dataset preprocessing
  15. 29 Mar, 2025 1 commit
  16. 28 Mar, 2025 2 commits
  17. 27 Mar, 2025 3 commits
  18. 26 Mar, 2025 1 commit
  19. 25 Mar, 2025 1 commit
  20. 21 Mar, 2025 2 commits
  21. 20 Mar, 2025 3 commits
    • Fixes to mmlu_pro_llama (#2816) · 8028a42f
      Alexandre Marques authored
      * Update generation_kwargs in default template to include additional end tokens
      
      * Update filter_list in MMLU Pro configuration to use strict_match
      
      * Update _default_template_yaml
    • fix typo (#2820) · 110e65da
      Baber Abbasi authored
    • Llama3 mmlu correction (#2797) · c73b43f4
      Alexandre Marques authored
      * Update continuation template YAML for MMLU task with new generation and filtering options
      
      * Refactor filter_list structure in continuation template YAML for improved readability
      
      * Add 'take_first' function to filter_list in continuation template YAML
      
      * Update filter_list in continuation template YAML to use 'strict_match' and modify filtering functions
      
      * Add 'do_sample' option to generation_kwargs in MMLU template YAML
  22. 18 Mar, 2025 3 commits