1. 28 Mar, 2025 1 commit
  2. 27 Mar, 2025 3 commits
  3. 26 Mar, 2025 1 commit
  4. 25 Mar, 2025 1 commit
  5. 21 Mar, 2025 2 commits
  6. 20 Mar, 2025 3 commits
    • Alexandre Marques's avatar
      Fixes to mmlu_pro_llama (#2816) · 8028a42f
      Alexandre Marques authored
      * Update generation_kwargs in default template to include additional end tokens
      
      * Update filter_list in MMLU Pro configuration to use strict_match
      
      * Update _default_template_yaml
      8028a42f
    • Baber Abbasi's avatar
      fix typo (#2820) · 110e65da
      Baber Abbasi authored
      110e65da
    • Alexandre Marques's avatar
      Llama3 mmlu correction (#2797) · c73b43f4
      Alexandre Marques authored
      * Update continuation template YAML for MMLU task with new generation and filtering options
      
      * Refactor filter_list structure in continuation template YAML for improved readability
      
      * Add 'take_first' function to filter_list in continuation template YAML
      
      * Update filter_list in continuation template YAML to use 'strict_match' and modify filtering functions
      
      * Add 'do_sample' option to generation_kwargs in MMLU template YAML
      c73b43f4
  7. 18 Mar, 2025 5 commits
  8. 17 Mar, 2025 2 commits
  9. 14 Mar, 2025 3 commits
    • Oskar van der Wal's avatar
      Add various social bias tasks (#1185) · 150a1852
      Oskar van der Wal authored
      
      
      * Implementation of Winogender
      
      * Minor fixes README.md
      
      * Add winogender
      
      * Clean winogender utils.py
      
      * Change dataset to one containing All subsets
      
      * Flesh out README for BBQ task
      
      * Add missing tasks for BBQ
      
      * Add simple cooccurrence bias task
      
      * Fix wrong mask for ambiguated context+rename metrics
      
      * Made generate_until evaluation (following PALM paper) default
      
      Also moved separate config files per category to separate metrics using custom function.
      Created config file for multiple_choice way of evaluating BBQ.
      
      * Add missing version metadata
      
      * Add missing versionmetadata for bbq multiple choice
      
      * Fix metrics and address edge cases
      
      * Made BBQ multiple choice the default version
      
      * Added settings following winogrande
      
      * Add num_fewshot to simple_cooccurrence_bias
      
      * Fixes for bbq (multiple choice)
      
      * Fix wrong dataset
      
      * CrowS-Pairs: make it easier to use another dataset by removing dataset_name from the subsets.
      
      * Use simplest prompt possible without description
      
      * Merge
      
      * BBQ: Fix np.NaN related bug
      
      * BBQ: Fix wrong aggregation method for disamb accuracy
      
      * BBQ: Make it possible to only evaluate on (dis)ambiguous subset (needed for few shot eval)
      
      * BBQ: fix showing one target in case of few-shot evals
      
      * BBQ: Fix few-shot example for bbq_generate
      
      * BBQ: simplify subtasks
      
      * BBQ: Minimize number of UNK variations to reduce inference time
      
      * BBQ: Add extra UNK keywords for the generate task
      
      * Add a generate_until version of simple_cooccurrence_bias
      
      * Change system/description prompt to include few-shot examples
      
      * Group agg rework
      
      * Run pre-commit
      
      * add tasks to readme table
      
      * remove trailing space from simple_cooccurrence_bias_gen.yaml `doc_to_text`
      
      * fix
      
      * fix
      
      * fix version
      
      ---------
      Co-authored-by: default avatarBaber <baber@hey.com>
      150a1852
    • achervyakov's avatar
      add audio modality (qwen2 audio only) (#2689) · 62552d2c
      achervyakov authored
      
      
      * Added audio-modality pipeline for qwen2-audio model
      
      * Beauty imports
      
      * fix apply_chat_template args
      
      * update default audio placeholders list
      
      * add demo task - common_voice subset
      
      * add audiolm_qwen libs to pyproject.toml
      
      * pre-commit beautify
      
      ---------
      Co-authored-by: default avatarAlexandra Rak <rakalexandra@mail.ru>
      62552d2c
    • Baber Abbasi's avatar
  10. 11 Mar, 2025 5 commits
  11. 05 Mar, 2025 1 commit
  12. 04 Mar, 2025 1 commit
  13. 03 Mar, 2025 1 commit
  14. 27 Feb, 2025 1 commit
  15. 25 Feb, 2025 3 commits
  16. 24 Feb, 2025 2 commits
  17. 23 Feb, 2025 1 commit
  18. 21 Feb, 2025 3 commits
    • Farhan Ahmed's avatar
      fix missing dataset repo (#2719) · 0bf9f4ea
      Farhan Ahmed authored
      0bf9f4ea
    • Lintang Sutawika's avatar
      Logging (#2203) · 1ba35e62
      Lintang Sutawika authored
      
      
      * changed source of eval_logger
      
      * allow eval_logger to be set from args
      
      * removed verbosity arg from non-main methods
      
      * fix logging
      
      * pre-commit
      
      * set verbosity in eval logger
      
      * replace utils.eval_logger
      
      * fix logging in main
      
      * add logging to docs
      
      * add logging message
      
      * nit
      
      * add logging to docs
      
      * refactor setup_logging to utils
      
      ---------
      Co-authored-by: default avatarBaber <baber@hey.com>
      1ba35e62
    • Baber Abbasi's avatar
      add math_verify to some tasks (#2686) · 358adaf7
      Baber Abbasi authored
      * add math_verify to minerva math
      
      * add math_verify to benchmark
      
      * fix error
      
      * increment version
      358adaf7
  19. 14 Feb, 2025 1 commit