1. 10 Jun, 2024 5 commits
  2. 07 Jun, 2024 1 commit
  3. 06 Jun, 2024 1 commit
  4. 03 Jun, 2024 1 commit
  5. 31 May, 2024 1 commit
  6. 24 May, 2024 2 commits
  7. 16 May, 2024 1 commit
  8. 15 May, 2024 1 commit
  9. 11 May, 2024 3 commits
  10. 10 May, 2024 3 commits
  11. 08 May, 2024 2 commits
  12. 07 May, 2024 2 commits
  13. 06 May, 2024 1 commit
    • Provide ability for custom sampler for ConfigurableTask (#1616) · ae72cebc
      LSinev authored
      * Added fewshot sampling seeds to evaluator.simple_evaluate signature
      
      Way to control seed of fewshot sampling
      may help with #1591
      
      * Added ability for custom sampler for ConfigurableTask
      
      May be set in config like
      ```
      fewshot_config:
        sampler: !function utils.MyFewshotSampler
      ```
      
      * explicitly set fewshot random generator seed for HFLM generate_until_task test
      
      * add backward compatibility for three args seed setup
      
      * save seeds info to logs/reports
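The config snippet above points `fewshot_config.sampler` at a user-defined class, and the seed bullets make fewshot sampling reproducible. Below is a minimal, self-contained sketch of both ideas. It assumes a sampler is built from a list of docs plus a `random.Random` generator and exposes `sample(n)`. `BaseSampler`, `MyFewshotSampler`, `parse_seeds`, and the default fewshot seed of 1234 are hypothetical stand-ins, not the harness's actual classes or CLI parsing.

```python
import random
from typing import Any, List, Optional


def parse_seeds(value: str, default_fewshot_seed: int = 1234) -> List[int]:
    """Hypothetical parser: accept the older 3-seed form or a 4-seed form."""
    seeds = [int(v) for v in value.split(",")]
    if len(seeds) == 3:
        # Backward compatibility: append an assumed default fewshot seed.
        seeds.append(default_fewshot_seed)
    if len(seeds) != 4:
        raise ValueError(f"expected 3 or 4 comma-separated seeds, got {len(seeds)}")
    return seeds


class BaseSampler:
    """Hypothetical stand-in for a fewshot sampler base class."""

    def __init__(self, docs: List[Any], rnd: Optional[random.Random] = None) -> None:
        self.docs = list(docs)
        # A dedicated generator keeps fewshot draws independent of global RNG state.
        self.rnd = rnd if rnd is not None else random.Random(1234)

    def sample(self, n: int) -> List[Any]:
        # Draw n distinct docs with the sampler's own seeded generator.
        return self.rnd.sample(self.docs, n)


class MyFewshotSampler(BaseSampler):
    """Hypothetical custom sampler: draw only from the 2n shortest docs."""

    def sample(self, n: int) -> List[Any]:
        shortest = sorted(self.docs, key=len)[: 2 * n]
        return self.rnd.sample(shortest, n)


docs = ["a", "bbbb", "cc", "ddddd", "eee", "f"]
fewshot_seed = parse_seeds("0,1234,1234")[3]
first = MyFewshotSampler(docs, rnd=random.Random(fewshot_seed)).sample(2)
second = MyFewshotSampler(docs, rnd=random.Random(fewshot_seed)).sample(2)
print(first == second)  # True: the same fewshot seed yields the same examples
```

Keeping the generator on the sampler instance, rather than reseeding the global `random` module, is what lets the fewshot seed vary independently of the other seeds.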
  14. 26 Apr, 2024 1 commit
  15. 25 Apr, 2024 1 commit
  16. 24 Apr, 2024 1 commit
  17. 23 Apr, 2024 1 commit
  18. 25 Mar, 2024 1 commit
    • Seq2seq fix (#1604) · 262f879a
      Lintang Sutawika authored
      
      
      * fix on --task list
      
      * add fixes to tokenization
      
      * differentiate encoding for seq2seq and decoder
      
      * return token setting
      
      * format for pre-commit
      
      * Seq2seq fix, pt2 (#1630)
      
      * getting model class only when defined
      
      * encode_pair handles None, add_special_tokens turned into dict with default value
      
      ---------
      Co-authored-by: achervyakov <77295913+artemorloff@users.noreply.github.com>
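The bullets above describe encoding context/continuation pairs differently for seq2seq and decoder-only models, letting `encode_pair` tolerate `None`, and giving `add_special_tokens` a default value. Below is a minimal sketch of that idea using a toy character-level tokenizer. `toy_tokenize`, `special_tokens_kwargs`, and the per-architecture defaults (special tokens on for encoder-decoder, off for decoder-only) are illustrative assumptions, not the harness's actual code.

```python
from typing import Dict, List, Optional, Tuple


def toy_tokenize(text: str, add_special_tokens: bool = False) -> List[int]:
    """Toy character-level tokenizer; id 0 stands in for a special (BOS-like) token."""
    ids = [ord(c) for c in text]
    return [0] + ids if add_special_tokens else ids


def special_tokens_kwargs(is_seq2seq: bool, add_special_tokens: Optional[bool] = None) -> Dict[str, bool]:
    """Return tokenizer kwargs, falling back to an assumed per-architecture default."""
    if add_special_tokens is None:
        # Assumption: special tokens on for encoder-decoder, off for decoder-only.
        add_special_tokens = is_seq2seq
    return {"add_special_tokens": add_special_tokens}


def encode_pair(context: Optional[str], continuation: str, is_seq2seq: bool) -> Tuple[List[int], List[int]]:
    context = context or ""  # tolerate a missing (None) context
    # Move trailing whitespace from the context onto the continuation.
    n_spaces = len(context) - len(context.rstrip())
    if n_spaces > 0:
        continuation = context[-n_spaces:] + continuation
        context = context[:-n_spaces]
    kwargs = special_tokens_kwargs(is_seq2seq)
    if is_seq2seq:
        # Encoder-decoder: the context feeds the encoder, the continuation is the target.
        return toy_tokenize(context, **kwargs), toy_tokenize(continuation)
    # Decoder-only: encode jointly, then split, so tokens match those of the full string.
    whole = toy_tokenize(context + continuation, **kwargs)
    context_ids = toy_tokenize(context, **kwargs)
    return context_ids, whole[len(context_ids):]


print(encode_pair("Q: hi ", "yes", is_seq2seq=False))
# ([81, 58, 32, 104, 105], [32, 121, 101, 115])
```

With a character-level tokenizer, joint and separate encoding agree. With a real subword tokenizer they can differ at the context/continuation boundary, which is why the decoder-only path splits the joint encoding instead.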
  19. 20 Mar, 2024 1 commit
  20. 19 Mar, 2024 1 commit
  21. 18 Mar, 2024 2 commits
  22. 17 Mar, 2024 1 commit
  23. 13 Mar, 2024 1 commit
  24. 10 Mar, 2024 1 commit
  25. 06 Mar, 2024 2 commits
  26. 27 Feb, 2024 1 commit
    • Refactor `evaluator.evaluate` (#1441) · 5ccd65d4
      Baber Abbasi authored
      
      
      * change `all_gather` to `gather`
      
      * add TaskOutput utility class
      
      * Add FilterResults class and refactor task handling.
      
      * Rename `key` to `filter_key` for clarity
      
      * Add `print_writeout` function in utils.py
      
      * Add function to calculate limit size.
      
      * Add doc_iterator method to Task class
      
      * Refactor `doc_iterator` and cleanup in Task class
      
      * remove superfluous bits
      
      * change `all_gather` to `gather`
      
      * bugfix
      
      * bugfix
      
      * fix `gather`
      
      * Refactor `gather` loop
      
      * Refactor aggregate metrics calculation
      
      * Refactor and simplify aggregate metrics calculation
      Removed unused code
      
      * Simplify metrics calculation and remove unused code.
      
      * simplify the metrics calculation in `utils.py` and `evaluator.py`.
      
      * Fix group metric
      
      * change evaluate to hf_evaluate
      
      * change evaluate to hf_evaluate
      
      * add docs
      
      * add docs
      
      * nits
      
      * make isslice keyword only
      
      * nit
      
      * add todo
      
      * nit
      
      * nit
      
      * nit: swap order samples_metrics tuple
      
      * move instance sorting outside loop
      
      * nit
      
      * nit
      
      * Add __repr__ for ConfigurableTask
      
      * nit
      
      * nit
      
      * Revert "nit"
      
      This reverts commit dab8d9977a643752a17f840fd8cf7e4b107df28f.
      
      * fix some logging
      
      * nit
      
      * fix `predict_only` bug. thanks to `@LSinev`!
      
      * change `print_tasks` to `prepare_print_tasks`
      
      * nits
      
      * move eval utils
      
      * move eval utils
      
      * nit
      
      * add comment
      
      * added tqdm descriptions
      
      * Update lm_eval/evaluator_utils.py
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
      
      * fix mgsm bug
      
      * nit
      
      * fix `build_all_requests`
      
      * pre-commit
      
      * add ceil to limit
      
      ---------
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
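Two bullets above add a function to calculate the limit size (later rounded up with `ceil`) and a `doc_iterator` method on the Task class. Below is a minimal sketch of how a fractional limit and per-rank document iteration might fit together. It assumes a limit below 1.0 means a fraction of the dataset and that documents are dealt round-robin across `world_size` ranks. `get_sample_size` and this free-function `doc_iterator` are illustrative, not the harness's exact signatures.

```python
import math
from itertools import islice
from typing import Iterator, Optional, Sequence, Tuple, TypeVar, Union

T = TypeVar("T")


def get_sample_size(n_docs: int, limit: Optional[Union[int, float]]) -> Optional[int]:
    """Hypothetical helper: a limit below 1.0 is a fraction of the docs, rounded up."""
    if limit is None:
        return None
    if limit < 1.0:
        return int(math.ceil(n_docs * limit))
    return int(limit)


def doc_iterator(docs: Sequence[T], *, rank: int = 0, limit: Optional[int] = None, world_size: int = 1) -> Iterator[Tuple[int, T]]:
    """Hypothetical helper: yield (doc_id, doc) pairs for one rank, round-robin, up to limit."""
    # islice(start=rank, stop=limit, step=world_size) over the enumerated docs.
    return islice(enumerate(docs), rank, limit, world_size)


docs = [f"doc{i}" for i in range(10)]
limit = get_sample_size(len(docs), 0.25)  # ceil(10 * 0.25) = 3
for rank in range(2):
    print(rank, list(doc_iterator(docs, rank=rank, limit=limit, world_size=2)))
# 0 [(0, 'doc0'), (2, 'doc2')]
# 1 [(1, 'doc1')]
```

Because the limit is applied to the global document index before striding, the ranks together cover exactly the first `limit` documents with no overlap. Making the iterator arguments keyword-only keeps call sites unambiguous.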
  27. 26 Feb, 2024 1 commit
    • Cont metrics (#1475) · 96d185fa
      Lintang Sutawika authored
      
      
      * add brier_score
      
      * process brier_score
      
      * brier score is working for N-sized class
      
      * fixed brier score
      
      * add TED to BigBench and Brier score to MMLU
      
      * format
      
      * Update metrics.py
      
      * Update task.py
      
      * Update generate_until_template_yaml
      
      * Delete lm_eval/tasks/bigbench/aux_metric.py
      
      * Update generate_until_template_yaml
      
      * Update _default_template_yaml
      
      * Update _generate_configs.py
      
      * Update _generate_configs.py
      
      * Update _generate_configs.py
      
      * fix (format?)
      
      * format?
      
      * format, once more
      
      ---------
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
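The first bullets add a Brier score that works for an arbitrary number of classes. The standard multi-class form averages, over examples, the sum of squared differences between the predicted probability for each class and the one-hot gold label. Below is a minimal sketch, assuming per-choice probabilities come from a softmax over log-likelihoods. The `(gold_index, probabilities)` item format is an illustrative assumption, not necessarily the harness's internal layout.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Turn per-choice log-likelihoods into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def brier_score(items: Sequence[Tuple[int, Sequence[float]]]) -> float:
    """Multi-class Brier score: mean over examples of sum_k (p_k - y_k)^2."""
    total = 0.0
    for gold, probs in items:
        # y_k is 1 for the gold class and 0 for every other class.
        total += sum((p - (1.0 if k == gold else 0.0)) ** 2 for k, p in enumerate(probs))
    return total / len(items)


print(brier_score([(0, [1.0, 0.0, 0.0]), (1, [0.5, 0.5, 0.0])]))  # 0.25
print(brier_score([(2, softmax([-3.0, -2.0, -0.5]))]))
```

Lower is better: a confident correct prediction scores 0 and a confident wrong one scores 2, so unlike accuracy the metric also rewards well-calibrated probabilities.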