  1. 03 Jul, 2025 6 commits
  2. 23 Jun, 2025 1 commit
  3. 21 May, 2025 2 commits
  4. 15 May, 2025 1 commit
  5. 22 Apr, 2025 2 commits
  6. 08 Apr, 2025 1 commit
  7. 07 Apr, 2025 1 commit
    • Add `--samples` Argument for Fine-Grained Task Evaluation in... · d693dcd2
      Felipe Maia Polo authored
      
       Add `--samples` Argument for Fine-Grained Task Evaluation in `lm-evaluation-harness`. This feature is the first step towards efficient multi-prompt evaluation with PromptEval [1,2] (#2520)
      
      * added option --examples
      
      * specifying examples in dictionary
      
      * run pre-commit - fix arg type
      
      Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com>
      
      * fixing bug when examples==None
      
      * fixing bug when examples==None
      
      * limit or examples must be None in simple_evaluate.py and in evaluator.py
      
      * run pre-commit (fix formatting)
      
      Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com>
      
      * merge main and run pre-commit (fix formatting)
      
      Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com>
      
      * Update __main__.py
      
      undefined "limit" and "examples"
      
      * update branch, fix conflicts, run pre-commit
      
      * nits
      
      * nits
      
      * change 'examples' to 'samples'
      
      ---------
      
      Signed-off-by: Mírian Silva <mirianfrsilva@ibm.com>
      Co-authored-by: mirianfrsilva <mirianfrsilva@ibm.com>
      Co-authored-by: Stella Biderman <stellabiderman@gmail.com>
      Co-authored-by: Baber <baber@hey.com>
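The idea behind `--samples` can be sketched in plain Python: a dictionary mapping each task name to the document indices to evaluate (per "specifying examples in dictionary"), which replaces `--limit` rather than combining with it (per "limit or examples must be None"). This is a minimal illustration under those assumptions, not the harness's code; the helper `select_docs` and its signature are hypothetical, as is the choice to evaluate tasks missing from the mapping in full.

```python
from typing import Dict, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def select_docs(
    docs: Sequence[T],
    task_name: str,
    limit: Optional[int] = None,
    samples: Optional[Dict[str, List[int]]] = None,
) -> List[T]:
    """Hypothetical helper: pick which documents of a task to evaluate.

    `limit` keeps the first N documents; `samples` keeps only the listed
    indices for each task. As in the commit, the two are mutually exclusive.
    """
    if limit is not None and samples is not None:
        raise ValueError("Specify either limit or samples, not both.")
    if samples is not None:
        indices = samples.get(task_name)
        if indices is None:
            # Assumption: tasks absent from the mapping are evaluated in full.
            return list(docs)
        return [docs[i] for i in indices]
    if limit is not None:
        return list(docs[:limit])
    return list(docs)


if __name__ == "__main__":
    docs = [f"doc{i}" for i in range(10)]
    print(select_docs(docs, "arc_easy", samples={"arc_easy": [0, 3, 7]}))
    # → ['doc0', 'doc3', 'doc7']
```

Selecting explicit indices (rather than a prefix) is what lets a multi-prompt method such as PromptEval evaluate a chosen subset of examples per task.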
  8. 18 Mar, 2025 2 commits
  9. 26 Feb, 2025 1 commit
  10. 21 Feb, 2025 1 commit
    • Logging (#2203) · 1ba35e62
      Lintang Sutawika authored
      
      
      * changed source of eval_logger
      
      * allow eval_logger to be set from args
      
      * removed verbosity arg from non-main methods
      
      * fix logging
      
      * pre-commit
      
      * set verbosity in eval logger
      
      * replace utils.eval_logger
      
      * fix logging in main
      
      * add logging to docs
      
      * add logging message
      
      * nit
      
      * add logging to docs
      
      * refactor setup_logging to utils
      
      ---------
      Co-authored-by: Baber <baber@hey.com>
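The pattern this commit describes (a shared `eval_logger` whose verbosity is set once from the CLI arguments, with a `setup_logging` helper living in utils) can be sketched with the standard `logging` module. The logger name `"lm-eval"` and the helper's signature below are assumptions for illustration, not the harness's actual API.

```python
import logging

# Shared module-level logger; library code logs through it
# instead of configuring handlers itself.
eval_logger = logging.getLogger("lm-eval")


def setup_logging(verbosity: str = "INFO") -> logging.Logger:
    """Hypothetical helper: configure the shared logger from the entry point."""
    level = getattr(logging, verbosity.upper(), None)
    if not isinstance(level, int):
        raise ValueError(f"Unknown verbosity: {verbosity!r}")
    eval_logger.setLevel(level)
    # Attach a handler only once so repeated calls don't duplicate output.
    if not eval_logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s [%(name)s] %(message)s")
        )
        eval_logger.addHandler(handler)
    return eval_logger


if __name__ == "__main__":
    logger = setup_logging("debug")
    logger.debug("visible because verbosity is DEBUG")
```

Keeping configuration in one entry-point helper is what allows non-main functions to drop their own verbosity arguments.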
  11. 15 Jan, 2025 1 commit
  12. 05 Sep, 2024 1 commit
  13. 20 Aug, 2024 1 commit
  14. 11 Jul, 2024 1 commit
    • Prettify lm_eval --tasks list (#1929) · a0243d54
      anthony-dipofi authored
      
      
      * add  and ; move task list newline logic to new TaskManager.list_all_tasks() method
      
      * format table list into markdown table; add config location column
      
      * add Output Type column
      
      * add logic for printing table of tags separately
      
      * merge with main and fix conflicts ; update docstrings
      
      ---------
      Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
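A rough sketch of the markdown-table listing described above follows. The column names mirror the commit bullets (task, config location, output type); the `(task, config_path, output_type)` tuple input and the helper name `format_task_table` are hypothetical, and the real `TaskManager.list_all_tasks()` output may differ.

```python
from typing import Iterable, Tuple


def format_task_table(rows: Iterable[Tuple[str, str, str]]) -> str:
    """Hypothetical helper: render (task, config location, output type) rows as markdown."""
    header = ("Task", "Config Location", "Output Type")
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "|".join("---" for _ in header) + "|",
    ]
    # Sort by task name so the listing is stable across runs.
    for task, config_path, output_type in sorted(rows):
        lines.append(f"| {task} | {config_path} | {output_type} |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_task_table([("my_task", "tasks/my_task.yaml", "generate_until")]))
```

The example row uses made-up values; printing it yields a three-line table (header, separator, one data row).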
  15. 19 Jun, 2024 1 commit
  16. 09 Jun, 2024 1 commit
  17. 03 Jun, 2024 1 commit
  18. 31 May, 2024 1 commit
    • Add dataset card when pushing to HF hub (#1898) · f4f59251
      KonradSzafer authored
      
      
      * dataset card initial
      
      * few fixes
      
      * adds groups for math, mmlu, gpqa
      
      * added summary args
      
      * moved sanitize_list to utils
      
      * readme update
      
      * recreate metadata moved
      
      * multiple model support
      
      * results latest split fix
      
      * readme update and small refactor
      
      * fix grouping
      
      * add comments
      
      * added pathlib
      
      * corrected pathlib approach
      
      * check whether to create a metadata card
      
      * convert posix paths to str
      
      * default hf org from token
      
      * hf token value error
      
      * Add logs after successful upload
      
      * logging updates
      
      * dataset card example in the readme
      
      ---------
      Co-authored-by: Nathan Habib <nathan.habib@huggingface.com>
      Co-authored-by: Alina Lozovskaia <alinailozovskaya@gmail.com>
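Two bullets in this commit ("moved sanitize_list to utils", "convert posix paths to str") point at a common chore: metadata destined for JSON or a dataset-card header must not contain `pathlib` path objects. The recursive sanitizer below is a hypothetical illustration of that step, not the harness's `sanitize_list`.

```python
import json
from pathlib import PurePath, PurePosixPath
from typing import Any


def to_serializable(obj: Any) -> Any:
    """Hypothetical helper: recursively convert paths and tuples to JSON-friendly values."""
    if isinstance(obj, PurePath):
        # Covers PosixPath/WindowsPath as well as the pure variants.
        return str(obj)
    if isinstance(obj, dict):
        return {str(k): to_serializable(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_serializable(v) for v in obj]
    return obj


if __name__ == "__main__":
    meta = {"results": [PurePosixPath("results") / "run1.json"], "model": "my-model"}
    print(json.dumps(to_serializable(meta)))
    # → {"results": ["results/run1.json"], "model": "my-model"}
```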
  19. 26 May, 2024 1 commit
  20. 06 May, 2024 1 commit
    • Provide ability for custom sampler for ConfigurableTask (#1616) · ae72cebc
      LSinev authored
      * Added fewshot sampling seeds to evaluator.simple_evaluate signature
      
      Way to control seed of fewshot sampling
      may help with #1591
      
      * Added ability for custom sampler for ConfigurableTask
      
      May be set in config like
      ```
      fewshot_config:
        sampler: !function utils.MyFewshotSampler
      ```
      
      * explicitly set fewshot random generator seed for HFLM generate_until_task test
      
      * add backward compatibility for three args seed setup
      
      * save seeds info to logs/reports
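Per this commit, a task config can point `fewshot_config.sampler` at a custom class via `!function utils.MyFewshotSampler`, and the few-shot seed is now controlled explicitly. The sketch below shows what such a class in a `utils.py` next to the task config might look like, assuming it is built from the candidate documents plus a seeded `random.Random` and exposes `sample(n)`. That interface is an assumption; the harness's built-in samplers live in `lm_eval.api.samplers`, and a real custom sampler should match their constructor. The "prefer short questions" rule and the `"question"` field are purely illustrative.

```python
import random
from typing import List, Sequence


class MyFewshotSampler:
    """Hypothetical custom sampler: deterministic few-shot selection.

    Assumed interface: constructed with candidate docs and a seeded RNG;
    sample(n) returns n distinct documents.
    """

    def __init__(self, docs: Sequence[dict], rnd: random.Random) -> None:
        self.docs = list(docs)
        self.rnd = rnd

    def sample(self, n: int) -> List[dict]:
        # Illustrative rule: restrict to the shorter half of the pool
        # (never fewer than n docs), then draw with the seeded RNG so the
        # same seed always yields the same few-shot examples.
        pool = sorted(self.docs, key=lambda d: len(d["question"]))
        pool = pool[: max(n, len(pool) // 2)]
        return self.rnd.sample(pool, n)


if __name__ == "__main__":
    docs = [{"question": "q" * (i + 1)} for i in range(10)]
    print(MyFewshotSampler(docs, random.Random(1234)).sample(3))
```

Passing the RNG in, rather than seeding inside the sampler, keeps the seed under the evaluator's control, which is the point of exposing few-shot seeds in `simple_evaluate`.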
  21. 03 May, 2024 2 commits
  22. 07 Apr, 2024 1 commit
  23. 22 Mar, 2024 1 commit
  24. 19 Mar, 2024 1 commit
  25. 18 Mar, 2024 1 commit
    • Cleanup for v0.4.2 release (#1573) · 5627e819
      Hailey Schoelkopf authored
      * Update interface.md
      
      * fix: make caching reqs always work with accelerate launch
      
      * remove stale task migration checklist
      
      * remove deprecation warnings
      
      * make informative TypeErrors for get_task_dict
      
      * bump version metadata
      
      * fix num_fewshot printing bug
      
      * add fewshot value to cache key
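The last bullet guards against stale cache hits: if cached requests are keyed without the few-shot count, a 5-shot run could reuse requests built for a 0-shot run. A minimal sketch of a key that includes `num_fewshot` might look like this. The function, its fields, and the inclusion of rank and world size (on the assumption that each `accelerate launch` process caches its own shard) are hypothetical, not the harness's cache format.

```python
import hashlib
from typing import Optional


def request_cache_key(
    task_name: str,
    num_fewshot: Optional[int],
    rank: int = 0,
    world_size: int = 1,
) -> str:
    """Hypothetical cache key for built requests.

    Including num_fewshot keeps 0-shot and 5-shot runs from sharing cached
    requests; rank/world_size keep per-process shards apart (an assumption).
    """
    raw = "|".join(
        [task_name, f"fewshot={num_fewshot}", f"rank={rank}", f"world={world_size}"]
    )
    # A short digest keeps the key filesystem-friendly.
    digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]
    return f"requests-{task_name}-{digest}"
```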
  26. 17 Mar, 2024 1 commit
  27. 12 Mar, 2024 1 commit
  28. 06 Mar, 2024 1 commit
  29. 04 Mar, 2024 1 commit
  30. 03 Mar, 2024 1 commit
  31. 01 Mar, 2024 1 commit