• Refactor `evaluator.evaluate` (#1441) · 5ccd65d4
    Baber Abbasi authored
    
    
    * change `all_gather` to `gather`
    
    * add TaskOutput utility class
    
    * Add FilterResults class and refactor task handling.
    
    * Rename `key` to `filter_key` for clarity
    
    * Add `print_writeout` function in utils.py
    
    * Add function to calculate limit size.
    
    * Add doc_iterator method to Task class
    
    * Refactor `doc_iterator` and cleanup in Task class
    
    * remove superfluous bits
    
    * change `all_gather` to `gather`
    
    * bugfix
    
    * bugfix
    
    * fix `gather`
    
    * Refactor `gather` loop
    
    * Refactor aggregate metrics calculation
    
    * Refactor and simplify aggregate metrics calculation
    Removed unused code
    
    * Simplify metrics calculation and remove unused code.
    
    * simplify the metrics calculation in `utils.py` and `evaluator.py`.
    
    * Fix group metric
    
    * change evaluate to hf_evaluate
    
    * change evaluate to hf_evaluate
    
    * add docs
    
    * add docs
    
    * nits
    
    * make isslice keyword only
    
    * nit
    
    * add todo
    
    * nit
    
    * nit
    
    * nit: swap order samples_metrics tuple
    
    * move instance sorting outside loop
    
    * nit
    
    * nit
    
    * Add __repr__ for ConfigurableTask
    
    * nit
    
    * nit
    
    * Revert "nit"
    
    This reverts commit dab8d9977a643752a17f840fd8cf7e4b107df28f.
    
    * fix some logging
    
    * nit
    
    * fix `predict_only` bug. thanks to `@LSinev`!
    
    * change `print_tasks` to `prepare_print_tasks`
    
    * nits
    
    * move eval utils
    
    * move eval utils
    
    * nit
    
    * add comment
    
    * added tqdm descriptions
    
    * Update lm_eval/evaluator_utils.py
    Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
    
    * fix mgsm bug
    
    * nit
    
    * fix `build_all_requests`
    
    * pre-commit
    
    * add ceil to limit
    
    ---------
    Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
task.py 55 KB