Cont metrics (#1475)
* add brier_score
* process brier_score
* brier score is working for N-sized class
* fixed brier score
* add TED to BigBench and Brier score to MMLU
* format
* Update metrics.py
* Update task.py
* Update generate_until_template_yaml
* Delete lm_eval/tasks/bigbench/aux_metric.py
* Update generate_until_template_yaml
* Update _default_template_yaml
* Update _generate_configs.py
* Update _generate_configs.py
* Update _generate_configs.py
* fix (format?)
* format?
* format, once more
---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>