- 23 Jan, 2024 17 commits
-
-
lintangsutawika authored
-
lintangsutawika authored
-
lintangsutawika authored
-
lintangsutawika authored
-
lintangsutawika authored
-
lintangsutawika authored
-
lintangsutawika authored
-
lintangsutawika authored
-
lintangsutawika authored
-
lintangsutawika authored
-
lintangsutawika authored
-
lintangsutawika authored
-
lintangsutawika authored
-
lintangsutawika authored
-
lintangsutawika authored
-
lintangsutawika authored
-
lintangsutawika authored
-
- 19 Jan, 2024 1 commit
-
-
Lintang Sutawika authored
-
- 18 Jan, 2024 3 commits
-
-
Lintang Sutawika authored
* tuple should be considered as well
* set option to keep callable as callable
-
Quentin Lhoest authored
-
Hannibal046 authored
* Update nq_open.yaml change regex
* Bump NQ version
---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 16 Jan, 2024 1 commit
-
-
Hailey Schoelkopf authored
-
- 15 Jan, 2024 2 commits
-
-
Lintang Sutawika authored
* benchmark yamls allow minor edits of already registered tasks
* add documentation
* removed print
-
Hailey Schoelkopf authored
-
- 12 Jan, 2024 1 commit
-
-
jp authored
* Add: kobest config file
* Add: kobest utils
* Add: README
* Update utils.py
-
- 11 Jan, 2024 2 commits
-
-
Hailey Schoelkopf authored
* fix incorrect lookback protections
* bump generate_until task versions
-
Tanishq Abraham authored
* multimedqa
* Update medqa.yaml
* move to benchmarks folder
* add README.md
---------
Co-authored-by: Lintang Sutawika <lintang@sutawika.com>
-
- 10 Jan, 2024 1 commit
-
-
James A. Michaelov authored
-
- 05 Jan, 2024 1 commit
-
-
JorgeDeCorte authored
* add hellaswag_nl
* add other languages and update readme to hellaswag
* refactor as new task
* update readme
* add endline to yaml files and readme.md
* add group, change folder location and update yaml file
* rename default hellaswag yaml file
* fix whitespace error in some labels
* downgrade log level of whitespace checking
---------
Co-authored-by: JorgeDeCorte <jorge.decorte@ravago.be>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 02 Jan, 2024 1 commit
-
-
Pasquale Minervini authored
-
- 29 Dec, 2023 1 commit
-
-
Paul McCann authored
* Add example failing task
  This task includes an invalid import. This will cause an exception and the task will not be loaded. But this just results in a DEBUG level log message, so in normal usage you'll see no error, and will be told the task doesn't exist.
  Here's an example command line to run the task:
      python -m lm_eval --model hf --model_args pretrained=rinna/japanese-gpt-1b --tasks fail
  This task is based on a Japanese Winograd task, but that's not important, and was just used due to familiarity.
* Do not ignore errors when loading tasks
* Change how task errors are logged
  This makes the proposed changes from PR discussion.
  1. Exceptions not related to missing modules/imports are logged as warnings.
  2. module/import related exceptions are still logged at debug level, but if any of them happen there is a warning about it with instructions on how to show logs.
* Remove intentionally failing task
---------
Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
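The logging policy described in this commit message can be sketched roughly as follows. This is a minimal illustration only, not the code from the commit: `load_tasks`, the `loaders` mapping, and the logger name are hypothetical names introduced here.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("task_loader_sketch")


def load_tasks(loaders):
    """Call each zero-argument loader and collect the tasks that load.

    `loaders` is an assumed mapping of task name -> callable; it stands in
    for whatever the real harness uses to discover tasks.
    Returns (loaded_tasks, number_of_import_failures).
    """
    loaded = {}
    import_failures = 0
    for name, loader in loaders.items():
        try:
            loaded[name] = loader()
        except ImportError as err:
            # ModuleNotFoundError is a subclass of ImportError, so missing
            # modules land here and stay at debug level.
            import_failures += 1
            logger.debug("Could not import task %r: %s", name, err)
        except Exception as err:
            # Any other failure is surfaced as a warning.
            logger.warning("Failed to load task %r: %s", name, err)
    if import_failures:
        # One summary warning, so import problems are not silently hidden.
        logger.warning(
            "%d task(s) were skipped because of import errors; "
            "enable debug logging to see the details.",
            import_failures,
        )
    return loaded, import_failures
```

With this shape, a task that fails on an import is still skipped quietly, but the user gets a single warning that points at the debug log instead of only being told the task does not exist.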
-
- 27 Dec, 2023 1 commit
-
-
Baber Abbasi authored
* fix group
* siqa: default.yml -> default.yaml
* max_gen_toks -> self.max_gen_toks
* add ids to task tests
* fix siqa
* fix gen_kwargs for openai-chat
-
- 24 Dec, 2023 1 commit
-
-
MorishT authored
* Add remove_whitespace to FLD benchmark
* bump task version
---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 21 Dec, 2023 1 commit
-
-
Hailey Schoelkopf authored
* change version field formatting in metadata
* mention versioning in new task guide
* add instructions for changelog
* run linters
-
- 20 Dec, 2023 2 commits
-
-
GUIJIN SON authored
* update kmmlu default formatting
* Update _default_kmmlu_yaml
* Delete lm_eval/tasks/kmmlu/utils.py
-
Baber Abbasi authored
* add ruff and isort. remove black and flake8
* remove unnecessary dependencies
* remove dependency from table
* change order
* ran ruff
* check 3.9
* exclude evaluator
* update CI workflow
* use ruff config in pyproject.toml
* test
* add isort rules to ruff
* sort imports
* import `make_table`
* try stages for no-commit-to-branch
* turn on mypy for pre-commit
* test
* test
* test
* change no-commit-to-branch to default
* nits
* fixed dependency
-
- 19 Dec, 2023 1 commit
-
-
seungduk.kim.2304 authored
* Correct column names and dataset names
* Remove kmmlu_general_physics.yaml and kmmlu_korean_language.yaml
* Update _default_kmmlu_yaml
---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 18 Dec, 2023 1 commit
-
-
Baber Abbasi authored
-
- 17 Dec, 2023 1 commit
-
-
Wis Kojohnjaratkul authored
* Add IFEval task
* Check and download nltk punkt if not already downloaded
* Update gen_max_toks to 2048 to support "900 words+" instructions
* Resolve pre-commit linting issues
* Reduce max_gen_toks to 1280 to conserve token usage
* Add warning message in `process_results` call for non chat-finetuned models
-
- 15 Dec, 2023 1 commit
-
-
MorishT authored
* [fix] loading dataset from hub fails when the dataset name includes '.', as the program assumes it is on the local filesystem
* add FLD benchmark
* Update task.py
* [update] add group 'fld'
* [update] rename fld -> fld_default. add explanation to the readme
* Update README.md
---------
Co-authored-by: Lintang Sutawika <lintang@sutawika.com>
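The first item in this message describes a dataset name being mistaken for a local path because it contains a '.'. One way to avoid that class of bug is to check whether the path actually exists rather than inferring it from the characters in the name. The sketch below shows that idea only; it is an assumption about a possible approach, not the fix made in this commit, and `is_local_dataset` is a hypothetical helper.

```python
from pathlib import Path


def is_local_dataset(dataset_name: str) -> bool:
    """Return True only if `dataset_name` points at something on disk.

    Hypothetical helper: a name such as "org/my.dataset" contains a dot
    but is treated as a hub identifier unless that path really exists
    locally.
    """
    return Path(dataset_name).exists()
```

A caller could then branch on `is_local_dataset(name)` to choose between reading local files and fetching from the hub, so a dotted hub name is no longer routed to the filesystem loader.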
-