- 31 Jan, 2024 1 commit
-
-
Hailey Schoelkopf authored
* don't override do_sample if no value for it is passed * Update gen_kwargs override condition * Update huggingface.py * Update huggingface.py * run linters * silence an erroneous warning
-
- 30 Jan, 2024 1 commit
-
-
Baber Abbasi authored
* delay filter init; remove `*args` * bugfix * optimize * type hint
-
- 29 Jan, 2024 1 commit
-
-
Baber Abbasi authored
-
- 28 Jan, 2024 1 commit
-
-
LSinev authored
* raise Exception, not a string
  Additional info:
  https://peps.python.org/pep-0352/#exception-hierarchy-changes
  https://docs.python.org/3.8/tutorial/errors.html#raising-exceptions
* Apply PEP8 recommendation to prefer isinstance
  "Object type comparisons should always use isinstance() instead of comparing types directly"
  https://peps.python.org/pep-0008/
* Remove dangerous default mutable values in arguments
  https://pylint.readthedocs.io/en/stable/user_guide/messages/warning/dangerous-default-value.html
* Format logging messages with fstring (not with format)
  Additional info:
  https://pylint.readthedocs.io/en/stable/user_guide/messages/warning/logging-format-interpolation.html
  There are also discussions about the speed of formatting while logging and about unintended code execution:
  https://github.com/pylint-dev/pylint/issues/2395
  https://stackoverflow.com/a/54368109
  but at least one format (the fstring one) will be used throughout the project
* Specify utf-8 encoding for `open` explicitly
  If not specified, the default encoding may differ across environments, OSes, and Python versions. See
  https://peps.python.org/pep-0597/
  https://docs.python.org/3.11/library/locale.html#locale.getencoding
  https://docs.python.org/3.10/library/os.html#utf8-mode
  https://pylint.readthedocs.io/en/stable/user_guide/messages/warning/unspecified-encoding.html
  This also helps if code from English-language tasks is taken as inspiration for tasks in non-English languages.
* Use inline-ignoring comments to pass pre-commit instead of identity process
  https://flake8.pycqa.org/en/3.0.1/user/ignoring-errors.html#in-line-ignoring-errors
  https://www.flake8rules.com/rules/F841.html
  flake8 comments are supported by ruff: https://docs.astral.sh/ruff/linter/#error-suppression
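Several of the practices named in this commit message can be illustrated with a small sketch. It is not code from the commit itself; the function names (`append_item`, `describe`, `roundtrip`) are hypothetical and exist only to show the patterns.

```python
import os
import tempfile


def append_item(item, bucket=None):
    # A mutable default such as `bucket=[]` would be shared between calls;
    # defaulting to None and creating the list inside avoids that.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


def describe(value):
    # isinstance() is preferred over comparing types directly,
    # and it also matches subclasses.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    # Raise an exception object, never a plain string.
    raise TypeError(f"unsupported type: {type(value).__name__}")


def roundtrip(text):
    # Passing encoding="utf-8" makes the result independent of the
    # platform's default locale encoding.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        with open(path, encoding="utf-8") as f:
            return f.read()
```

With the `None` default, two consecutive calls to `append_item` each return a fresh one-element list instead of accumulating items in a shared default.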
-
- 26 Jan, 2024 2 commits
-
-
NoushNabi authored
* added intel optimum * added intel optimum in readme * modified intel optimum * modified intel optimum * modified intel optimum * modified install optimum * modified path of IR file * added openvino_device * added openvino_device2 * changed optimum-causal to openvino-causal * Update README.md * Update README.md * remove `lm_eval.base` import * update openvino-causal -> openvino ; pass device through super().__init__() * Update README.md * Add optimum to tests dependencies * apply pre-commit * fix so tests pass --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
-
thnkinbtfly authored
-
- 25 Jan, 2024 1 commit
-
-
Baber Abbasi authored
* get `doc` from instance * accelerate bugfix: get ground doc from instance * convert filter to `process_result` * get docs from instances in `FilterEnsemble` * rename * nit * better looping * fix typehint
-
- 24 Jan, 2024 1 commit
-
-
Baber Abbasi authored
-
- 23 Jan, 2024 3 commits
-
-
Baber Abbasi authored
* manage default (greedy) gen_kwargs in vllm better * mirror HF `do_sample` * just need to set temp=0 for greedy
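The idea of mirroring HF's `do_sample` by falling back to temperature 0 can be sketched as follows. This is an assumption about the general shape of such a helper, not the commit's actual code; `normalize_gen_kwargs` is a hypothetical name, and no vLLM API is called.

```python
def normalize_gen_kwargs(gen_kwargs):
    """Return a copy of gen_kwargs adjusted for a backend that has no
    `do_sample` flag and treats temperature 0 as greedy decoding."""
    kwargs = dict(gen_kwargs)  # do not mutate the caller's dict
    do_sample = kwargs.pop("do_sample", None)
    # Greedy if sampling is explicitly disabled, or if no temperature
    # was given at all (the assumed default behaviour).
    if do_sample is False or "temperature" not in kwargs:
        kwargs["temperature"] = 0.0
    return kwargs
```

Under these assumptions, an empty request and a request with `do_sample=False` both come out with `temperature` set to `0.0`, while an explicit sampling temperature is left alone.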
-
Hailey Schoelkopf authored
* don't use get_task_dict() as a helper, it will download the dataset! * pre-commit * Update README.md --------- Co-authored-by: lintangsutawika <lintang@eleuther.ai>
-
Hailey Schoelkopf authored
* Update arc_easy.yaml * Update flan_cot.yaml * update HF dataset path * Update freeform.yaml * Update flan_cot.yaml --------- Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>
-
- 22 Jan, 2024 4 commits
-
-
Brian Vaughan authored
-
Michael Goin authored
* Add `local-completions` support using OpenAI interface * Refactor oa_completion * Address tokenizer comments and change request chunks to batch size * Add warning message for tiktoken backend * fix formatting * fix whitespace * Update README.md --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
Lintang Sutawika authored
* add fix for deciding if stderr is N/A or not * process N/A
-
Hailey Schoelkopf authored
-
- 19 Jan, 2024 1 commit
-
-
Lintang Sutawika authored
-
- 18 Jan, 2024 3 commits
-
-
Lintang Sutawika authored
* tuple should be considered as well * set option to keep callable as callable
-
Quentin Lhoest authored
-
Hannibal046 authored
* Update nq_open.yaml change regex * Bump NQ version --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 16 Jan, 2024 1 commit
-
-
Hailey Schoelkopf authored
-
- 15 Jan, 2024 3 commits
-
-
Hailey Schoelkopf authored
* add WIP device_map overrides * update handling outside of accelerate launcher * change .to(device) log to debug level * run linter
-
Lintang Sutawika authored
* benchmark yamls allow minor edits of already registered tasks * add documentation * removed print
-
Hailey Schoelkopf authored
-
- 12 Jan, 2024 3 commits
-
-
Hailey Schoelkopf authored
-
jp authored
* Add: kobest config file * Add: kobest utils * Add: README * Update utils.py
-
Hailey Schoelkopf authored
-
- 11 Jan, 2024 2 commits
-
-
Hailey Schoelkopf authored
* fix incorrect lookback protections * bump generate_until task versions
-
Tanishq Abraham authored
* multimedqa * Update medqa.yaml * move to benchmarks folder * add README.md --------- Co-authored-by: Lintang Sutawika <lintang@sutawika.com>
-
- 10 Jan, 2024 3 commits
-
-
Baber Abbasi authored
* Refine scoring logic for multiple_target "exact_match" metric * skip old tests from master * skip old tests from master * delete tests from master
-
James A. Michaelov authored
-
Baber Abbasi authored
-
- 08 Jan, 2024 1 commit
-
-
Lintang Sutawika authored
-
- 05 Jan, 2024 2 commits
-
-
Sam Passaglia authored
* do not ensure ascii * Update __main__.py --------- Co-authored-by: Lintang Sutawika <lintang@sutawika.com>
-
JorgeDeCorte authored
* add hellaswag_nl * add other languages and update readme to hellaswag * refactor as new task * update readme * add endline to yaml files and readme.md * add group, change folder location and update yaml file * rename default hellaswag yaml file * fix whitespace error in some labels * downgrade log level of whitespace checking --------- Co-authored-by: JorgeDeCorte <jorge.decorte@ravago.be> Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 04 Jan, 2024 2 commits
-
-
Lintang Sutawika authored
* Remove self.dataset_path post_init process * Update task.py * Update task.py
-
Baber Abbasi authored
* copies max_length from huggingface * handle max_length properly * get tokens from inputs * substitute Collator for Reorderer * `batch=auto` if using data_parallel * nit * cleanup * update code comments * `ray.shutdown()` after calling method if data_parallel_size > 1 --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 02 Jan, 2024 3 commits
-
-
Stella Biderman authored
-
Baber Abbasi authored
* auto-batch requires len of iter * handle case when batch_size="auto:N"
-
Pasquale Minervini authored
-
- 29 Dec, 2023 1 commit
-
-
Paul McCann authored
* Add example failing task
  This task includes an invalid import. This will cause an exception and the task will not be loaded. But this just results in a DEBUG level log message, so in normal usage you'll see no error, and will be told the task doesn't exist.
  Here's an example command line to run the task:
  python -m lm_eval --model hf --model_args pretrained=rinna/japanese-gpt-1b --tasks fail
  This task is based on a Japanese Winograd task, but that's not important, and was just used due to familiarity.
* Do not ignore errors when loading tasks
* Change how task errors are logged
  This makes the proposed changes from PR discussion.
  1. Exceptions not related to missing modules/imports are logged as warnings.
  2. module/import related exceptions are still logged at debug level, but if any of them happen there is a warning about it with instructions on how to show logs.
* Remove intentionally failing task
--------- Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
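The logging policy described in this commit message (import errors at debug level plus one summary warning, everything else as a warning) might look roughly like the sketch below. It is a simplified illustration, not the project's loader: `load_tasks`, the logger name, and the `loaders` mapping of task names to zero-argument callables are all hypothetical.

```python
import logging

logger = logging.getLogger("task_loader_sketch")


def load_tasks(loaders):
    """Try every loader; return (loaded tasks, names skipped for import errors)."""
    loaded = {}
    import_failures = []
    for name, loader in loaders.items():
        try:
            loaded[name] = loader()
        except ImportError as err:
            # ModuleNotFoundError is a subclass of ImportError, so missing
            # optional dependencies land here and stay at debug level.
            logger.debug(f"Could not import task {name}: {err}")
            import_failures.append(name)
        except Exception as err:
            # Anything else is likely a real bug in the task: warn right away.
            logger.warning(f"Failed to load task {name}: {err}")
    if import_failures:
        # One visible hint so that skipped tasks are not silently lost.
        logger.warning(
            f"{len(import_failures)} task(s) skipped due to import errors; "
            "enable DEBUG logging to see details."
        )
    return loaded, import_failures
```

The summary warning is the key design point: users running at the default log level still learn that something was skipped and how to find out why.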
-