- 20 Feb, 2024 3 commits
-
-
haileyschoelkopf authored
-
haileyschoelkopf authored
-
haileyschoelkopf authored
-
- 01 Feb, 2024 5 commits
-
-
haileyschoelkopf authored
-
Lintang Sutawika authored
* add trust_remote_code as default * task for testing recursive * changed source of ALL_TASKS * tasks should only accept TaskObjects * initialize_tasks returns list of tasks and groups * remove trust_remote_code for now * moved constructor process to inside load_yaml_config * more comprehensive way to index tasks and groups * pre-commit format * add exit after error * adjust how task objects are called * no need to use get_task_dict * load_task_or_group works but only for tasks * pre-commit format * half working for nested groups * changed variable names * allow groups and tasks to work * temp save * indexing and loading are part of a task_manager object * adapted initialize_tasks * iron out bugs * fixed typo * fixed typo * simplified code * further tidy up * remove lines for testing * removed test lines * removed unused code * remove unused import * fixed bug * removed comments * group in a list of group can accept parameter changes like `num_fewshot` * add trust_remote_code as default * task for testing recursive * changed source of ALL_TASKS * tasks should only accept TaskObjects * initialize_tasks returns list of tasks and groups * remove trust_remote_code for now * moved constructor process to inside load_yaml_config * more comprehensive way to index tasks and groups * pre-commit format * add exit after error * adjust how task objects are called * no need to use get_task_dict * load_task_or_group works but only for tasks * pre-commit format * half working for nested groups * changed variable names * allow groups and tasks to work * temp save * indexing and loading are part of a task_manager object * adapted initialize_tasks * iron out bugs * fixed typo * fixed typo * simplified code * further tidy up * remove lines for testing * removed test lines * removed unused code * remove unused import * fixed bug * removed comments * group in a list of group can accept parameter changes like `num_fewshot` * check if config is task update * add GroupConfig object * edit test yaml * remove args * testing returning to python task list * add weight_by_size config * describe weight_by_size in docs * fix weight by size potential error * can load individual custom python class task * moved import_function into the config loading file * remove print lines * add squadv2 yaml * temporary scroll implementation * revert back to use load_yaml_config but with modes * fix group being loaded with a None * reformat * can load unregistered tasks from a group * update scrolls * edit scrolls multiplechoice task * adjust class initialization * fix initialization * changed how to identify group and python tasks, fix logger * allow loading "include" that is nested in a group config * reworked flan benchmark * allow duplicate task in the same group to co-exist * process group_alias * removed group_alias * allow parameters set in group_config to apply to all tasks in tasklist * add function, but comment for now * reworked processing dict-base config * fixed how configs in group are processed * update to allow root group to have its alias used * remove unused classes * remove unused classes * revert some parts to original * forgot to change one variable * adapt the new process to use get_task_dict * fix for singular group call * fix variable names * add TaskManager into the evaluator * format * changed how dict tasks are loaded * add docs * Update docs/new_task_guide.md Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update evaluator.py * Update evaluator.py * remove groupconfig for now * changed _config to config * update interface.md to explain TaskManager * added property functions * adjusted logger * update write_out.py * updated tests * added documentation and some modifications * added docstring documentation * precommit format * updated task loading for tests * updates tests * changed arg order for load_yaml_config * update to handle scrolls and edit log message * remove unused lines * return a list of task classes and not a dict * Update __init__.py * Delete lm_eval/tasks/benchmarks/test.yaml * Update task.py * Update lm_eval/utils.py Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update lm_eval/utils.py Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update utils.py * re-added old functions with new log message * Update docs/new_task_guide.md Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update new_task_guide.md * added infor regarding `get_task_dict` and documentation * add get_config for Task * pre-commit formatting --------- Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
Hailey Schoelkopf authored
* allow tasks to specify printed fewshot val * fix to belebele * update metadata field's documentation
-
Baber Abbasi authored
* edge cases where variable might not be assigned. * type hint
-
Hailey Schoelkopf authored
* Update CITATION.bib * Create CONTRIBUTING.md * add disclaimer re: multi node * flesh out some sections more * Flesh out contributor guide * revert CITATION.bib * appease pre-commit --------- Co-authored-by:lintangsutawika <lintang@eleuther.ai>
-
- 31 Jan, 2024 5 commits
-
-
Baber Abbasi authored
* add bypass metric * fixed `bypass` metric. * add task attributes if predict_only * add `predict_only` checks * add docs * added `overide_metric`, `override_config` to `Task` * nits * nit * changed --predict_only to generations; nits * nits * nits * change gen_kwargs warning * add note about `--predict_only` in README.md * added `predict_only` * move table to bottom * nit * change null aggregation to bypass (conflict) * bugfix; default `temp=0.0` * typo
-
Eugene Cheah authored
* Add support for RWKV models with World tokenizer The RWKV line of model with the World tokenizer, does not allow the padding token to be configured, and has its value preset as 0 This however fails all the "if set" checks, and would cause the tokenizer to crash. A tokenizer class name check was added, in addition to a model type check, as there exists RWKV models which uses the neox tokenizers * Update huggingface.py Genericized so that this supports any RWKVWorld tokenizer, and added a fall-back for if the HF implementation name changes. * Comply with formatting guidelines * fix format --------- Co-authored-by:
Stella Biderman <stellabiderman@gmail.com> Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
Hailey Schoelkopf authored
* make deps not point to github urls * formatting * try making PyPI only run on tag pushes
-
Anjor Kanekar authored
* publish to pypi * lint * Update publish.yml * minor
-
Hailey Schoelkopf authored
* don't override do_sample if no value for it is passed * Update gen_kwargs override condition * Update huggingface.py * Update huggingface.py * run linters * silence an erroneous warning
-
- 30 Jan, 2024 1 commit
-
-
Baber Abbasi authored
* delay filter init; remove `*args` * bugfix * optimize * type hint
-
- 29 Jan, 2024 1 commit
-
-
Baber Abbasi authored
-
- 28 Jan, 2024 1 commit
-
-
LSinev authored
* raise Exception, not a string Additional info https://peps.python.org/pep-0352/#exception-hierarchy-changes https://docs.python.org/3.8/tutorial/errors.html#raising-exceptions * Apply PEP8 recommendation to prefer isinstance "Object type comparisons should always use isinstance() instead of comparing types directly" https://peps.python.org/pep-0008/ * Remove dangerous default mutable values in arguments https://pylint.readthedocs.io/en/stable/user_guide/messages/warning/dangerous-default-value.html * Format logging messages with fstring (not with format) Additional info https://pylint.readthedocs.io/en/stable/user_guide/messages/warning/logging-format-interpolation.html There are also discussions about the speed of formatting while logging or some unintended code executions https://github.com/pylint-dev/pylint/issues/2395 https://stackoverflow.com/a/54368109 but at least one format (fstring one) will be used throughout the project * Specify utf-8 encoding for `open` explicitly If not specified, it may be supposed differently in different environments, OSes, and Python versions. See https://peps.python.org/pep-0597/ https://docs.python.org/3.11/library/locale.html#locale.getencoding https://docs.python.org/3.10/library/os.html#utf8-mode https://pylint.readthedocs.io/en/stable/user_guide/messages/warning/unspecified-encoding.html Helps also if some code from English language tasks is taken as inspiration for tasks in non-English languages. * Use inline-ignoring comments to pass pre-commit instead of identity process https://flake8.pycqa.org/en/3.0.1/user/ignoring-errors.html#in-line-ignoring-errors https://www.flake8rules.com/rules/F841.html flake8 comments are supported by ruff: https://docs.astral.sh/ruff/linter/#error-suppression
-
- 26 Jan, 2024 2 commits
-
-
NoushNabi authored
* added intel optimum * added intel optimum in readme * modified intel optimum * modified intel optimum * modified intel optimum * modified install optimum * modified path of IR file * added openvino_device * added openvino_device2 * changed optimum-causal to openvino-causal * Update README.md * Update README.md * remove `lm_eval.base` import * update openvino-causal -> openvino ; pass device through super().__init__() * Update README.md * Add optimum to tests dependencies * apply pre-commit * fix so tests pass --------- Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> Co-authored-by:
haileyschoelkopf <hailey@eleuther.ai>
-
thnkinbtfly authored
-
- 25 Jan, 2024 2 commits
-
-
Hailey Schoelkopf authored
* Update README.md * [!Tip]
-
Baber Abbasi authored
* get `doc` from instance * acceletate bugfix: get ground doc from instance * convert filter to `process_result` * get docs from instances in `FilterEnsemble` * rename * nit * better looping * fix typehint
-
- 24 Jan, 2024 2 commits
-
-
Hailey Schoelkopf authored
-
Baber Abbasi authored
-
- 23 Jan, 2024 4 commits
-
-
Baber Abbasi authored
* manage default (greedy) gen_kwargs in vllm better * mirror HF `do_sample` * just need to set temp=0 for greedy
-
Hailey Schoelkopf authored
* don't use get_task_dict() as a helper, it will download the dataset! * pre-commit * Update README.md --------- Co-authored-by:lintangsutawika <lintang@eleuther.ai>
-
Hailey Schoelkopf authored
* Update arc_easy.yaml * Update flan_cot.yaml * update HF dataset path * Update freeform.yaml * Update flan_cot.yaml --------- Co-authored-by:Lintang Sutawika <lintang@eleuther.ai>
-
Baber Abbasi authored
-
- 22 Jan, 2024 6 commits
-
-
Brian Vaughan authored
-
haileyschoelkopf authored
-
Brian Vaughan authored
-
Michael Goin authored
* Add `local-completions` support using OpenAI interface * Refactor oa_completion * Address tokenizer comments and change request chunks to batch size * Add warning message for tiktoken backend * fix formatting * fix whitespace * Update README.md --------- Co-authored-by:Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
Lintang Sutawika authored
* add fix fordeciding if stderr is N/A or not * process N/A
-
Hailey Schoelkopf authored
-
- 19 Jan, 2024 1 commit
-
-
Lintang Sutawika authored
-
- 18 Jan, 2024 7 commits
-
-
kwrobel.eth authored
-
Lintang Sutawika authored
* tuple should be considered as well * set option to keep callable as callable
-
Hailey Schoelkopf authored
-
Quentin Lhoest authored
-
Hailey Schoelkopf authored
-
Danielle Pintz authored
-
Hannibal046 authored
* Update nq_open.yaml change regex * Bump NQ version --------- Co-authored-by:Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-