- 21 Feb, 2025 1 commit
-
-
Lintang Sutawika authored
* changed source of eval_logger * allow eval_logger to be set from args * removed verbosity arg from non-main methods * fix logging * pre-commit * set verbosity in eval logger * replace utils.eval_logger * fix logging in main * add logging to docs * add logging message * nit * add logging to docs * refactor setup_logging to utils --------- Co-authored-by:Baber <baber@hey.com>
-
- 07 Feb, 2025 1 commit
-
-
omahs authored
* fix typo * fix typos * fix typos
-
- 20 Dec, 2024 1 commit
-
-
Sabrina J. Mielke authored
-
- 20 Aug, 2024 1 commit
-
-
KonradSzafer authored
* multiple chat template support * help doc update * add transformers link to docstring * model args update * comment update * statement simplification * simplified chat_template property * docs update * removed template arg from HFLM class * interface doc update * model guide update * interface doc update * reuse apply_chat_template variable * model guide refactor * interface doc update * removed old definition * last nits * last nits * last nits * better wording * last nits * Remove unnecessary Optional * Apply suggestions from code review Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * return variable rename --------- Co-authored-by:
Clémentine Fourrier <22726840+clefourrier@users.noreply.github.com> Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 08 Jul, 2024 1 commit
-
-
Nathan Habib authored
* batch commit * :Revert "batch commit" This reverts commit d859d1ca. * batch commit * checkout from main * checkout from main * checkout from main * checkout from main * checkout from main * cleanup * cleanup * cleanup * cleanup * cleanup * cleanup eval results * cleanup * add check for gated repo * fix jsonline issue * fix * add try catch when gating the details repo * add doc * adds back hub_repo_name * readds hub repo name
-
- 25 Jun, 2024 1 commit
-
-
johnwee1 authored
* Update interface.md update interface to remove link to really outdated commit of evaluator.py * switch to relative referencing? * Update interface.md --------- Co-authored-by:Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 12 Jun, 2024 1 commit
-
-
Sadra Barikbin authored
-
- 03 Jun, 2024 1 commit
-
-
KonradSzafer authored
* initial chat template * tokenizer attribute check * variable rename * interface update * system instruction * system inst default update * fewshot as multiturn * typing update * indent update * added comments * Adding a fewshot in a more readable way * linting * Moved apply chat template to LM * multiturn alternation fix * cache key update * apply chat template method fix * add system prompt hash to cache_key * tokenizer name property for cache_key * property name fix * linting backward compatibility fix * docs and errors update * add documentation on adding chat template compatibility to model_guide * fewshot as multiturn check fix * saving system inst and chat template in results * eval tracker update * docs update * Apply suggestions from code review Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> --------- Co-authored-by:
haileyschoelkopf <hailey@eleuther.ai> Co-authored-by:
Clémentine Fourrier <22726840+clefourrier@users.noreply.github.com> Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 31 May, 2024 1 commit
-
-
KonradSzafer authored
* dataset card initial * few fixes * adds groups for math, mmlu, gpqa * added summary agrs * moved sanitize_list to utils * readme update * recreate metadata moved * multiple model support * results latest split fix * readme update and small refactor * fix grouping * add comments * added pathlib * corrected pathlib approach * check whether to create a metadata card * convert posix paths to str * default hf org from token * hf token value error * Add logs after successful upload * logging updates * dataset card example in the readme --------- Co-authored-by:
Nathan Habib <nathan.habib@huggingface.com> Co-authored-by:
Alina Lozovskaia <alinailozovskaya@gmail.com>
-
- 24 May, 2024 1 commit
-
-
DongGeon Lee authored
-
- 21 May, 2024 1 commit
-
-
Zafir Stojanovski authored
-
- 13 May, 2024 1 commit
-
-
KonradSzafer authored
-
- 08 May, 2024 1 commit
-
-
aditya thomas authored
* update interface documentation with flag --hf_hub_logs_arg * update interface documentation with flag --hf_hub_logs_arg 2
-
- 06 May, 2024 1 commit
-
-
aditya thomas authored
-
- 18 Mar, 2024 1 commit
-
-
Hailey Schoelkopf authored
* Update interface.md * fix: make caching reqs always work with accelerate launch * remove stale task migration checklist * remove deprecation warnings * make informative TypeErrors for get_task_dict * bump version metadata * fix num_fewshot printing bug * add fewshot value to cache key
-
- 06 Mar, 2024 2 commits
-
-
Sungho Park authored
Update installation commands in openai_completions.py and contributing document and, update wandb_args description (#1536) * Update openai completions and docs/CONTRIBUTING.md * Update wandb args description * Update docs/interface.md --------- Co-authored-by:Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
LSinev authored
* Remove unused `decontamination_ngrams_path` and all mentions (still no alternative path provided) * Fix improper import of LM and usage of evaluator in one of scripts * update type hints in instance and task api * raising errors in task.py instead of asserts * Fix warnings from ruff * raising errors in __main__.py instead of asserts * raising errors in tasks/__init__.py instead of asserts * raising errors in evaluator.py instead of asserts * evaluator: update type hints and remove unused variables in code * Update lm_eval/__main__.py Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update lm_eval/__main__.py Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update lm_eval/api/task.py Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update lm_eval/api/task.py Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update lm_eval/api/task.py Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update lm_eval/evaluator.py Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * pre-commit induced fixes --------- Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 26 Feb, 2024 1 commit
-
-
Aaron V authored
* Create a means for caching task registration and request building. Add the ability to specify an args dict for simple_evaluate(). * Remove extra S in cache path in caching module Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Rename requests cache args, make model_args polymorphic so that a dict can also be accepted. * Update docs to reflect new caching behavior, add CLI args for requests caching. Create a function for deleting items in the cache. * Update documentation, fix minor bug with arg parsing for requests caching where an undefined variable was used. * Remove line from gitignore, add to cli for caching datasets. * Add hashing suffix to .pickles. Update test script typo. * Favor isinstance() over type() in evaluator.py * Add tests for caching, gets tests working, remove unneeded arg from build_all_requests(). * Update arg description to simple_evaluate. * Update pyproject.toml * Fix typehint * Remove the use of random() for creating default cache pickle hash. * Check that cache dir exists before clearing it in request cache tests. * Fix linting problems. * Fix additional formatting errors. * Remove trailing whitespace. * Add new line to the end of .gitignore. --------- Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 23 Feb, 2024 1 commit
-
-
Vicki Boykis authored
* interface docs * fix link
-
- 12 Feb, 2024 1 commit
-
-
Amine Elhattami authored
* Added seeds to `evaluator.simple_evaluate` signature * Added CLI argument * Updated to add arg.
-
- 01 Feb, 2024 1 commit
-
-
Lintang Sutawika authored
* add trust_remote_code as default * task for testing recursive * changed source of ALL_TASKS * tasks should only accept TaskObjects * initialize_tasks returns list of tasks and groups * remove trust_remote_code for now * moved constructor process to inside load_yaml_config * more comprehensive way to index tasks and groups * pre-commit format * add exit after error * adjust how task objects are called * no need to use get_task_dict * load_task_or_group works but only for tasks * pre-commit format * half working for nested groups * changed variable names * allow groups and tasks to work * temp save * indexing and loading are part of a task_manager object * adapted initialize_tasks * iron out bugs * fixed typo * fixed typo * simplified code * further tidy up * remove lines for testing * removed test lines * removed unused code * remove unused import * fixed bug * removed comments * group in a list of group can accept parameter changes like `num_fewshot` * add trust_remote_code as default * task for testing recursive * changed source of ALL_TASKS * tasks should only accept TaskObjects * initialize_tasks returns list of tasks and groups * remove trust_remote_code for now * moved constructor process to inside load_yaml_config * more comprehensive way to index tasks and groups * pre-commit format * add exit after error * adjust how task objects are called * no need to use get_task_dict * load_task_or_group works but only for tasks * pre-commit format * half working for nested groups * changed variable names * allow groups and tasks to work * temp save * indexing and loading are part of a task_manager object * adapted initialize_tasks * iron out bugs * fixed typo * fixed typo * simplified code * further tidy up * remove lines for testing * removed test lines * removed unused code * remove unused import * fixed bug * removed comments * group in a list of group can accept parameter changes like `num_fewshot` * check if config is task update * add GroupConfig object * edit test yaml * remove args * testing returning to python task list * add weight_by_size config * describe weight_by_size in docs * fix weight by size potential error * can load individual custom python class task * moved import_function into the config loading file * remove print lines * add squadv2 yaml * temporary scroll implementation * revert back to use load_yaml_config but with modes * fix group being loaded with a None * reformat * can load unregistered tasks from a group * update scrolls * edit scrolls multiplechoice task * adjust class initialization * fix initialization * changed how to identify group and python tasks, fix logger * allow loading "include" that is nested in a group config * reworked flan benchmark * allow duplicate task in the same group to co-exist * process group_alias * removed group_alias * allow parameters set in group_config to apply to all tasks in tasklist * add function, but comment for now * reworked processing dict-base config * fixed how configs in group are processed * update to allow root group to have its alias used * remove unused classes * remove unused classes * revert some parts to original * forgot to change one variable * adapt the new process to use get_task_dict * fix for singular group call * fix variable names * add TaskManager into the evaluator * format * changed how dict tasks are loaded * add docs * Update docs/new_task_guide.md Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update evaluator.py * Update evaluator.py * remove groupconfig for now * changed _config to config * update interface.md to explain TaskManager * added property functions * adjusted logger * update write_out.py * updated tests * added documentation and some modifications * added docstring documentation * precommit format * updated task loading for tests * updates tests * changed arg order for load_yaml_config * update to handle scrolls and edit log message * remove unused lines * return a list of task classes and not a dict * Update __init__.py * Delete lm_eval/tasks/benchmarks/test.yaml * Update task.py * Update lm_eval/utils.py Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update lm_eval/utils.py Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update utils.py * re-added old functions with new log message * Update docs/new_task_guide.md Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update new_task_guide.md * added infor regarding `get_task_dict` and documentation * add get_config for Task * pre-commit formatting --------- Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 31 Jan, 2024 1 commit
-
-
Baber Abbasi authored
* add bypass metric * fixed `bypass` metric. * add task attributes if predict_only * add `predict_only` checks * add docs * added `overide_metric`, `override_config` to `Task` * nits * nit * changed --predict_only to generations; nits * nits * nits * change gen_kwargs warning * add note about `--predict_only` in README.md * added `predict_only` * move table to bottom * nit * change null aggregation to bypass (conflict) * bugfix; default `temp=0.0` * typo
-
- 04 Dec, 2023 1 commit
-
-
Hailey Schoelkopf authored
-
- 27 Nov, 2023 1 commit
-
-
haileyschoelkopf authored
-
- 17 Nov, 2023 1 commit
-
-
haileyschoelkopf authored
-
- 18 Oct, 2023 1 commit
-
-
haileyschoelkopf authored
-
- 04 Oct, 2023 1 commit
-
-
haileyschoelkopf authored
-
- 11 Sep, 2023 1 commit
-
-
haileyschoelkopf authored
-