- 13 Sep, 2024 1 commit
-
-
Lintang Sutawika authored
* add WIP hf vlm class * add doc_to_image * add mmmu tasks * fix merge conflicts * add lintang's changes to hf_vlms.py * fix doc_to_image * added yaml_path for config-loading * revert * add line to process str type v * update * modeling cleanup * add aggregation for mmmu * rewrite MMMU processing code based on only MMMU authors' repo (doc_to_image still WIP) * implemented doc_to_image * update doc_to_image to accept list of features * update functions * readd image processed * update args process * bugfix for repeated images fed to model * push WIP loglikelihood code * commit most recent code (generative ; qwen2-vl testing) * preliminary image_token_id handling * small mmmu update: some qs have >4 mcqa options * push updated modeling code * use processor.apply_chat_template * add mathvista draft * nit * nit * ensure no footguns in text<>multimodal LM<>task incompatibility * add notification to readme regarding launch of prototype! * fix compatibility check * reorganize mmmu configs * chat_template=None * add interleave chat_template * add condition * add max_images; interleave=true * nit * testmini_mcq * nit * pass image string; convert img * add vllm * add init * vlm add multi attr * fixup * pass max images to vllm model init * nit * encoding to device * fix HFMultimodalLM.chat_template ? * add mmmu readme * remove erroneous prints * use HFMultimodalLM.chat_template ; restore tasks/__init__.py * add docstring for replace_placeholders in utils * fix `replace_placeholders`; set image_string=None * fix typo * cleanup + fix merge conflicts * update MMMU readme * del mathvista * add some sample scores * Update README.md * add log msg for image_string value --------- Co-authored-by:
haileyschoelkopf <hailey@eleuther.ai> Co-authored-by:
Baber Abbasi <baber@eleuther.ai> Co-authored-by:
Baber <baber@hey.com> Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 28 Jun, 2024 1 commit
-
-
Baber Abbasi authored
* add chat template * refactor token padding * nit * nit * check on failing test * check transformers version * remove transformers pin * add ids to test * nit * fixup * fix bos bug * nit * fixup! fix bos bug * increase tolerance for table test * don't detokenize vllm logprobs * Update lm_eval/models/utils.py Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * pre-commit run --all-files --------- Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 01 Mar, 2024 1 commit
-
-
Hailey Schoelkopf authored
* add undistribute + use more_itertools * remove divide() util fn * add more_itertools as dependency
-
- 20 Feb, 2024 1 commit
-
-
Baber Abbasi authored
* add key lookup for same contexts * nit * appease pre-commit * nit * use `expand` (in-place view) rather than `repeat` * try mixed grouping * add docs. * nit * nit * nits * fix tests * Move greedy_tokens calculation out of cache loop * nit * nits * add test * nits * fix name conflict * fix name conflict * chunk tensor * move Collator * nits/docstring * fixup * fixup * group contexts only for decoders * pre-commit * fix `generate_until` test * fix `generate_until` test * Update lm_eval/models/huggingface.py Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * add docs * nit * add docs * add docs * add 'logits_cache' arg * bugfix --------- Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 14 Feb, 2024 1 commit
-
-
Baber Abbasi authored
-
- 01 Feb, 2024 1 commit
-
-
Lintang Sutawika authored
* add trust_remote_code as default * task for testing recursive * changed source of ALL_TASKS * tasks should only accept TaskObjects * initialize_tasks returns list of tasks and groups * remove trust_remote_code for now * moved constructor process to inside load_yaml_config * more comprehensive way to index tasks and groups * pre-commit format * add exit after error * adjust how task objects are called * no need to use get_task_dict * load_task_or_group works but only for tasks * pre-commit format * half working for nested groups * changed variable names * allow groups and tasks to work * temp save * indexing and loading are part of a task_manager object * adapted initialize_tasks * iron out bugs * fixed typo * fixed typo * simplified code * further tidy up * remove lines for testing * removed test lines * removed unused code * remove unused import * fixed bug * removed comments * group in a list of group can accept parameter changes like `num_fewshot` * add trust_remote_code as default * task for testing recursive * changed source of ALL_TASKS * tasks should only accept TaskObjects * initialize_tasks returns list of tasks and groups * remove trust_remote_code for now * moved constructor process to inside load_yaml_config * more comprehensive way to index tasks and groups * pre-commit format * add exit after error * adjust how task objects are called * no need to use get_task_dict * load_task_or_group works but only for tasks * pre-commit format * half working for nested groups * changed variable names * allow groups and tasks to work * temp save * indexing and loading are part of a task_manager object * adapted initialize_tasks * iron out bugs * fixed typo * fixed typo * simplified code * further tidy up * remove lines for testing * removed test lines * removed unused code * remove unused import * fixed bug * removed comments * group in a list of group can accept parameter changes like `num_fewshot` * check if config is task update * add GroupConfig object * edit test yaml * remove args * testing returning to python task list * add weight_by_size config * describe weight_by_size in docs * fix weight by size potential error * can load individual custom python class task * moved import_function into the config loading file * remove print lines * add squadv2 yaml * temporary scroll implementation * revert back to use load_yaml_config but with modes * fix group being loaded with a None * reformat * can load unregistered tasks from a group * update scrolls * edit scrolls multiplechoice task * adjust class initialization * fix initialization * changed how to identify group and python tasks, fix logger * allow loading "include" that is nested in a group config * reworked flan benchmark * allow duplicate task in the same group to co-exist * process group_alias * removed group_alias * allow parameters set in group_config to apply to all tasks in tasklist * add function, but comment for now * reworked processing dict-base config * fixed how configs in group are processed * update to allow root group to have its alias used * remove unused classes * remove unused classes * revert some parts to original * forgot to change one variable * adapt the new process to use get_task_dict * fix for singular group call * fix variable names * add TaskManager into the evaluator * format * changed how dict tasks are loaded * add docs * Update docs/new_task_guide.md Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update evaluator.py * Update evaluator.py * remove groupconfig for now * changed _config to config * update interface.md to explain TaskManager * added property functions * adjusted logger * update write_out.py * updated tests * added documentation and some modifications * added docstring documentation * precommit format * updated task loading for tests * updates tests * changed arg order for load_yaml_config * update to handle scrolls and edit log message * remove unused lines * return a list of task classes and not a dict * Update __init__.py * Delete lm_eval/tasks/benchmarks/test.yaml * Update task.py * Update lm_eval/utils.py Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update lm_eval/utils.py Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update utils.py * re-added old functions with new log message * Update docs/new_task_guide.md Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update new_task_guide.md * added infor regarding `get_task_dict` and documentation * add get_config for Task * pre-commit formatting --------- Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 11 Jan, 2024 1 commit
-
-
Hailey Schoelkopf authored
* fix incorrect lookback protections * bump generate_until task versions
-
- 02 Jan, 2024 1 commit
-
-
Baber Abbasi authored
* auto-batch requires len of iter * handle case when batch_size="auto:N"
-
- 27 Dec, 2023 1 commit
-
-
Baber Abbasi authored
* fix group * siqa: default.yml -> default.yaml * max_gen_toks -> self.max_gen_toks * add ids to task tests * fix siqa * fix gen_kwargs for openai-chat
-
- 24 Dec, 2023 1 commit
-
-
Yuliang Li authored
-
- 23 Dec, 2023 1 commit
-
-
Baber Abbasi authored
* refactor dataloader * cleanup + add docs * change arg * renamed Collator and added testing * parametrized test for Collator * appease pre-commit * added edge case batch 0 (no batching) * fix typos --------- Co-authored-by:Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 22 Dec, 2023 1 commit
-
-
Zach Schillaci authored
* Add retry error handler * fixup! Add retry error handler * Move to utils.py * Run isort on utils.py * Catch multiple exceptions * Update LMs with exception handler * Fixes to anthropic retry handler * fix callback kwarg * Update textsynth.py * fix python 3.8 incompatibility * fix indenterror I introduced * placate linter? * Update on_exception_callback kwarg name * fixup! Merge branch 'main' into add-retry-error-handler * fixup! fixup! Merge branch 'main' into add-retry-error-handler * Merge conflicts are fun * Run pre-commit --------- Co-authored-by:Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 20 Dec, 2023 1 commit
-
-
Baber Abbasi authored
* add ruff and isort. remove black and flake8 * remove unnecessary dependencies * remove dependency from table * change order * ran ruff * check 3.9 * exclude evaluator * update CI workflow * use ruff config in pyproject.toml * test * add isort rules to ruff * sort imports * import `make_table` * try stages for no-commit-to-branch * turn on mypy for pre-commit * test * test * test * change no-commit-to-branch to default * nits * fixed dependency
-
- 12 Dec, 2023 1 commit
-
-
lintangsutawika authored
-
- 29 Nov, 2023 1 commit
-
-
baberabb authored
-
- 28 Nov, 2023 1 commit
-
-
lintangsutawika authored
-
- 21 Nov, 2023 1 commit
-
-
haileyschoelkopf authored
-
- 20 Nov, 2023 1 commit
-
-
baberabb authored
-
- 17 Nov, 2023 1 commit
-
-
lintangsutawika authored
-
- 14 Nov, 2023 1 commit
-
-
lintangsutawika authored
enables tasks of different group but same task_alis (for example, if evaluating on different versions of MMLU
-
- 13 Nov, 2023 1 commit
-
-
lintangsutawika authored
-
- 10 Nov, 2023 1 commit
-
-
lintangsutawika authored
-
- 02 Nov, 2023 2 commits
-
-
lintangsutawika authored
-
lintangsutawika authored
-
- 11 Oct, 2023 1 commit
-
-
Zhiwei Zhuang authored
-
- 03 Oct, 2023 1 commit
-
-
USVSN Sai Prashanth authored
-
- 26 Sep, 2023 2 commits
-
-
lintangsutawika authored
-
lintangsutawika authored
-
- 22 Sep, 2023 2 commits
-
-
haileyschoelkopf authored
-
haileyschoelkopf authored
-
- 15 Sep, 2023 1 commit
-
-
haileyschoelkopf authored
-
- 14 Sep, 2023 1 commit
-
-
haileyschoelkopf authored
-
- 12 Sep, 2023 2 commits
-
-
lintangsutawika authored
-
lintangsutawika authored
-
- 11 Sep, 2023 1 commit
-
-
lintangsutawika authored
-
- 07 Sep, 2023 1 commit
-
-
lintangsutawika authored
-
- 29 Aug, 2023 1 commit
-
-
lintangsutawika authored
-
- 25 Aug, 2023 1 commit
-
-
Ethan Smith authored
This adds a bunch of simple annotations suggested by https://github.com/JelleZijlstra/autotyping.
-
- 08 Aug, 2023 1 commit
-
-
baberabb authored
-
- 04 Aug, 2023 1 commit
-
-
lintangsutawika authored
-