- 08 Oct, 2024 4 commits
-
-
Hailey Schoelkopf authored
-
Hailey Schoelkopf authored
-
Baber Abbasi authored
* max_images are passed on to vllms `limit_mm_per_prompt` * replace max image placeholders in string * handle chat_template error * move `fewshot_random_seed` to global
-
Baber Abbasi authored
* switch conditional checks to `self.backend` * nit * nit * commit feedback * fix test; update precommit hooks * add escape hatch for custom self.AUTO_MODEL_CLASS * add escape hatch for custom self.AUTO_MODEL_CLASS * fix * move assertion * add logging messages * update AUTO_MODEL_CLASS behavior in _get_backend --------- Co-authored-by:haileyschoelkopf <hailey@eleuther.ai>
-
- 07 Oct, 2024 5 commits
-
-
Baber Abbasi authored
* tokenizer: trust-remote-code * pre-commit --------- Co-authored-by:Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
Chenjie Luo authored
* Fix float limit override See: https://github.com/EleutherAI/lm-evaluation-harness/issues/2324 The float limit will be override with the previous int limit of multiple tasks are triggered together. This PR fix this issue * Update evaluator.py * Update evaluator.py
-
am-bean authored
* Fixing scoring bugs for smaller models * Catching another error type in parsing
-
kyujinHan authored
-
Baber Abbasi authored
* bugfix * pre-commit
-
- 04 Oct, 2024 3 commits
-
-
Baber Abbasi authored
-
zxcvuser authored
* Add catalan_bench * added flores_ca.yaml * Updated some task groupings and readme * Fix create_yamls_flores_ca.py --------- Co-authored-by:Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
zxcvuser authored
* Add basque_bench * Add flores_eu group * Update _flores_common_yaml * Run linters, updated flores, mgsm, copa, and readme * Apply suggestions from code review Co-authored-by:
Baber Abbasi <92168766+baberabb@users.noreply.github.com> --------- Co-authored-by:
Baber Abbasi <92168766+baberabb@users.noreply.github.com> Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 03 Oct, 2024 2 commits
-
-
zxcvuser authored
* Add galician_bench * Update xnli_gl path * Add flores_gl group * Update _flores_common_yaml * Updated some task groupings and readme ---------
-
zxcvuser authored
* Add spanish_bench * Add flores_es group * Update _flores_common_yaml * Delete lm_eval/tasks/spanish_bench/escola.yaml * Delete escola from spanish_bench.yaml * Delete escola from README.md * pre-commit run --all-files * Updated some task groupings and readme ---------
-
- 30 Sep, 2024 2 commits
-
-
Giulio Lovisotto authored
-
zxcvuser authored
* Add portuguese_bench * Add flores_pt group * Update _flores_common_yaml * Run linters and update flores and readme
-
- 28 Sep, 2024 1 commit
-
-
eyuansu62 authored
* fix some bugs of mmlu * Fix end of file newline issue --------- Co-authored-by:eyuansu62 <772468951@qq.com>
-
- 26 Sep, 2024 9 commits
-
-
Baber Abbasi authored
* better error message; fix greedy matching * Update lm_eval/models/openai_completions.py Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update lm_eval/models/openai_completions.py Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * pre-commit --------- Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
Baber Abbasi authored
-
Arda authored
* Added TurkishMMLU to LM Evaluation Harness * Fixed COT name * Fixed COT name * Updated Readme * Fixed Test issues * Completed Scan for changed tasks * Updated Readme * Update README.md * fixup task naming casing + ensure yaml template stubs aren't registered --------- Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> Co-authored-by:
haileyschoelkopf <hailey@eleuther.ai>
-
Baber Abbasi authored
* add newlines to task descriptions; increment versions * fix task tests (with groups) * Apply suggestions from code review --------- Co-authored-by:Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
Baber Abbasi authored
* change glianorex to test set * nit * fix test; doc_to_target can be str for multiple_choice * nit
-
Baber Abbasi authored
-
Giulio Lovisotto authored
* Treat python tasks same as yaml tasks. * Add tests. * Re-add fixture decorators. * Fix typing specification error for Python 3.9.
-
Baber Abbasi authored
-
Baber Abbasi authored
-
- 24 Sep, 2024 2 commits
-
-
Eldar Kurtic authored
-
Amine Elhattami authored
-
- 18 Sep, 2024 1 commit
-
-
David Corvoysier authored
* feat(neuron): align with latest optimum-neuron * feat(neuron): support pre-exported neuron models * fix(neuron): correctly use max_length * fix(neuron): adapt loglikelihood The evaluation of log likelihood was not working for neuron models using continuous batching, such as all cached neuron LLama models. * refactor(neuron): remove dead code
-
- 17 Sep, 2024 2 commits
-
-
Baber Abbasi authored
-
SYusupov authored
* Update README.md I encounter some Git buffer size limits when trying to download all commits history of the repository, such as: ```error: RPC failed; curl 18 transfer closed with outstanding read data remaining error: 5815 bytes of body are still expected fetch-pack: unexpected disconnect while reading sideband packet fatal: early EOF``` therefore the installation is faster and there are not errors when I download only the last version of the repository * Fix linting issue
-
- 13 Sep, 2024 1 commit
-
-
Lintang Sutawika authored
* add WIP hf vlm class * add doc_to_image * add mmmu tasks * fix merge conflicts * add lintang's changes to hf_vlms.py * fix doc_to_image * added yaml_path for config-loading * revert * add line to process str type v * update * modeling cleanup * add aggregation for mmmu * rewrite MMMU processing code based on only MMMU authors' repo (doc_to_image still WIP) * implemented doc_to_image * update doc_to_image to accept list of features * update functions * readd image processed * update args process * bugfix for repeated images fed to model * push WIP loglikelihood code * commit most recent code (generative ; qwen2-vl testing) * preliminary image_token_id handling * small mmmu update: some qs have >4 mcqa options * push updated modeling code * use processor.apply_chat_template * add mathvista draft * nit * nit * ensure no footguns in text<>multimodal LM<>task incompatibility * add notification to readme regarding launch of prototype! * fix compatibility check * reorganize mmmu configs * chat_template=None * add interleave chat_template * add condition * add max_images; interleave=true * nit * testmini_mcq * nit * pass image string; convert img * add vllm * add init * vlm add multi attr * fixup * pass max images to vllm model init * nit * encoding to device * fix HFMultimodalLM.chat_template ? * add mmmu readme * remove erroneous prints * use HFMultimodalLM.chat_template ; restore tasks/__init__.py * add docstring for replace_placeholders in utils * fix `replace_placeholders`; set image_string=None * fix typo * cleanup + fix merge conflicts * update MMMU readme * del mathvista * add some sample scores * Update README.md * add log msg for image_string value --------- Co-authored-by:
haileyschoelkopf <hailey@eleuther.ai> Co-authored-by:
Baber Abbasi <baber@eleuther.ai> Co-authored-by:
Baber <baber@hey.com> Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 10 Sep, 2024 1 commit
-
-
Malikeh Ehghaghi authored
* arabic leaferboard yaml file is added * arabic toxigen is implemented * Dataset library is imported * arabic sciq is added * util file of arabic toxigen is updated * arabic race is added * arabic piqa is implemented * arabic open qa is added * arabic copa is implemented * arabic boolq ia added * arabic arc easy is added * arabic arc challenge is added * arabic exams benchmark is implemented * arabic hellaswag is added * arabic leaderboard yaml file metrics are updated * arabic mmlu benchmarks are added * arabic mmlu group yaml file is updated * alghafa benchmarks are added * acva benchmarks are added * acva utils.py is updated * light version of arabic leaderboard benchmarks are added * bugs fixed * bug fixed * bug fixed * bug fixed * bug fixed * bug fixed * library import bug is fixed * doc to target updated * bash file is deleted * results folder is deleted * leaderboard groups are added * full arabic leaderboard groups are added, plus some bug fixes to the light version * Create README.md README.md for arabic_leaderboard_complete * Create README.md README.md for arabic_leaderboard_light * Delete lm_eval/tasks/arabic_leaderboard directory * Update README.md * Update README.md adding the Arabic leaderboards to the library * Update README.md 10% of the training set * Update README.md 10% of the training set * revert .gitignore to prev version * Update lm_eval/tasks/README.md Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * updated main README.md * Update lm_eval/tasks/README.md * specify machine translated benchmarks (complete) * specify machine translated benchmarks (light version) * add alghafa to the related task names (complete and light) * add 'acva' to the related task names (complete and light) * add 'arabic_leaderboard' to all the groups (complete and light) * all dataset - not a random sample * added more accurate details to the readme file * added mt_mmlu from okapi * Update lm_eval/tasks/README.md Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update lm_eval/tasks/README.md Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * updated mt_mmlu readme * renaming 'alghafa' full and light * renaming 'arabic_mmlu' light and full * renaming 'acva' full and light * update readme and standardize dir/file names * running pre-commit --------- Co-authored-by:
shahrzads <sayehban@ualberta.ca> Co-authored-by:
shahrzads <56282669+shahrzads@users.noreply.github.com> Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 05 Sep, 2024 1 commit
-
-
Hailey Schoelkopf authored
-
- 04 Sep, 2024 1 commit
-
-
Baber Abbasi authored
* default chat template method fix * move chat_template to TemplateLM * remove hotfix * handle openai `chat_template` * Update lm_eval/api/model.py Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * add 'max_tokens' to gen_kwargs * pre-commit --------- Co-authored-by:
KonradSzafer <szafer.konrad@gmail.com> Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 30 Aug, 2024 2 commits
-
-
Baber Abbasi authored
* max_length - 1 (generation always >= 1) * vllm: fix rolling prefix_token * nit: add comment * fixup! max_length should be handled for logliklihoods * Revert "fixup! max_length should be handled for logliklihoods" This reverts commit 432d1a3b754c117c3a54ea2fe792ab3a1bd09ed3.
-
Baber Abbasi authored
* max_length - 1 (generation always >= 1) * vllm: fix rolling prefix_token * nit: add comment * fixup! max_length should be handled for logliklihoods
-
- 28 Aug, 2024 3 commits
-
-
Hailey Schoelkopf authored
* fix revision type * allow for None-input loglikelihood reqs to be cached * handle no remaining cache items * pre-commit * change cache_hook.add_partial(loglikelihood_rolling...) convention --------- Co-authored-by:Baber Abbasi <baber@eleuther.ai>
-
Hailey Schoelkopf authored
-
Hailey Schoelkopf authored
* Update evaluator.py * update error msg
-