- 02 Oct, 2024 1 commit
-
-
Baber authored
-
- 30 Sep, 2024 2 commits
-
-
Giulio Lovisotto authored
-
zxcvuser authored
* Add portuguese_bench * Add flores_pt group * Update _flores_common_yaml * Run linters and update flores and readme
-
- 28 Sep, 2024 1 commit
-
-
eyuansu62 authored
* fix some bugs of mmlu * Fix end of file newline issue --------- Co-authored-by:eyuansu62 <772468951@qq.com>
-
- 26 Sep, 2024 9 commits
-
-
Baber Abbasi authored
* better error message; fix greedy matching * Update lm_eval/models/openai_completions.py Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update lm_eval/models/openai_completions.py Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * pre-commit --------- Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
Baber Abbasi authored
-
Arda authored
* Added TurkishMMLU to LM Evaluation Harness * Fixed COT name * Fixed COT name * Updated Readme * Fixed Test issues * Completed Scan for changed tasks * Updated Readme * Update README.md * fixup task naming casing + ensure yaml template stubs aren't registered --------- Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> Co-authored-by:
haileyschoelkopf <hailey@eleuther.ai>
-
Baber Abbasi authored
* add newlines to task descriptions; increment versions * fix task tests (with groups) * Apply suggestions from code review --------- Co-authored-by:Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
Baber Abbasi authored
* change glianorex to test set * nit * fix test; doc_to_target can be str for multiple_choice * nit
-
Baber Abbasi authored
-
Giulio Lovisotto authored
* Treat python tasks same as yaml tasks. * Add tests. * Re-add fixture decorators. * Fix typing specification error for Python 3.9.
-
Baber Abbasi authored
-
Baber Abbasi authored
-
- 24 Sep, 2024 2 commits
-
-
Eldar Kurtic authored
-
Amine Elhattami authored
-
- 18 Sep, 2024 1 commit
-
-
David Corvoysier authored
* feat(neuron): align with latest optimum-neuron * feat(neuron): support pre-exported neuron models * fix(neuron): correctly use max_length * fix(neuron): adapt loglikelihood The evaluation of log likelihood was not working for neuron models using continuous batching, such as all cached neuron LLama models. * refactor(neuron): remove dead code
-
- 17 Sep, 2024 2 commits
-
-
Baber Abbasi authored
-
SYusupov authored
* Update README.md I encounter some Git buffer size limits when trying to download all commits history of the repository, such as: ```error: RPC failed; curl 18 transfer closed with outstanding read data remaining error: 5815 bytes of body are still expected fetch-pack: unexpected disconnect while reading sideband packet fatal: early EOF``` therefore the installation is faster and there are not errors when I download only the last version of the repository * Fix linting issue
-
- 13 Sep, 2024 1 commit
-
-
Lintang Sutawika authored
* add WIP hf vlm class * add doc_to_image * add mmmu tasks * fix merge conflicts * add lintang's changes to hf_vlms.py * fix doc_to_image * added yaml_path for config-loading * revert * add line to process str type v * update * modeling cleanup * add aggregation for mmmu * rewrite MMMU processing code based on only MMMU authors' repo (doc_to_image still WIP) * implemented doc_to_image * update doc_to_image to accept list of features * update functions * readd image processed * update args process * bugfix for repeated images fed to model * push WIP loglikelihood code * commit most recent code (generative ; qwen2-vl testing) * preliminary image_token_id handling * small mmmu update: some qs have >4 mcqa options * push updated modeling code * use processor.apply_chat_template * add mathvista draft * nit * nit * ensure no footguns in text<>multimodal LM<>task incompatibility * add notification to readme regarding launch of prototype! * fix compatibility check * reorganize mmmu configs * chat_template=None * add interleave chat_template * add condition * add max_images; interleave=true * nit * testmini_mcq * nit * pass image string; convert img * add vllm * add init * vlm add multi attr * fixup * pass max images to vllm model init * nit * encoding to device * fix HFMultimodalLM.chat_template ? * add mmmu readme * remove erroneous prints * use HFMultimodalLM.chat_template ; restore tasks/__init__.py * add docstring for replace_placeholders in utils * fix `replace_placeholders`; set image_string=None * fix typo * cleanup + fix merge conflicts * update MMMU readme * del mathvista * add some sample scores * Update README.md * add log msg for image_string value --------- Co-authored-by:
haileyschoelkopf <hailey@eleuther.ai> Co-authored-by:
Baber Abbasi <baber@eleuther.ai> Co-authored-by:
Baber <baber@hey.com> Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 10 Sep, 2024 1 commit
-
-
Malikeh Ehghaghi authored
* arabic leaferboard yaml file is added * arabic toxigen is implemented * Dataset library is imported * arabic sciq is added * util file of arabic toxigen is updated * arabic race is added * arabic piqa is implemented * arabic open qa is added * arabic copa is implemented * arabic boolq ia added * arabic arc easy is added * arabic arc challenge is added * arabic exams benchmark is implemented * arabic hellaswag is added * arabic leaderboard yaml file metrics are updated * arabic mmlu benchmarks are added * arabic mmlu group yaml file is updated * alghafa benchmarks are added * acva benchmarks are added * acva utils.py is updated * light version of arabic leaderboard benchmarks are added * bugs fixed * bug fixed * bug fixed * bug fixed * bug fixed * bug fixed * library import bug is fixed * doc to target updated * bash file is deleted * results folder is deleted * leaderboard groups are added * full arabic leaderboard groups are added, plus some bug fixes to the light version * Create README.md README.md for arabic_leaderboard_complete * Create README.md README.md for arabic_leaderboard_light * Delete lm_eval/tasks/arabic_leaderboard directory * Update README.md * Update README.md adding the Arabic leaderboards to the library * Update README.md 10% of the training set * Update README.md 10% of the training set * revert .gitignore to prev version * Update lm_eval/tasks/README.md Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * updated main README.md * Update lm_eval/tasks/README.md * specify machine translated benchmarks (complete) * specify machine translated benchmarks (light version) * add alghafa to the related task names (complete and light) * add 'acva' to the related task names (complete and light) * add 'arabic_leaderboard' to all the groups (complete and light) * all dataset - not a random sample * added more accurate details to the readme file * added mt_mmlu from okapi * Update lm_eval/tasks/README.md Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update lm_eval/tasks/README.md Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * updated mt_mmlu readme * renaming 'alghafa' full and light * renaming 'arabic_mmlu' light and full * renaming 'acva' full and light * update readme and standardize dir/file names * running pre-commit --------- Co-authored-by:
shahrzads <sayehban@ualberta.ca> Co-authored-by:
shahrzads <56282669+shahrzads@users.noreply.github.com> Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 05 Sep, 2024 1 commit
-
-
Hailey Schoelkopf authored
-
- 04 Sep, 2024 1 commit
-
-
Baber Abbasi authored
* default chat template method fix * move chat_template to TemplateLM * remove hotfix * handle openai `chat_template` * Update lm_eval/api/model.py Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * add 'max_tokens' to gen_kwargs * pre-commit --------- Co-authored-by:
KonradSzafer <szafer.konrad@gmail.com> Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 30 Aug, 2024 2 commits
-
-
Baber Abbasi authored
* max_length - 1 (generation always >= 1) * vllm: fix rolling prefix_token * nit: add comment * fixup! max_length should be handled for logliklihoods * Revert "fixup! max_length should be handled for logliklihoods" This reverts commit 432d1a3b754c117c3a54ea2fe792ab3a1bd09ed3.
-
Baber Abbasi authored
* max_length - 1 (generation always >= 1) * vllm: fix rolling prefix_token * nit: add comment * fixup! max_length should be handled for logliklihoods
-
- 28 Aug, 2024 3 commits
-
-
Hailey Schoelkopf authored
* fix revision type * allow for None-input loglikelihood reqs to be cached * handle no remaining cache items * pre-commit * change cache_hook.add_partial(loglikelihood_rolling...) convention --------- Co-authored-by:Baber Abbasi <baber@eleuther.ai>
-
Hailey Schoelkopf authored
-
Hailey Schoelkopf authored
* Update evaluator.py * update error msg
-
- 25 Aug, 2024 1 commit
-
-
Baber Abbasi authored
* chat template hotfix * pre-commit
-
- 23 Aug, 2024 3 commits
-
-
Cameron Witkowski authored
* Created DUP eval code for gsm8k * asdiv * Fixed fewshot=8 issue * added results to .gitignore * reverted unnecessary changes and moved results + gsm8k_dup out of repo to prepare for pull req * fixed whitespace and unintentional hardcoded version change information * created mbpp task * Reverted changes re. mbpp to save for a future Pull req * reverted metrics.py to previous commit * updated asdiv readme to include informaiton about new asdiv_cot_llama task * Apply suggestions from code review --------- Co-authored-by:
Alexander Detkov <alexander.d.detkov@gmail.com> Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
eyuansu62 authored
-
LSinev authored
ACLUE bibtex typo reported to ACL Anthology and fixed here as title in pdf is correct.
-
- 22 Aug, 2024 3 commits
-
-
Baber Abbasi authored
-
Wessel Poelman authored
-
lxning authored
* fix the regex string in yaml file * Update samplers.py --------- Co-authored-by:Lintang Sutawika <lintang@eleuther.ai>
-
- 20 Aug, 2024 6 commits
-
-
Geralt authored
* mela * Update mela_en.yaml * Create _mela.yaml --------- Co-authored-by:Lintang Sutawika <lintang@eleuther.ai>
-
Nam D. Tran authored
* fix: arguments data * fix based on comment * Update zeno_visualize.py updated all output types --------- Co-authored-by:Baber Abbasi <92168766+baberabb@users.noreply.github.com>
-
Hailey Schoelkopf authored
-
KonradSzafer authored
* multiple chat template support * help doc update * add transformers link to docstring * model args update * comment update * statement simplification * simplified chat_template property * docs update * removed template arg from HFLM class * interface doc update * model guide update * interface doc update * reuse apply_chat_template variable * model guide refactor * interface doc update * removed old definition * last nits * last nits * last nits * better wording * last nits * Remove unnecessary Optional * Apply suggestions from code review Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * return variable rename --------- Co-authored-by:
Clémentine Fourrier <22726840+clefourrier@users.noreply.github.com> Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
Nathan Habib authored
-
lewtun authored
* Update IFEval dataset to official one This PR updates the IFEval dataset to the one hosted under the Google org: https://huggingface.co/datasets/google/IFEval Note the main change is an updated prompt from this commit in the GitHub repo: https://github.com/google-research/google-research/commit/26d8ccdab6fec61b5c83ad6327ea8bda9e580288 * Update ifeval.yaml --------- Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-