- 20 Aug, 2024 6 commits
-
-
Geralt authored
* mela
* Update mela_en.yaml
* Create _mela.yaml
---------
Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>
-
Nam D. Tran authored
* fix: arguments data
* fix based on comment
* Update zeno_visualize.py updated all output types
---------
Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com>
-
Hailey Schoelkopf authored
-
KonradSzafer authored
* multiple chat template support
* help doc update
* add transformers link to docstring
* model args update
* comment update
* statement simplification
* simplified chat_template property
* docs update
* removed template arg from HFLM class
* interface doc update
* model guide update
* interface doc update
* reuse apply_chat_template variable
* model guide refactor
* interface doc update
* removed old definition
* last nits
* last nits
* last nits
* better wording
* last nits
* Remove unnecessary Optional
* Apply suggestions from code review
  Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
* return variable rename
---------
Co-authored-by: Clémentine Fourrier <22726840+clefourrier@users.noreply.github.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
Nathan Habib authored
-
lewtun authored
* Update IFEval dataset to official one
  This PR updates the IFEval dataset to the one hosted under the Google org: https://huggingface.co/datasets/google/IFEval
  Note the main change is an updated prompt from this commit in the GitHub repo: https://github.com/google-research/google-research/commit/26d8ccdab6fec61b5c83ad6327ea8bda9e580288
* Update ifeval.yaml
---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 19 Aug, 2024 3 commits
-
-
Yen-Ting Lin authored
* add taiwan truthful qa
* add tmlu
* Add .gitignore entries for evals/ and harness_eval_main_log.txt, and add harness_eval.slurm script
* add pega eval and legal eval
* add ccp eval
* Update .gitignore and harness_eval.slurm
* Add trust_remote_code and wandb_args to harness_eval.slurm, and add run_all.sh script
* Add Pega MMLU task and configuration files
* Add new models and update parameters in run_all.sh
* Add UMTCEval tasks and configurations
* Update dataset paths and output path
* Update .gitignore and harness_eval.slurm, and modify _generate_configs.py
* Update SLURM script and add new models
* clean for pr
* Update lm_eval/tasks/tmlu/default/tmlu.yaml
  Co-authored-by: Lintang Sutawika <lintang@sutawika.com>
* adjust tag name
* removed group alias from tasks
* format
---------
Co-authored-by: Lintang Sutawika <lintang@sutawika.com>
Co-authored-by: lintangsutawika <lintang@eleuther.ai>
Co-authored-by: Yen-Ting Adam, Lin <r08944064@csie.ntu.edu.tw>
-
Uminosachi authored
-
am-bean authored
* Setting up lingoly task
* Testing yaml changes to debug
* Adding pre-commit hooks
* Functional LingOly benchmark
* Renaming files and adding grouping
* Extending group aggregations to allow custom functions. Setting up custom lingoly aggregation using difference in scores.
* Adding LingOly to the README file
-
- 16 Aug, 2024 1 commit
-
-
Cameron7195 authored
* Created a new task for gsm8k which corresponds to the cot settings and prompt formatting described by Meta to evaluate Llama. Useful for replicating Llama performance on GSM8K benchmark.
* fixing formatting
* fixing formatting
-
- 15 Aug, 2024 2 commits
-
-
am-bean authored
* Setting up lingoly task
* Testing yaml changes to debug
* Adding pre-commit hooks
* Functional LingOly benchmark
* Renaming files and adding grouping
* Extending group aggregations to allow custom functions. Setting up custom lingoly aggregation using difference in scores.
-
Anton Polishko authored
Bumped citation to the v0.4.3
-
- 10 Aug, 2024 1 commit
-
-
Yu Shi Jie authored
-
- 09 Aug, 2024 1 commit
-
-
Jungwhan Kim authored
* add keep trailing newline
* apply ruff-format
* add prompt unit test
* increment the version of tasks that have description with whitespace
* remove white spaces of leaderboard bbh
* update MMLU expected versions in output
* CI run does display the expected version=1 for mmlu subtasks, fix expected test output again
---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
-
- 07 Aug, 2024 1 commit
-
-
Yu Shi Jie authored
* fixed gsm
* GSM-Plus: remove dataset_name line
-
- 05 Aug, 2024 8 commits
-
-
Hailey Schoelkopf authored
-
Hailey Schoelkopf authored
-
Yu Shi Jie authored
* added gsm_plus
* formatted dataset to have train-test-splits
* README.md for gsm-plus
* Update README.md
* GSM-Plus: added gsm_plus_mini
* GSM-Plus: attribution to original dataset
* Update README.md
* Update README.md
* Update README.md
---------
Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>
-
Yu Shi Jie authored
* initialized mmlu_pro task
* added generative mmlu-pro
* added cot fewshot for mmlu-pro
* Initial commit
* updated mmlu-pro to take on 3 splits: test, val, dev
* mmlu-pro: added continuation and flan_cot_zeroshot
* added README.md for mmlu_pro
* removed
* update files
* moved files out, and removed unused versions
* updated
* mmlu_pro:
  - changed task 'other' to 'miscellaneous'
    there is already a group named 'other'
    task and group with the same alias (e.g. mmlu_pro_other_generative) throws an error
  - fixed yaml backslash escape for fewshot cot
* changed choices -> options in yaml config to fit dataset schema
* ONLY FOR DEFAULT: fixed yaml file to use variable number of choices
* mmlu-pro: fixed doc_to_text/choice/target configs for all variants
* mmlu-pro: minor fixes
* mmlu-pro/default: aligned with mmlu updates
* mmlu-pro: update yaml content in line with mmlu
* mmlu-pro: fixed mislabelling of task (math->chemistry)
* mmlu-pro: fixed yaml formatting
* add custom fewshot doc_to_text, target, and choice
* add process for each subtask
* add process for each subtask
* pre-commit
* pre-commit
* format
* resolved left out merge
* deleted folders + updated readme
* Update evaluator.py
* Update evaluator.py
---------
Co-authored-by: Yu Shi Jie <shijie@tensorplex.ai>
Co-authored-by: lintangsutawika <lintang@eleuther.ai>
Co-authored-by: root <root@455bdd73-01.cloud.together.ai>
Co-authored-by: Lintang Sutawika <lintang@sutawika.com>
-
Hailey Schoelkopf authored
-
Amir Hossein Kargaran authored
-
Baber Abbasi authored
-
Nathan Habib authored
* batch commit
* :Revert "batch commit" This reverts commit d859d1ca .
* batch commit
* checkout from main
* checkout from main
* checkout from main
* checkout from main
* checkout from main
* cleanup
* cleanup
* cleanup
* cleanup
* cleanup
* cleanup
* cleanup
* cleanup
* linting
* add doc
* Update lm_eval/models/huggingface.py
  Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
* Update README.md
  Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
* Update lm_eval/models/huggingface.py
* linter
* Apply suggestions from code review
  Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
* style
* remove prepare
* fix
* style
* last check
* Update lm_eval/models/huggingface.py
  Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
---------
Co-authored-by: Clémentine Fourrier <22726840+clefourrier@users.noreply.github.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: clementine@huggingface.co <clementine@huggingface.co>
-
- 04 Aug, 2024 2 commits
-
-
zhabuye authored
-
Amir Hossein Kargaran authored
-
- 01 Aug, 2024 3 commits
-
-
Hailey Schoelkopf authored
-
Nathan Weinberg authored
* refactor: move scipy and sklearn module imports to func imports
  Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
* refactor: consolidate weighted_f1_score func into lm_eval utils
  Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
* lint: allow for utils file to have unused imports
  this allows for shared functions to be defined only once while allowing for the YAML function importing to continue working
  Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
---------
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
-
Baber Abbasi authored
* add temperature for log probs
* add seed
* nit
* add new args to test
* added warning for api chat models
-
- 29 Jul, 2024 1 commit
-
-
Baber Abbasi authored
* encoding bugfix
* encoding bugfix
* overload logliklehood rather than loglikehood_tokens
* add custom tokenizer
* add docs
* Update API_guide.md fix link; add note
* Update API_guide.md typo
* pre-commit
* add link in readme
* nit
* nit
* nit
* Update API_guide.md nits
* Update API_guide.md
* Update API_guide.md
* Update API_guide.md
* Update API_guide.md
* Update README.md
* Update docs/API_guide.md
* Update docs/API_guide.md
* Update API_guide.md
---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 22 Jul, 2024 1 commit
-
-
Baber Abbasi authored
* refactor pad_token handling to fn
* fix docs
* add pad_token_handling to vllm
* start on API superclass
* don't detokenize the returned logits
* streamline vllm tokenizer
* add type hint
* pre-commit
* seems to be in working order
* add model to init
* refactor api models
* nit
* cleanup
* add pbar
* fix type hints
* change optional dependencies
* json encode chat template
* add type hints
* deal with different prompt input requiremnts
* nits
* fix
* cache inside async
* fix
* fix
* nits
* nits
* nits
* nit
* fixup
* fixup
* nit
* add dummy retry
* add dummy retry
* handle imports; skip failing test
* add type hint
* add tests
* add dependency to tests
* add package names to exception
* nit
* docs; type hints
* handle api key
* nit
* tokenizer bug
* fix tokenizer
* nit
* nit
* add better error messages
* nit
* remove decorator
* CI: install api dep
* revert evaluator.py
* consolidate
* consolidate
* nits
* nit
* fix typealias
* nit
* nit
* nit
* Update lm_eval/models/api_models.py typo
  Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
* Update lm_eval/models/openai_completions.py
  Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
* Update lm_eval/models/anthropic_llms.py
  Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
* Update lm_eval/models/api_models.py
  Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
* fix typo
* add news section
* add info for API
* pre-commit
* typo
* fix bug: unpack logliklehood requests
* fix bug: shared gen_kwargs mutated
* nit: handle copy properly
* Update README.md
* Update README.md
* Update README.md
* Update api_models.py
* Update README.md
---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 21 Jul, 2024 1 commit
-
-
Hailey Schoelkopf authored
-
- 20 Jul, 2024 1 commit
-
-
Jennifer Cwagenberg authored
-
- 18 Jul, 2024 2 commits
-
-
Nathan Weinberg authored
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
-
Jungwhan Kim authored
-
- 17 Jul, 2024 1 commit
-
-
jab13x authored
-
- 15 Jul, 2024 3 commits
-
-
Nathan Weinberg authored
Also add 'test_logs/' to .gitignore
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
-
Lintang Sutawika authored
-
Hailey Schoelkopf authored
-
- 14 Jul, 2024 1 commit
-
-
Ben Shoham Ofir authored
* Added MedConceptsQA Benchmark
* pre-commit factor
* update group name
* update in naming
* changed name
* Changed mcqa to med_concepts_qa prefix
* Added med_concepts_qa to README.md
* Changed config files according the new format
* Updated README
---------
Co-authored-by: lintangsutawika <lintang@eleuther.ai>
-
- 13 Jul, 2024 1 commit
-
-
Nathan Weinberg authored
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
-