- 18 Sep, 2024 2 commits
  - Stella Biderman authored
  - Stella Biderman authored
- 17 Sep, 2024 2 commits
  - Baber Abbasi authored
  - SYusupov authored
    * Update README.md
      I encountered Git buffer size limits when trying to download the full commit history of the repository, such as:
      ```
      error: RPC failed; curl 18 transfer closed with outstanding read data remaining
      error: 5815 bytes of body are still expected
      fetch-pack: unexpected disconnect while reading sideband packet
      fatal: early EOF
      ```
      Therefore, installation is faster and there are no errors when I download only the latest version of the repository.
    * Fix linting issue
- 13 Sep, 2024 1 commit
  - Lintang Sutawika authored
    * add WIP hf vlm class
    * add doc_to_image
    * add mmmu tasks
    * fix merge conflicts
    * add lintang's changes to hf_vlms.py
    * fix doc_to_image
    * added yaml_path for config-loading
    * revert
    * add line to process str type v
    * update
    * modeling cleanup
    * add aggregation for mmmu
    * rewrite MMMU processing code based on only MMMU authors' repo (doc_to_image still WIP)
    * implemented doc_to_image
    * update doc_to_image to accept list of features
    * update functions
    * readd image processed
    * update args process
    * bugfix for repeated images fed to model
    * push WIP loglikelihood code
    * commit most recent code (generative ; qwen2-vl testing)
    * preliminary image_token_id handling
    * small mmmu update: some qs have >4 mcqa options
    * push updated modeling code
    * use processor.apply_chat_template
    * add mathvista draft
    * nit
    * nit
    * ensure no footguns in text<>multimodal LM<>task incompatibility
    * add notification to readme regarding launch of prototype!
    * fix compatibility check
    * reorganize mmmu configs
    * chat_template=None
    * add interleave chat_template
    * add condition
    * add max_images; interleave=true
    * nit
    * testmini_mcq
    * nit
    * pass image string; convert img
    * add vllm
    * add init
    * vlm add multi attr
    * fixup
    * pass max images to vllm model init
    * nit
    * encoding to device
    * fix HFMultimodalLM.chat_template ?
    * add mmmu readme
    * remove erroneous prints
    * use HFMultimodalLM.chat_template ; restore tasks/__init__.py
    * add docstring for replace_placeholders in utils
    * fix `replace_placeholders`; set image_string=None
    * fix typo
    * cleanup + fix merge conflicts
    * update MMMU readme
    * del mathvista
    * add some sample scores
    * Update README.md
    * add log msg for image_string value
    ---------
    Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
    Co-authored-by: Baber Abbasi <baber@eleuther.ai>
    Co-authored-by: Baber <baber@hey.com>
    Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
- 10 Sep, 2024 1 commit
  - Malikeh Ehghaghi authored
    * arabic leaderboard yaml file is added
    * arabic toxigen is implemented
    * Dataset library is imported
    * arabic sciq is added
    * util file of arabic toxigen is updated
    * arabic race is added
    * arabic piqa is implemented
    * arabic open qa is added
    * arabic copa is implemented
    * arabic boolq is added
    * arabic arc easy is added
    * arabic arc challenge is added
    * arabic exams benchmark is implemented
    * arabic hellaswag is added
    * arabic leaderboard yaml file metrics are updated
    * arabic mmlu benchmarks are added
    * arabic mmlu group yaml file is updated
    * alghafa benchmarks are added
    * acva benchmarks are added
    * acva utils.py is updated
    * light version of arabic leaderboard benchmarks are added
    * bugs fixed
    * bug fixed
    * bug fixed
    * bug fixed
    * bug fixed
    * bug fixed
    * library import bug is fixed
    * doc to target updated
    * bash file is deleted
    * results folder is deleted
    * leaderboard groups are added
    * full arabic leaderboard groups are added, plus some bug fixes to the light version
    * Create README.md
      README.md for arabic_leaderboard_complete
    * Create README.md
      README.md for arabic_leaderboard_light
    * Delete lm_eval/tasks/arabic_leaderboard directory
    * Update README.md
    * Update README.md
      adding the Arabic leaderboards to the library
    * Update README.md
      10% of the training set
    * Update README.md
      10% of the training set
    * revert .gitignore to prev version
    * Update lm_eval/tasks/README.md
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
    * updated main README.md
    * Update lm_eval/tasks/README.md
    * specify machine translated benchmarks (complete)
    * specify machine translated benchmarks (light version)
    * add alghafa to the related task names (complete and light)
    * add 'acva' to the related task names (complete and light)
    * add 'arabic_leaderboard' to all the groups (complete and light)
    * all dataset - not a random sample
    * added more accurate details to the readme file
    * added mt_mmlu from okapi
    * Update lm_eval/tasks/README.md
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
    * Update lm_eval/tasks/README.md
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
    * updated mt_mmlu readme
    * renaming 'alghafa' full and light
    * renaming 'arabic_mmlu' light and full
    * renaming 'acva' full and light
    * update readme and standardize dir/file names
    * running pre-commit
    ---------
    Co-authored-by: shahrzads <sayehban@ualberta.ca>
    Co-authored-by: shahrzads <56282669+shahrzads@users.noreply.github.com>
    Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
- 05 Sep, 2024 1 commit
  - Hailey Schoelkopf authored
- 04 Sep, 2024 1 commit
  - Baber Abbasi authored
    * default chat template method fix
    * move chat_template to TemplateLM
    * remove hotfix
    * handle openai `chat_template`
    * Update lm_eval/api/model.py
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
    * add 'max_tokens' to gen_kwargs
    * pre-commit
    ---------
    Co-authored-by: KonradSzafer <szafer.konrad@gmail.com>
    Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
- 30 Aug, 2024 2 commits
  - Baber Abbasi authored
    * max_length - 1 (generation always >= 1)
    * vllm: fix rolling prefix_token
    * nit: add comment
    * fixup! max_length should be handled for logliklihoods
    * Revert "fixup! max_length should be handled for logliklihoods"
      This reverts commit 432d1a3b754c117c3a54ea2fe792ab3a1bd09ed3.
  - Baber Abbasi authored
    * max_length - 1 (generation always >= 1)
    * vllm: fix rolling prefix_token
    * nit: add comment
    * fixup! max_length should be handled for logliklihoods
- 28 Aug, 2024 3 commits
  - Hailey Schoelkopf authored
    * fix revision type
    * allow for None-input loglikelihood reqs to be cached
    * handle no remaining cache items
    * pre-commit
    * change cache_hook.add_partial(loglikelihood_rolling...) convention
    ---------
    Co-authored-by: Baber Abbasi <baber@eleuther.ai>
  - Hailey Schoelkopf authored
  - Hailey Schoelkopf authored
    * Update evaluator.py
    * update error msg
- 25 Aug, 2024 1 commit
  - Baber Abbasi authored
    * chat template hotfix
    * pre-commit
- 23 Aug, 2024 3 commits
  - Cameron Witkowski authored
    * Created DUP eval code for gsm8k
    * asdiv
    * Fixed fewshot=8 issue
    * added results to .gitignore
    * reverted unnecessary changes and moved results + gsm8k_dup out of repo to prepare for pull req
    * fixed whitespace and unintentional hardcoded version change information
    * created mbpp task
    * Reverted changes re. mbpp to save for a future Pull req
    * reverted metrics.py to previous commit
    * updated asdiv readme to include information about new asdiv_cot_llama task
    * Apply suggestions from code review
    ---------
    Co-authored-by: Alexander Detkov <alexander.d.detkov@gmail.com>
    Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
  - eyuansu62 authored
  - LSinev authored
    ACLUE BibTeX typo: reported to the ACL Anthology and fixed here, since the title in the PDF is correct.
- 22 Aug, 2024 3 commits
  - Baber Abbasi authored
  - Wessel Poelman authored
  - lxning authored
    * fix the regex string in yaml file
    * Update samplers.py
    ---------
    Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>
- 20 Aug, 2024 6 commits
  - Geralt authored
    * mela
    * Update mela_en.yaml
    * Create _mela.yaml
    ---------
    Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>
  - Nam D. Tran authored
    * fix: arguments data
    * fix based on comment
    * Update zeno_visualize.py
      updated all output types
    ---------
    Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com>
  - Hailey Schoelkopf authored
  - KonradSzafer authored
    * multiple chat template support
    * help doc update
    * add transformers link to docstring
    * model args update
    * comment update
    * statement simplification
    * simplified chat_template property
    * docs update
    * removed template arg from HFLM class
    * interface doc update
    * model guide update
    * interface doc update
    * reuse apply_chat_template variable
    * model guide refactor
    * interface doc update
    * removed old definition
    * last nits
    * last nits
    * last nits
    * better wording
    * last nits
    * Remove unnecessary Optional
    * Apply suggestions from code review
      Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
    * return variable rename
    ---------
    Co-authored-by: Clémentine Fourrier <22726840+clefourrier@users.noreply.github.com>
    Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
  - Nathan Habib authored
  - lewtun authored
    * Update IFEval dataset to official one
      This PR updates the IFEval dataset to the one hosted under the Google org: https://huggingface.co/datasets/google/IFEval
      Note the main change is an updated prompt from this commit in the GitHub repo: https://github.com/google-research/google-research/commit/26d8ccdab6fec61b5c83ad6327ea8bda9e580288
    * Update ifeval.yaml
    ---------
    Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
- 19 Aug, 2024 3 commits
  - Yen-Ting Lin authored
    * add taiwan truthful qa
    * add tmlu
    * Add .gitignore entries for evals/ and harness_eval_main_log.txt, and add harness_eval.slurm script
    * add pega eval and legal eval
    * add ccp eval
    * Update .gitignore and harness_eval.slurm
    * Add trust_remote_code and wandb_args to harness_eval.slurm, and add run_all.sh script
    * Add Pega MMLU task and configuration files
    * Add new models and update parameters in run_all.sh
    * Add UMTCEval tasks and configurations
    * Update dataset paths and output path
    * Update .gitignore and harness_eval.slurm, and modify _generate_configs.py
    * Update SLURM script and add new models
    * clean for pr
    * Update lm_eval/tasks/tmlu/default/tmlu.yaml
      Co-authored-by: Lintang Sutawika <lintang@sutawika.com>
    * adjust tag name
    * removed group alias from tasks
    * format
    ---------
    Co-authored-by: Lintang Sutawika <lintang@sutawika.com>
    Co-authored-by: lintangsutawika <lintang@eleuther.ai>
    Co-authored-by: Yen-Ting Adam, Lin <r08944064@csie.ntu.edu.tw>
  - Uminosachi authored
  - am-bean authored
    * Setting up lingoly task
    * Testing yaml changes to debug
    * Adding pre-commit hooks
    * Functional LingOly benchmark
    * Renaming files and adding grouping
    * Extending group aggregations to allow custom functions. Setting up custom lingoly aggregation using difference in scores.
    * Adding LingOly to the README file
- 16 Aug, 2024 1 commit
  - Cameron7195 authored
    * Created a new task for gsm8k which corresponds to the cot settings and prompt formatting described by Meta to evaluate Llama. Useful for replicating Llama performance on GSM8K benchmark.
    * fixing formatting
    * fixing formatting
- 15 Aug, 2024 2 commits
  - am-bean authored
    * Setting up lingoly task
    * Testing yaml changes to debug
    * Adding pre-commit hooks
    * Functional LingOly benchmark
    * Renaming files and adding grouping
    * Extending group aggregations to allow custom functions. Setting up custom lingoly aggregation using difference in scores.
  - Anton Polishko authored
    Bumped citation to v0.4.3
- 10 Aug, 2024 1 commit
  - Yu Shi Jie authored
- 09 Aug, 2024 1 commit
  - Jungwhan Kim authored
    * add keep trailing newline
    * apply ruff-format
    * add prompt unit test
    * increment the version of tasks that have description with whitespace
    * remove white spaces of leaderboard bbh
    * update MMLU expected versions in output
    * CI run does display the expected version=1 for mmlu subtasks, fix expected test output again
    ---------
    Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
- 07 Aug, 2024 1 commit
  - Yu Shi Jie authored
    * fixed gsm
    * GSM-Plus: remove dataset_name line
- 05 Aug, 2024 5 commits
  - Hailey Schoelkopf authored
  - Hailey Schoelkopf authored
  - Yu Shi Jie authored
    * added gsm_plus
    * formatted dataset to have train-test-splits
    * README.md for gsm-plus
    * Update README.md
    * GSM-Plus: added gsm_plus_mini
    * GSM-Plus: attribution to original dataset
    * Update README.md
    * Update README.md
    * Update README.md
    ---------
    Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>
  - Yu Shi Jie authored
    * initialized mmlu_pro task
    * added generative mmlu-pro
    * added cot fewshot for mmlu-pro
    * Initial commit
    * updated mmlu-pro to take on 3 splits: test, val, dev
    * mmlu-pro: added continuation and flan_cot_zeroshot
    * added README.md for mmlu_pro
    * removed
    * update files
    * moved files out, and removed unused versions
    * updated
    * mmlu_pro:
      - changed task 'other' to 'miscellaneous'; there is already a group named 'other', and a task and group with the same alias (e.g. mmlu_pro_other_generative) throws an error
      - fixed yaml backslash escape for fewshot cot
    * changed choices -> options in yaml config to fit dataset schema
    * ONLY FOR DEFAULT: fixed yaml file to use variable number of choices
    * mmlu-pro: fixed doc_to_text/choice/target configs for all variants
    * mmlu-pro: minor fixes
    * mmlu-pro/default: aligned with mmlu updates
    * mmlu-pro: update yaml content in line with mmlu
    * mmlu-pro: fixed mislabelling of task (math->chemistry)
    * mmlu-pro: fixed yaml formatting
    * add custom fewshot doc_to_text, target, and choice
    * add process for each subtask
    * add process for each subtask
    * pre-commit
    * pre-commit
    * format
    * resolved left out merge
    * deleted folders + updated readme
    * Update evaluator.py
    * Update evaluator.py
    ---------
    Co-authored-by: Yu Shi Jie <shijie@tensorplex.ai>
    Co-authored-by: lintangsutawika <lintang@eleuther.ai>
    Co-authored-by: root <root@455bdd73-01.cloud.together.ai>
    Co-authored-by: Lintang Sutawika <lintang@sutawika.com>
  - Hailey Schoelkopf authored