- 30 May, 2024 1 commit
-
-
haileyschoelkopf authored
-
- 29 May, 2024 1 commit
-
-
haileyschoelkopf authored
-
- 28 May, 2024 1 commit
-
-
Michael Goin authored
* Reorder vllm imports in vllm_causallms.py
* Update vllm_causallms.py
-
- 26 May, 2024 1 commit
-
-
Hailey Schoelkopf authored
* rename lm_eval.logging module
* fix evaluation tracker args
-
- 24 May, 2024 7 commits
-
-
Lintang Sutawika authored
* edit process multiple-choice
* split template yaml
* remove
* modified multiple_choice tasks
* update
* Update multiple_choice_template_b_yaml
* Update multiple_choice_template_a_yaml
---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
Lintang Sutawika authored
* add mmlu tasks from pile-t5
* Update _mmlu_flan_cot_fewshot_template_yaml
* Update _mmlu_flan_cot_zeroshot_template_yaml
* Update _mmlu_flan_generative_template_yaml
* Update _mmlu_flan_loglikelihood_template_yaml
* Update _default_template_yaml
---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
Hailey Schoelkopf authored
* add handling for bootstrap_iters=0 case
* add more detail to docstring
* run precommit
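As a rough illustration of what handling `bootstrap_iters=0` can look like, the sketch below skips the standard-error estimate when no resampling is requested. The function `bootstrap_stderr`, its signature, and the choice to return `None` are assumptions made for this example; they are not taken from the lm_eval code.

```python
import random
import statistics


def bootstrap_stderr(values, bootstrap_iters, seed=1234):
    """Estimate the standard error of the mean by bootstrap resampling.

    Hypothetical helper (not the lm_eval implementation): returns None
    when resampling is disabled or cannot produce a spread.
    """
    # bootstrap_iters=0 means "do not compute a standard error at all".
    # A single resample also gives no spread to measure, so treat it the same.
    if bootstrap_iters < 2 or not values:
        return None

    rng = random.Random(seed)  # private generator keeps the estimate reproducible
    n = len(values)
    resampled_means = []
    for _ in range(bootstrap_iters):
        # Draw n items with replacement and record the mean of the resample.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        resampled_means.append(statistics.fmean(resample))

    # The spread of the resampled means approximates the standard error.
    return statistics.stdev(resampled_means)
```

A caller would then have to treat `None` as "no stderr available" when building a results table, rather than formatting it as a number.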
-
Lintang Sutawika authored
`gold_one_hot` needs to follow the dimension of predictions so that it still works when `--limit` is used and the gold indexes in the sample do not cover all possible gold indexes.
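A minimal sketch of the idea, assuming predictions arrive as one list of per-class scores per document: the one-hot width is taken from the predictions rather than from the largest gold index seen. The helper name `one_hot_gold` and the sample data are hypothetical and only illustrate the shape issue.

```python
def one_hot_gold(golds, predictions):
    """Build one-hot gold rows whose width matches the prediction rows."""
    # Width comes from the predictions, not from max(golds) + 1, so a
    # limited sample that never hits the last classes still lines up.
    num_classes = len(predictions[0])
    rows = []
    for gold in golds:
        row = [0] * num_classes
        row[gold] = 1
        rows.append(row)
    return rows


# With a limited sample, the gold labels only touch classes 0 and 1,
# while each prediction still scores all four classes.
predictions = [[0.1, 0.7, 0.1, 0.1], [0.6, 0.2, 0.1, 0.1]]
golds = [1, 0]
gold_one_hot = one_hot_gold(golds, predictions)
```

Deriving the width from `max(golds) + 1` would instead give two columns here, which would no longer match the four-column predictions.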
-
Hailey Schoelkopf authored
* fix auto-batch size bug for seq2seq models
* run linter
-
Huazhong Ji authored
-
DongGeon Lee authored
-
- 23 May, 2024 1 commit
-
-
Edward Gan authored
-
- 22 May, 2024 1 commit
-
-
zhabuye authored
-
- 21 May, 2024 2 commits
-
-
Zafir Stojanovski authored
-
Zafir Stojanovski authored
-
- 19 May, 2024 1 commit
-
-
Nick Doiron authored
* resize model embeddings
* resize only
* tokenizer help
* load tokenizer before model
* add comment and run precommit lint
* Add log message
  Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 14 May, 2024 1 commit
-
-
LSinev authored
-
- 13 May, 2024 2 commits
-
-
KonradSzafer authored
-
Lucas Weber authored
* Add tinyBenchmarks
* Add acknowledgements
* Add ordering of outputs for data-parallel
* Run pre-commit
* Add few_shot specifications
* Add tinyBenchmarks post-processing
* add conditional import ; fix task names
---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
-
- 09 May, 2024 1 commit
-
-
Edd authored
* add copal
* change name to copal id for clarity and the task name
* remove `copal_id...` to yaml to make it work
* checkmark on README
* change group name to `copal_id`
-
- 08 May, 2024 2 commits
-
-
aditya thomas authored
* update interface documentation with flag --hf_hub_logs_arg
* update interface documentation with flag --hf_hub_logs_arg 2
-
jonabur authored
* add mmlu arc style evaluation
* rename arc_style to continuation
---------
Co-authored-by: Jonathan Burdge <jburdge@mahti-login11.mahti.csc.fi>
Co-authored-by: Jonathan Burdge <jburdge@mahti-login12.mahti.csc.fi>
-
- 07 May, 2024 5 commits
-
-
Yoav Katz authored
* Initial support for Unitxt datasets in LM Eval Harness
  See https://github.com/IBM/unitxt
  The script 'generate_yamls.py' creates LM Eval Harness yaml files corresponding to Unitxt datasets specified in the 'unitxt_datasets' file.
  The glue code required to register Unitxt metrics is in 'unitxt_wrapper.py'.
* Added dataset loading check to generate_yaml
  Improved error messages.
* Speed up generate_yaml
  Added printouts and improved error message
* Added output printout
* Simplified integration of unitxt datasets
  Store all the common yaml configuration in a yaml include shared by all datasets of the same task.
* Post code review comments - part 1
  1. Made sure include files don't end with 'yaml' so they won't be marked as tasks
  2. Added more datasets and tasks (NER, GEC)
  3. Added README
* Post code review comments - part 2
  1. Added unitxt install option in pyproject.toml: pip install 'lm_eval[unitxt]'
  2. Added a check that unitxt is installed and print a clear error message if not
* Committed missing pyproject change
* Added documentation on adding datasets
* More doc changes
* add unitxt extra to readme
* run precommit
---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
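The "check that unitxt is installed and print a clear error message if not" step can be sketched as below. The helper name `require_package` and the exact message wording are assumptions made for illustration; only the `pip install 'lm_eval[unitxt]'` command comes from the commit message above.

```python
import importlib.util


def require_package(module_name, extra):
    """Raise a clear ImportError if an optional dependency is missing.

    Hypothetical helper: `module_name` is a top-level importable name and
    `extra` is the optional-dependency group that provides it.
    """
    # find_spec returns None when a top-level module cannot be located,
    # without actually importing it.
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"Package '{module_name}' is not installed. "
            f"Install it with: pip install 'lm_eval[{extra}]'"
        )
```

Calling `require_package("unitxt", "unitxt")` before any Unitxt-specific code runs would turn a late, confusing import failure into an early message that names the fix.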
-
Hailey Schoelkopf authored
* fix auto-batch size bug for seq2seq models
* alphabetize task + group tables ; fix eval tracker bug
* fix eval tracker bug
-
Hailey Schoelkopf authored
* add Hendrycks MATH (no sympy checking) variant
* add readmes for MATH tasks
-
KonradSzafer authored
-
Hailey Schoelkopf authored
-
- 06 May, 2024 2 commits
-
-
aditya thomas authored
-
LSinev authored
* Added fewshot sampling seeds to evaluator.simple_evaluate signature
  Way to control seed of fewshot sampling may help with #1591
* Added ability for custom sampler for ConfigurableTask
  May be set in config like
  ```
  fewshot_config:
    sampler: !function utils.MyFewshotSampler
  ```
* explicitly set fewshot random generator seed for HFLM generate_until_task test
* add backward compatibility for three args seed setup
* save seeds info to logs/reports
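To show why a dedicated few-shot seed matters, here is a standalone sketch of a sampler that owns its random generator. The class name `MyFewshotSampler` is borrowed from the config snippet above, but the constructor and `sample` method shown here are assumptions for illustration. The class does not subclass or reproduce the actual lm_eval sampler interface.

```python
import random


class MyFewshotSampler:
    """Hypothetical sampler that draws few-shot examples with its own seeded RNG."""

    def __init__(self, docs, seed=1234):
        self.docs = list(docs)
        # A private generator keeps few-shot selection reproducible and
        # independent of any other code that touches the global random state.
        self.rng = random.Random(seed)

    def sample(self, n):
        """Return n distinct documents chosen by the seeded generator."""
        return self.rng.sample(self.docs, n)
```

Two samplers built with the same documents and seed return the same examples in the same order, which is what makes a seed recorded in the logs useful for reproducing a run.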
-
- 05 May, 2024 4 commits
-
-
ciaranby authored
-
Muhammad Bin Usman authored
fix `----hf_hub_log_args` to `--hf_hub_log_args`
-
kwrobel.eth authored
* remove echo parameter in OpenAI completions API
* remove context length parameter doc string
-
KonradSzafer authored
-
- 03 May, 2024 2 commits
-
-
KonradSzafer authored
-
KonradSzafer authored
* evaluation tracker implementation
* OVModelForCausalLM test fix
* typo fix
* moved methods args
* multiple args in one flag
* loggers moved to dedicated dir
* improved filename sanitization
-
- 02 May, 2024 2 commits
-
-
Helena Kloosterman authored
* Add option to set OpenVINO config
* Use utils.eval_logger for logging
-
bcicc authored
* vllm lora support
* remove print
* version check, rename lora kwarg
-
- 01 May, 2024 3 commits
-
-
Simran Arora authored
* upload new tasks
* add readmes
* run linters
---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
-
Zehan Li authored
* Update utils.py
  This is a 4-choice task, option_e is null for all but 3 samples
* Fix options
  Adaptive choices
* add option e
* bump multilingual arc version
---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
Gabriel Mukobi authored
* Add Pile-10k readme
* Add Pile-10k task configuration file
-