- 06 May, 2024 1 commit
-
-
haileyschoelkopf authored
-
- 27 Apr, 2024 1 commit
-
-
LucWeber authored
-
- 12 Mar, 2024 1 commit
-
-
LucWeber authored
-
- 11 Mar, 2024 2 commits
- 08 Mar, 2024 2 commits
- 06 Mar, 2024 7 commits
-
-
Sungho Park authored
Update installation commands in openai_completions.py and contributing document and, update wandb_args description (#1536) * Update openai completions and docs/CONTRIBUTING.md * Update wandb args description * Update docs/interface.md --------- Co-authored-by:Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
LSinev authored
* Remove unused `decontamination_ngrams_path` and all mentions (still no alternative path provided) * Fix improper import of LM and usage of evaluator in one of scripts * update type hints in instance and task api * raising errors in task.py instead of asserts * Fix warnings from ruff * raising errors in __main__.py instead of asserts * raising errors in tasks/__init__.py instead of asserts * raising errors in evaluator.py instead of asserts * evaluator: update type hints and remove unused variables in code * Update lm_eval/__main__.py Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update lm_eval/__main__.py Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update lm_eval/api/task.py Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update lm_eval/api/task.py Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update lm_eval/api/task.py Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update lm_eval/evaluator.py Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * pre-commit induced fixes --------- Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
Hailey Schoelkopf authored
update printed num-fewshot ; prevent fewshots from erroneously being used by cot which hardcodes fewshot prompt (#1502)
-
Hailey Schoelkopf authored
-
sean0042 authored
-
Long Phan authored
* init wmdp yaml file * Add WMDP Multiple-choice * fix linter issues * Delete lm_eval/tasks/wmdp/_wmdp.yaml --------- Co-authored-by:Lintang Sutawika <lintang@sutawika.com>
-
Peter Bevan authored
* Start adding eq-bench * Start adding to yaml and utils * Get metric working * Add README * Handle cases where answer is not parseable * Deal with unparseable answers and add percent_parseable metric * Update README
-
- 05 Mar, 2024 2 commits
-
-
Uanu authored
* Add new tasks of GPQA * Add README * Remove unused functions * Remove unused functions * Linters * Add flexible match * update * Remove deplicate function * Linter * update * Update lm_eval/filters/extraction.py Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * register multi_choice_regex * Update * run precommit --------- Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> Co-authored-by:
haileyschoelkopf <hailey@eleuther.ai>
-
Baber Abbasi authored
-
- 04 Mar, 2024 4 commits
-
-
Hailey Schoelkopf authored
* Fix padding * Fix elif in model loading * format
-
Hailey Schoelkopf authored
-
Manuel Faysse authored
* add french-bench * rename arc easy * linting * update datasets for no remote code exec * fix string delimiter * add info to readmr * trim trailing whitespace * add detailed groups * add info to readme * remove orangesum title from fbench main * Force PPL tasks to be 0-shot --------- Co-authored-by:Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
Vicki Boykis authored
-
- 03 Mar, 2024 2 commits
-
-
Vicki Boykis authored
* setting trust_remote_code * dataset list no notebooks * respect trust remote code * Address changes, move cli options and change datasets * fix task for tests * headqa * remove kobest * pin datasets and address comments * clean up space
-
Baber Abbasi authored
* use `@ray.remote` with distributed vLLM * update versions * bugfix * unpin vllm * fix pre-commit * added version assertion error * Revert "added version assertion error" This reverts commit 8041e9b78e95eea9f4f4d0dc260115ba8698e9cc. * added version assertion for DP * expand DP note * add warning * nit * pin vllm * fix typos
-
- 01 Mar, 2024 4 commits
-
-
Baber Abbasi authored
* make `WandbLogger` init args optional * nit * nit * nit * move import warning to `WandbLogger` * nit * update docs * nit
-
Hailey Schoelkopf authored
* add undistribute + use more_itertools * remove divide() util fn * add more_itertools as dependency
-
Hailey Schoelkopf authored
-
Zehan Li authored
-
- 28 Feb, 2024 1 commit
-
-
Linsong Chu authored
-
- 27 Feb, 2024 4 commits
-
-
Rich authored
* model_type attribute error Getting attribute error when using a model without a 'model_type' * fix w/ and w/out the 'model_type' specification * use getattr(), also fix other config.model_type reference * Update huggingface.py --------- Co-authored-by:Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
Hailey Schoelkopf authored
-
Zehan Li authored
-
Baber Abbasi authored
* change `all_gather` to `gather` * add TaskOutput utility class * Add FilterResults class and refactor task handling. * Rename `key` to `filter_key` for clarity * Add `print_writeout` function in utils.py * Add function to calculate limit size. * Add doc_iterator method to Task class * Refactor `doc_iterator` and cleanup in Task class * remove superfluous bits * change `all_gather` to `gather` * bugfix * bugfix * fix `gather` * Refactor `gather` loop * Refactor aggregate metrics calculation * Refactor and simplify aggregate metrics calculation Removed unused code * Simplify metrics calculation and remove unused code. * simplify the metrics calculation in `utils.py` and `evaluator.py`. * Fix group metric * change evaluate to hf_evaluate * change evaluate to hf_evaluate * add docs * add docs * nits * make isslice keyword only * nit * add todo * nit * nit * nit: swap order samples_metrics tuple * move instance sorting outside loop * nit * nit * Add __repr__ for ConfigurableTask * nit * nit * Revert "nit" This reverts commit dab8d9977a643752a17f840fd8cf7e4b107df28f. * fix some logging * nit * fix `predict_only` bug. thanks to `@LSinev`! * change `print_tasks` to `prepare_print_tasks` * nits * move eval utils * move eval utils * nit * add comment * added tqdm descriptions * Update lm_eval/evaluator_utils.py Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * fix mgsm bug * nit * fix `build_all_requests` * pre-commit * add ceil to limit --------- Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 26 Feb, 2024 7 commits
-
-
Lintang Sutawika authored
* add brier_score * process brier_score * brier score is working for N-sized class * fxied brier score * add TED to BigBench and Brier score to MMLU * format * Update metrics.py * Update task.py * Update generate_until_template_yaml * Delete lm_eval/tasks/bigbench/aux_metric.py * Update generate_until_template_yaml * Update _default_template_yaml * Update _generate_configs.py * Update _generate_configs.py * Update _generate_configs.py * fix (format?) * format? * format, once more --------- Co-authored-by:Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
Aaron V authored
* Create a means for caching task registration and request building. Add the ability to specify an args dict for simple_evaluate(). * Remove extra S in cache path in caching module Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Rename requests cache args, make model_args polymorphic so that a dict can also be accepted. * Update docs to reflect new caching behavior, add CLI args for requests caching. Create a function for deleting items in the cache. * Update documentation, fix minor bug with arg parsing for requests caching where an undefined variable was used. * Remove line from gitignore, add to cli for caching datasets. * Add hashing suffix to .pickles. Update test script typo. * Favor isinstance() over type() in evaluator.py * Add tests for caching, gets tests working, remove unneeded arg from build_all_requests(). * Update arg description to simple_evaluate. * Update pyproject.toml * Fix typehint * Remove the use of random() for creating default cache pickle hash. * Check that cache dir exists before clearing it in request cache tests. * Fix linting problems. * Fix additional formatting errors. * Remove trailing whitespace. * Add new line to the end of .gitignore. --------- Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
Hailey Schoelkopf authored
This reverts commit c1145dfd.
-
Hailey Schoelkopf authored
* add add_bos_token to HFLM * add BOS token flag to other local model classes --------- Co-authored-by:Lintang Sutawika <lintang@eleuther.ai>
-
khalil authored
* add arabic mmlu * update the description * add readme file
-
Vicki Boykis authored
-
LSinev authored
-
- 24 Feb, 2024 1 commit
-
-
LSinev authored
* Save git_hash to results even if git is not available to call as subprocess * Store more info about environment and transformers version in results to help researchers track inconsistencies * moved added logging to logging_utils * moved get_git_commit_hash to logging_utils.py * moved add_env_info inside evaluator
-
- 23 Feb, 2024 1 commit
-
-
Vicki Boykis authored
* interface docs * fix link
-