- 16 Jan, 2024 1 commit
-
-
Hailey Schoelkopf authored
-
- 15 Jan, 2024 7 commits
-
-
Hailey Schoelkopf authored
Bumping CITATION.bib to match re-adding the citation in readme. cc @StellaAthena
-
Stella Biderman authored
It looks like Google Scholar has [already noticed](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C9&authuser=2&q=%22A+framework+for+few-shot+language+model+evaluation%2C+12+2023%22&btnG=) the updated citation block so let's add it back in.
-
Lintang Sutawika authored
* rework documentation for explaining local dataset * fix typo * Update new_task_guide.md
-
Hailey Schoelkopf authored
* add WIP device_map overrides * update handling outside of accelerate launcher * change .to(device) log to debug level * run linter
-
Lintang Sutawika authored
* benchmark yamls allow minor edits of already registered tasks * add documentation * removed print
-
Hailey Schoelkopf authored
* Make parallelize=True distinction clearer in documentation. * run linter
-
Hailey Schoelkopf authored
-
- 12 Jan, 2024 3 commits
-
-
Hailey Schoelkopf authored
-
jp authored
* Add: kobest config file * Add: kobest utils * Add: README * Update utils.py
-
Hailey Schoelkopf authored
-
- 11 Jan, 2024 3 commits
-
-
Stella Biderman authored
-
Hailey Schoelkopf authored
* fix incorrect lookback protections * bump generate_until task versions
-
Tanishq Abraham authored
* multimedqa * Update medqa.yaml * move to benchmarks folder * add README.md --------- Co-authored-by: Lintang Sutawika <lintang@sutawika.com>
-
- 10 Jan, 2024 3 commits
-
-
Baber Abbasi authored
* Refine scoring logic for multiple_target "exact_match" metric * skip old tests from master * skip old tests from master * delete tests from master
-
James A. Michaelov authored
-
Baber Abbasi authored
-
- 08 Jan, 2024 2 commits
-
-
Stella Biderman authored
Over a dozen papers have used the updated citation block, but Google Scholar has noticed none of them. Since it does understand this citation, I think we should use it going forward until we have a way to ensure the newer citations are actually logged.
-
Lintang Sutawika authored
-
- 05 Jan, 2024 2 commits
-
-
Sam Passaglia authored
* do not ensure ascii * Update __main__.py --------- Co-authored-by: Lintang Sutawika <lintang@sutawika.com>
-
JorgeDeCorte authored
* add hellaswag_nl * add other languages and update readme to hellaswag * refactor as new task * update readme * add endline to yaml files and readme.md * add group, change folder location and update yaml file * rename default hellaswag yaml file * fix whitespace error in some labels * downgrade log level of whitespace checking --------- Co-authored-by: JorgeDeCorte <jorge.decorte@ravago.be> Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 04 Jan, 2024 2 commits
-
-
Lintang Sutawika authored
* Remove self.dataset_path post_init process * Update task.py * Update task.py
-
Baber Abbasi authored
* copies max_length from huggingface * handle max_length properly * get tokens from inputs * substitute Collator for Reorderer * `batch=auto` if using data_parallel * nit * cleanup * update code comments * `ray.shutdown()` after calling method if data_parallel_size > 1 --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 02 Jan, 2024 3 commits
-
-
Stella Biderman authored
-
Baber Abbasi authored
* auto-batch requires len of iter * handle case when batch_size="auto:N"
-
Pasquale Minervini authored
-
- 30 Dec, 2023 1 commit
-
-
Anjor Kanekar authored
-
- 29 Dec, 2023 1 commit
-
-
Paul McCann authored
* Add example failing task. This task includes an invalid import. This will cause an exception and the task will not be loaded. But this just results in a DEBUG level log message, so in normal usage you'll see no error, and will be told the task doesn't exist. Here's an example command line to run the task: `python -m lm_eval --model hf --model_args pretrained=rinna/japanese-gpt-1b --tasks fail`. This task is based on a Japanese Winograd task, but that's not important, and was just used due to familiarity. * Do not ignore errors when loading tasks * Change how task errors are logged. This makes the proposed changes from PR discussion: 1. Exceptions not related to missing modules/imports are logged as warnings. 2. Module/import related exceptions are still logged at debug level, but if any of them happen there is a warning about it with instructions on how to show logs. * Remove intentionally failing task --------- Co-authored-by: Paul O'Leary McCann <polm@dampfkraft.com>
-
- 28 Dec, 2023 1 commit
-
-
Alex Bäuerle authored
-
- 27 Dec, 2023 2 commits
-
-
Baber Abbasi authored
* fix group * siqa: default.yml -> default.yaml * max_gen_toks -> self.max_gen_toks * add ids to task tests * fix siqa * fix gen_kwargs for openai-chat
-
Jaewoo Yang authored
-
- 25 Dec, 2023 1 commit
-
-
Hailey Schoelkopf authored
-
- 24 Dec, 2023 2 commits
-
-
Yuliang Li authored
-
MorishT authored
* Add remove_whitespace to FLD benchmark * bump task version --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 23 Dec, 2023 2 commits
-
-
Baber Abbasi authored
* refactor dataloader * cleanup + add docs * change arg * renamed Collator and added testing * parametrized test for Collator * appease pre-commit * added edge case batch 0 (no batching) * fix typos --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
Hailey Schoelkopf authored
-
- 22 Dec, 2023 4 commits
-
-
-
Hailey Schoelkopf authored
* modularize HFLM code * pass through extra kwargs to AutoModel.from_pretrained call * remove explicit model_kwargs * rename gptq -> autogptq * fix tokenizer pad token errors * ensure model always respects device_map and autogptq's selected devices * add a _get_config helper fn * add mambaLMWrapper * add mamba extra * add mamba extra * fix conditional import * Fix botched merge commit * Remove beginning-of-file comment for consistency * Add docstring for mambaLM re: supported kwargs * Alphabetize extras * Update extras table * appease precommit * run precommit on mamba_lm
-
Hailey Schoelkopf authored
-
Bram Vanroy authored
-