- 03 Jun, 2024 1 commit
-
-
anthony-dipofi authored
* added tasks and task family descriptors * continue work on task list w/ links; slightly reorganize README * Apply suggestions from code review * Rename file so that it'll preview in Github when viewing lm_eval/tasks folder * Update new_task_guide.md * Update README.md * run linter * Add language column to task table; Add missing tasks to task table; fix nq_open and storycloze READMEs * fix typo * Apply suggestions from code review Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * apply format --------- Co-authored-by:
Harish Vadaparty <harishvadaparty@gmail.com> Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> Co-authored-by:
haileyschoelkopf <hailey@eleuther.ai>
-
- 31 May, 2024 1 commit
-
-
KonradSzafer authored
* dataset card initial * few fixes * adds groups for math, mmlu, gpqa * added summary args * moved sanitize_list to utils * readme update * recreate metadata moved * multiple model support * results latest split fix * readme update and small refactor * fix grouping * add comments * added pathlib * corrected pathlib approach * check whether to create a metadata card * convert posix paths to str * default hf org from token * hf token value error * Add logs after successful upload * logging updates * dataset card example in the readme --------- Co-authored-by:
Nathan Habib <nathan.habib@huggingface.com> Co-authored-by:
Alina Lozovskaia <alinailozovskaya@gmail.com>
-
- 07 May, 2024 2 commits
-
-
Yoav Katz authored
* Initial support for Unitxt datasets in LM Eval Harness See https://github.com/IBM/unitxt The script 'generate_yamls.py' creates LM Eval Harness yaml files corresponding to Unitxt datasets specified in the 'unitxt_datasets' file. The glue code required to register Unitxt metrics is in 'unitxt_wrapper.py'. * Added dataset loading check to generate_yaml Improved error messages. * Speed up generate_yaml Added printouts and improved error message * Added output printout * Simplified integration of unitxt datasets Store all the common yaml configuration in a yaml include shared by all datasets of the same task. * Post code review comments - part 1 1. Made sure include files don't end with 'yaml' so they won't be marked as tasks 2. Added more datasets and tasks (NER, GEC) 3. Added README * Post code review comments - part 2 1. Added unitxt install option in pyproject.toml: pip install 'lm_eval[unitxt]' 2. Added a check that unitxt is installed and print a clear error message if not * Committed missing pyproject change * Added documentation on adding datasets * More doc changes * add unitxt extra to readme * run precommit --------- Co-authored-by:
haileyschoelkopf <hailey@eleuther.ai>
-
KonradSzafer authored
-
- 05 May, 2024 1 commit
-
-
Muhammad Bin Usman authored
fix `----hf_hub_log_args` to `--hf_hub_log_args`
-
- 03 May, 2024 1 commit
-
-
KonradSzafer authored
* evaluation tracker implementation * OVModelForCausalLM test fix * typo fix * moved methods args * multiple args in one flag * loggers moved to dedicated dir * improved filename sanitization
-
- 25 Apr, 2024 1 commit
-
-
- 16 Apr, 2024 2 commits
-
-
Michael Goin authored
* Add neuralmagic models for SparseML and DeepSparse * Update to latest and add test * Format * Fix list to List * Format * Add deepsparse/sparseml to automated testing * Update pyproject.toml * Update pyproject.toml * Update README * Fixes for dtype and device * Format * Fix test * Apply suggestions from code review Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Address review comments! --------- Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
KonradSzafer authored
* added delta weights * removed debug * readme update * better error handling * autogptq warn * warn update * peft and delta error, explicitly deleting _model_delta * linter fix
-
- 08 Apr, 2024 1 commit
-
-
Hailey Schoelkopf authored
-
- 05 Apr, 2024 1 commit
-
-
Seungwoo Ryu authored
* claude3 * supply for anthropic claude3 * supply for anthropic claude3 * anthropic config changes * add callback options on anthropic * line passed * claude3 tiny change * help anthropic installation * mention sysprompt / being careful with format in readme --------- Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
-
- 26 Mar, 2024 1 commit
-
-
Sergio Perez authored
* Integration of NeMo models into LM Evaluation Harness library * rename nemo model as nemo_lm * move nemo section in readme after hf section * use self.eot_token_id in get_until() * improve progress bar showing loglikelihood requests * data replication or tensor/pipeline replication working fine within one node * run pre-commit on modified files * check whether dependencies are installed * clarify usage of torchrun in README
-
- 25 Mar, 2024 1 commit
-
-
Hailey Schoelkopf authored
-
- 15 Mar, 2024 1 commit
-
-
Eitan Turok authored
* Link to vllm integration * add pip install .[vllm] cmd
-
- 01 Mar, 2024 1 commit
-
-
Baber Abbasi authored
* make `WandbLogger` init args optional * nit * nit * nit * move import warning to `WandbLogger` * nit * update docs * nit
-
- 22 Feb, 2024 1 commit
-
-
Ayush Thakur authored
* add wandb as extra dependency * wandb metrics logging * refactor * log samples as tables * fix linter * refactor: put in a class * change dir * add panels * log eval as table * improve tables logging * improve reports logging * precommit run * ruff check * handle importing reports api gracefully * ruff * compare results * minor pre-commit fixes * build comparison report * ruff check * log results as artifacts * remove comparison script * update dependency * type annotate and docstring * add example * update readme * fix typo * teardown * handle outside wandb run * gracefully fail reports creation * precommit checks * add report url to summary * use wandb printer for better url stdout * fix ruff * handle N/A and groups * fix eval table * remove unused var * update wandb version req + disable reports stdout * remove reports feature to TODO * add label to multi-choice question data * log model predictions * lints * loglikelihood_rolling * log eval result for groups * log tables by group for better handling * precommit * choices column for multi-choice * graciously fail wandb * remove reports feature * track system metrics + total eval time + stdout --------- Co-authored-by:Lintang Sutawika <lintang@eleuther.ai>
-
- 06 Feb, 2024 2 commits
-
-
Michael Feil authored
* add hf_transfer * update dependencies * Delete stale `[linting]` extra * Update README.md with extras table --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
Hailey Schoelkopf authored
-
- 05 Feb, 2024 1 commit
-
-
Michael Feil authored
* initial commit * remove overwrite bs * adding neuronx dependencies * Update README.md * update neuronx
-
- 01 Feb, 2024 1 commit
-
-
Hailey Schoelkopf authored
* Update CITATION.bib * Create CONTRIBUTING.md * add disclaimer re: multi node * flesh out some sections more * Flesh out contributor guide * revert CITATION.bib * appease pre-commit --------- Co-authored-by: lintangsutawika <lintang@eleuther.ai>
-
- 31 Jan, 2024 1 commit
-
-
Baber Abbasi authored
* add bypass metric * fixed `bypass` metric. * add task attributes if predict_only * add `predict_only` checks * add docs * added `override_metric`, `override_config` to `Task` * nits * nit * changed --predict_only to generations; nits * nits * nits * change gen_kwargs warning * add note about `--predict_only` in README.md * added `predict_only` * move table to bottom * nit * change null aggregation to bypass (conflict) * bugfix; default `temp=0.0` * typo
-
- 26 Jan, 2024 1 commit
-
-
NoushNabi authored
* added intel optimum * added intel optimum in readme * modified intel optimum * modified intel optimum * modified intel optimum * modified install optimum * modified path of IR file * added openvino_device * added openvino_device2 * changed optimum-causal to openvino-causal * Update README.md * Update README.md * remove `lm_eval.base` import * update openvino-causal -> openvino ; pass device through super().__init__() * Update README.md * Add optimum to tests dependencies * apply pre-commit * fix so tests pass --------- Co-authored-by:
Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> Co-authored-by:
haileyschoelkopf <hailey@eleuther.ai>
-
- 25 Jan, 2024 1 commit
-
-
Hailey Schoelkopf authored
* Update README.md * [!Tip]
-
- 23 Jan, 2024 1 commit
-
-
Hailey Schoelkopf authored
* don't use get_task_dict() as a helper, it will download the dataset! * pre-commit * Update README.md --------- Co-authored-by: lintangsutawika <lintang@eleuther.ai>
-
- 22 Jan, 2024 2 commits
-
-
Brian Vaughan authored
-
Michael Goin authored
* Add `local-completions` support using OpenAI interface * Refactor oa_completion * Address tokenizer comments and change request chunks to batch size * Add warning message for tiktoken backend * fix formatting * fix whitespace * Update README.md --------- Co-authored-by:Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 16 Jan, 2024 1 commit
-
-
Mark Saroufim authored
* Update README.md * punctuation --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 15 Jan, 2024 2 commits
-
-
Stella Biderman authored
It looks like Google Scholar has [already noticed](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C9&authuser=2&q=%22A+framework+for+few-shot+language+model+evaluation%2C+12+2023%22&btnG=) the updated citation block so let's add it back in.
-
Hailey Schoelkopf authored
* Make parallelize=True distinction clearer in documentation. * run linter
-
- 11 Jan, 2024 1 commit
-
-
Stella Biderman authored
-
- 08 Jan, 2024 1 commit
-
-
Stella Biderman authored
Over a dozen papers have used the updated citation block, but Google Scholar has noticed none of them. Since it does understand this citation, I think we should use it going forward until we have a way to ensure the newer citations are actually logged.
-
- 30 Dec, 2023 1 commit
-
-
Anjor Kanekar authored
-
- 23 Dec, 2023 1 commit
-
-
Hailey Schoelkopf authored
-
- 22 Dec, 2023 2 commits
-
-
Hailey Schoelkopf authored
* modularize HFLM code * pass through extra kwargs to AutoModel.from_pretrained call * remove explicit model_kwargs * rename gptq -> autogptq * fix tokenizer pad token errors * ensure model always respects device_map and autogptq's selected devices * add a _get_config helper fn * add mambaLMWrapper * add mamba extra * add mamba extra * fix conditional import * Fix botched merge commit * Remove beginning-of-file comment for consistency * Add docstring for mambaLM re: supported kwargs * Alphabetize extras * Update extras table * appease precommit * run precommit on mamba_lm
-
Bram Vanroy authored
-
- 21 Dec, 2023 3 commits
-
-
Anjor Kanekar authored
* Update README.md Add a note about running on Apple ARM GPUs * Update README.md * Update README.md --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
Alex Bäuerle authored
-
Anjor Kanekar authored
-
- 20 Dec, 2023 2 commits
-
-
Vicki Boykis authored
* LocalChatCompletionsLM add * clean up completions class * clean up completions class * update tokens * README * fix constructor * eos token * folding local-chat-completions into OpenAIChatCompletions * refactoring to include gen_kwargs as passable option * add todo on chat completion kwarg validation * Ruff and README fix * generalize to **kwargs * remove unnecessary kwargs * README and remove kwargs * README
-
Baber Abbasi authored
* add ruff and isort. remove black and flake8 * remove unnecessary dependencies * remove dependency from table * change order * ran ruff * check 3.9 * exclude evaluator * update CI workflow * use ruff config in pyproject.toml * test * add isort rules to ruff * sort imports * import `make_table` * try stages for no-commit-to-branch * turn on mypy for pre-commit * test * test * test * change no-commit-to-branch to default * nits * fixed dependency
-