- 06 Jul, 2025 2 commits
-
-
Baber Abbasi authored
* remove sparse-ml
-
Baber Abbasi authored
-
- 05 Jul, 2025 1 commit
-
-
Baber Abbasi authored
-
- 23 Jun, 2025 1 commit
-
-
Baber authored
-
- 19 Jun, 2025 1 commit
-
-
Baber Abbasi authored
-
- 19 May, 2025 1 commit
-
-
Harsha authored
* adding ACPBench_hard * adding Clingo * changing tarski to tarski[clingo] * denoting the main variants in each paper
-
- 13 May, 2025 1 commit
-
-
Kiersten Stokes authored
Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com>
-
- 23 Apr, 2025 1 commit
-
-
artemorloff authored
-
- 16 Apr, 2025 1 commit
-
-
Baber Abbasi authored
* switch MMLU to cais/mmlu * switch back to tj-actions/changed-files * cache HF folder
-
- 03 Apr, 2025 1 commit
-
-
Lu Fang authored
Signed-off-by: Lu Fang <lufang@fb.com>
-
- 20 Mar, 2025 1 commit
-
-
Kiersten Stokes authored
* Add markdown linter to pre-commit hooks * Reformat existing markdown (excluding lm_eval/tasks/*.md)
-
- 19 Mar, 2025 1 commit
-
-
Kiersten Stokes authored
-
- 18 Mar, 2025 1 commit
-
-
Baber Abbasi authored
support for long-context (and other synthetic tasks) * add ruler * add longbench * pass `metadata` to TaskConfig
-
- 17 Mar, 2025 1 commit
-
-
Kiersten Stokes authored
* Add support for token-based auth for watsonx models * Fix lint * Move dotenv import to inner scope * Improve readability of _verify_credentials
-
- 14 Mar, 2025 1 commit
-
-
achervyakov authored
* Added audio-modality pipeline for qwen2-audio model * Beauty imports * fix apply_chat_template args * update default audio placeholders list * add demo task - common_voice subset * add audiolm_qwen libs to pyproject.toml * pre-commit beautify --------- Co-authored-by: Alexandra Rak <rakalexandra@mail.ru>
-
- 05 Mar, 2025 1 commit
-
-
Baber Abbasi authored
-
- 04 Mar, 2025 2 commits
-
-
Kiersten Stokes authored
* Add a test for a custom unitxt task * Update task.py to bring in line with breaking change in v1.17.2 * Fix lint
-
Lucia Quirke authored
* Enable steering HF models Co-authored-by: Matthew Khoriaty <matthewkhoriaty2026@u.northwestern.edu> * increase HF download timeout * Update readme; improve steering vector device handling * Update latest news * remove HF timeout increase * fix tests * ignore sae lens test * fix accidental force push --------- Co-authored-by: Matthew Khoriaty <matthewkhoriaty2026@u.northwestern.edu>
-
- 21 Feb, 2025 1 commit
-
-
Baber Abbasi authored
* add math_verify to minerva math * add math_verify to benchmark * fix error * increment version
-
- 17 Dec, 2024 2 commits
-
-
Baber Abbasi authored
* feat: drop Python 3.8 support * feat: drop Python 3.8 tests * pre-commit
-
Baber Abbasi authored
forgot to increment 0.4.6!
-
- 15 Nov, 2024 1 commit
-
-
Nikodem Szwast authored
* refactor code, fix config path bug * update types to be from typing lib * add pre-commit formatting * specify version of ibm_watsonx_ai package * adjust get_watsonx_credentials() function, add minor refactor to address PR review comments * change missing installation hint from ibm_watsonx_ai to lm_eval[ibm_watsonx_ai]
-
- 05 Nov, 2024 1 commit
-
-
mtkachenko authored
* add jaqket_v2 and jcommonsenseqa * remove comments * remove num_beams as it is incompatible with vllm * add jnli + refactor * rename jnla -> jnli * add jsquad + replace colon chars with the Japanese unicode * ignore whitespaces in generation tasks * add marc_ja * add xwinograd + simplify other yamls * add mgsm and xlsum * refactor xlsum * add ja_leaderboard tag * edit README.md * update README.md * add credit + minor changes * run ruff format * address review comments + add group * remove aggregate_metric_list * remove tags * update tasks/README.md
-
- 31 Oct, 2024 1 commit
-
-
Qubitium-ModelCloud authored
* support gptqmodel * code opt * add gptqmodel option * Update huggingface.py * Update pyproject.toml * gptqmodel version upgraded to 1.0.6 * GPTQModel version upgraded to 1.0.8 * Update pyproject.toml * fix ruff-format error * add gptqmodel test * Update gptqmodel test model * skip cuda * python3.8 compatible * Update README.md * Update README.md --------- Co-authored-by: CL-ModelCloud <cl@modelcloud.ai>
-
- 25 Oct, 2024 1 commit
-
-
Kiersten Stokes authored
* Update pyproject.toml with watsonx package extra Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com> * Remove unused function Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com> --------- Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com>
-
- 23 Oct, 2024 1 commit
-
-
Nikodem Szwast authored
* add support for IBM watsonx_llm * add ibm_watsonx_ai package to optional-dependencies * move global scope imports to inner scope * change cache to lru_cache * fix circular import * use 3.8 typing * use 3.8 typing --------- Co-authored-by: Baber <baber@hey.com>
-
- 08 Oct, 2024 1 commit
-
-
Hailey Schoelkopf authored
-
- 05 Sep, 2024 1 commit
-
-
Hailey Schoelkopf authored
-
- 28 Aug, 2024 1 commit
-
-
Hailey Schoelkopf authored
-
- 01 Aug, 2024 1 commit
-
-
Nathan Weinberg authored
* refactor: move scipy and sklearn module imports to func imports Signed-off-by: Nathan Weinberg <nweinber@redhat.com> * refactor: consolidate weighted_f1_score func into lm_eval utils Signed-off-by: Nathan Weinberg <nweinber@redhat.com> * lint: allow for utils file to have unused imports this allows for shared functions to be defined only once while allowing for the YAML function importing to continue working Signed-off-by: Nathan Weinberg <nweinber@redhat.com> --------- Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
-
- 22 Jul, 2024 1 commit
-
-
Baber Abbasi authored
* refactor pad_token handling to fn * fix docs * add pad_token_handling to vllm * start on API superclass * don't detokenize the returned logits * streamline vllm tokenizer * add type hint * pre-commit * seems to be in working order * add model to init * refactor api models * nit * cleanup * add pbar * fix type hints * change optional dependencies * json encode chat template * add type hints * deal with different prompt input requirements * nits * fix * cache inside async * fix * fix * nits * nits * nits * nit * fixup * fixup * nit * add dummy retry * add dummy retry * handle imports; skip failing test * add type hint * add tests * add dependency to tests * add package names to exception * nit * docs; type hints * handle api key * nit * tokenizer bug * fix tokenizer * nit * nit * add better error messages * nit * remove decorator * CI: install api dep * revert evaluator.py * consolidate * consolidate * nits * nit * fix typealias * nit * nit * nit * Update lm_eval/models/api_models.py typo Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update lm_eval/models/openai_completions.py Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update lm_eval/models/anthropic_llms.py Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update lm_eval/models/api_models.py Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * fix typo * add news section * add info for API * pre-commit * typo * fix bug: unpack loglikelihood requests * fix bug: shared gen_kwargs mutated * nit: handle copy properly * Update README.md * Update README.md * Update README.md * Update api_models.py * Update README.md --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 08 Jul, 2024 1 commit
-
-
Elron Bandel authored
* Updated unitxt loading Signed-off-by: Elron Bandel <elron.bandel@ibm.com> * Revert change to general Readme Signed-off-by: Elron Bandel <elron.bandel@ibm.com> * Adjust fda,squadv2,squad_completion and swde to accept config in the constructor Signed-off-by: Elron Bandel <elron.bandel@ibm.com> * Fix scrolls Signed-off-by: elronbandel <elron.bandel@ibm.com> * Update documentation Signed-off-by: elronbandel <elron.bandel@ibm.com> * Enforce backward compatibility Signed-off-by: elronbandel <elron.bandel@ibm.com> * Format unitxt class Signed-off-by: elronbandel <elron.bandel@ibm.com> --------- Signed-off-by: Elron Bandel <elron.bandel@ibm.com> Signed-off-by: elronbandel <elron.bandel@ibm.com> Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
-
- 01 Jul, 2024 1 commit
-
-
Hailey Schoelkopf authored
-
- 30 May, 2024 1 commit
-
-
Huazhong Ji authored
* [HFLM] Add support for Ascend NPU Co-authored-by: jiaqiw09 <jiaqiw960714@gmail.com> Co-authored-by: zhabuye <2947436155@qq.com> * bump accelerate dependency version to 0.26.0 for NPU compat. --------- Co-authored-by: jiaqiw09 <jiaqiw960714@gmail.com> Co-authored-by: zhabuye <2947436155@qq.com> Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 23 May, 2024 1 commit
-
-
Edward Gan authored
-
- 07 May, 2024 1 commit
-
-
Yoav Katz authored
* Initial support for Unitxt datasets in LM Eval Harness See https://github.com/IBM/unitxt The script 'generate_yamls.py' creates LM Eval Harness yaml files corresponding to Unitxt datasets specified in the 'unitxt_datasets' file. The glue code required to register Unitxt metrics is in 'unitxt_wrapper.py'. * Added dataset loading check to generate_yaml Improved error messages. * Speed up generate_yaml Added printouts and improved error message * Added output printout * Simplified integration of unitxt datasets Store all the common yaml configuration in a yaml include shared by all datasets of the same task. * Post code review comments - part 1 1. Made sure include files don't end with 'yaml' so they won't be marked as tasks 2. Added more datasets and tasks (NER, GEC) 3. Added README * Post code review comments - part 2 1. Added unitxt install option in pyproject.toml: pip install 'lm_eval[unitxt]' 2. Added a check that unitxt is installed and print a clear error message if not * Committed missing pyproject change * Added documentation on adding datasets * More doc changes * add unitxt extra to readme * run precommit --------- Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
-
- 16 Apr, 2024 1 commit
-
-
Michael Goin authored
* Add neuralmagic models for SparseML and DeepSparse * Update to latest and add test * Format * Fix list to List * Format * Add deepsparse/sparseml to automated testing * Update pyproject.toml * Update pyproject.toml * Update README * Fixes for dtype and device * Format * Fix test * Apply suggestions from code review Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Address review comments! --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 18 Mar, 2024 1 commit
-
-
Hailey Schoelkopf authored
* Update interface.md * fix: make caching reqs always work with accelerate launch * remove stale task migration checklist * remove deprecation warnings * make informative TypeErrors for get_task_dict * bump version metadata * fix num_fewshot printing bug * add fewshot value to cache key
-
- 06 Mar, 2024 1 commit
-
-
LSinev authored
* Remove unused `decontamination_ngrams_path` and all mentions (still no alternative path provided) * Fix improper import of LM and usage of evaluator in one of scripts * update type hints in instance and task api * raising errors in task.py instead of asserts * Fix warnings from ruff * raising errors in __main__.py instead of asserts * raising errors in tasks/__init__.py instead of asserts * raising errors in evaluator.py instead of asserts * evaluator: update type hints and remove unused variables in code * Update lm_eval/__main__.py Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update lm_eval/__main__.py Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update lm_eval/api/task.py Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update lm_eval/api/task.py Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update lm_eval/api/task.py Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * Update lm_eval/evaluator.py Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com> * pre-commit induced fixes --------- Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
-
- 03 Mar, 2024 1 commit
-
-
Vicki Boykis authored
* setting trust_remote_code * dataset list no notebooks * respect trust remote code * Address changes, move cli options and change datasets * fix task for tests * headqa * remove kobest * pin datasets and address comments * clean up space
-