Commit a21cedf4 authored by Harry Mellor, committed by GitHub (unverified)

Bump `lm-eval` version for Transformers v5 compatibility (#33994)


Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
parent 3ef74cde
@@ -2,7 +2,7 @@
 # We can use this script to compute baseline accuracy on chartqa for vllm.
 #
 # Make sure you have lm-eval-harness installed:
-# pip install "lm-eval[api]>=0.4.9.2"
+# pip install "lm-eval[api]>=0.4.11"
 usage() {
 echo``

@@ -2,7 +2,7 @@
 # We can use this script to compute baseline accuracy on GSM for transformers.
 #
 # Make sure you have lm-eval-harness installed:
-# pip install "lm-eval[api]>=0.4.9.2"
+# pip install "lm-eval[api]>=0.4.11"
 usage() {
 echo``

@@ -3,7 +3,7 @@
 # We use this for fp8, which HF does not support.
 #
 # Make sure you have lm-eval-harness installed:
-# pip install "lm-eval[api]>=0.4.9.2"
+# pip install "lm-eval[api]>=0.4.11"
 usage() {
 echo``

@@ -3,7 +3,7 @@
 # We use this for fp8, which HF does not support.
 #
 # Make sure you have lm-eval-harness installed:
-# pip install "lm-eval[api]>=0.4.9.2"
+# pip install "lm-eval[api]>=0.4.11"
 usage() {
 echo``

@@ -61,7 +61,7 @@ echo "Results will be stored in: $RESULTS_DIR"
 echo "--- Installing Python dependencies ---"
 python3 -m pip install --progress-bar off git+https://github.com/thuml/depyf.git \
 && python3 -m pip install --progress-bar off pytest pytest-asyncio tpu-info \
-&& python3 -m pip install --progress-bar off "lm-eval[api]>=0.4.9.2" \
+&& python3 -m pip install --progress-bar off "lm-eval[api]>=0.4.11" \
 && python3 -m pip install --progress-bar off hf-transfer tblib==3.1.0
 echo "--- Python dependencies installed ---"

@@ -61,7 +61,7 @@ echo "Results will be stored in: $RESULTS_DIR"
 echo "--- Installing Python dependencies ---"
 python3 -m pip install --progress-bar off git+https://github.com/thuml/depyf.git \
 && python3 -m pip install --progress-bar off pytest pytest-asyncio tpu-info \
-&& python3 -m pip install --progress-bar off "lm-eval[api]>=0.4.9.2" \
+&& python3 -m pip install --progress-bar off "lm-eval[api]>=0.4.11" \
 && python3 -m pip install --progress-bar off hf-transfer tblib==3.1.0
 echo "--- Python dependencies installed ---"

@@ -84,7 +84,7 @@ Since simple RTN does not require data for weight quantization and the activatio
 Install `vllm` and `lm-evaluation-harness` for evaluation:
 ```bash
-pip install vllm "lm-eval[api]>=0.4.9.2"
+pip install vllm "lm-eval[api]>=0.4.11"
 ```
 Load and run the model in `vllm`:

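The new floor `>=0.4.11` only makes sense as a numeric comparison: compared as plain strings, `"0.4.9.2"` sorts *after* `"0.4.11"`. The sketch below is an illustration, not part of this commit. It assumes the installed distribution is discoverable as `lm-eval` via `importlib.metadata`, and it only handles purely numeric dotted versions; real pip specifiers follow PEP 440 (pre-releases, local versions and so on), which the `packaging` library handles properly. The helper names `parse_release`, `meets_minimum` and `installed_lm_eval_version` are hypothetical.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Minimum lm-eval version required after this commit.
MIN_LM_EVAL = "0.4.11"


def parse_release(ver: str) -> tuple[int, ...]:
    """Turn a plain dotted release like '0.4.9.2' into (0, 4, 9, 2).

    Assumption: numeric segments only; PEP 440 suffixes such as 'rc1'
    or '+local' are not handled by this sketch.
    """
    return tuple(int(part) for part in ver.split("."))


def meets_minimum(installed: str, minimum: str = MIN_LM_EVAL) -> bool:
    """Return True if `installed` >= `minimum`, comparing numerically."""
    a, b = parse_release(installed), parse_release(minimum)
    # Pad with zeros so that '0.4.11' and '0.4.11.0' compare equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_lm_eval_version() -> str | None:
    """Return the installed lm-eval version, or None if it is missing."""
    try:
        return version("lm-eval")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # A naive string comparison gets this wrong: '9' sorts after '1'.
    print("0.4.9.2" >= "0.4.11")      # True (misleading)
    print(meets_minimum("0.4.9.2"))   # False: the old floor is too old
    print(meets_minimum("0.4.11"))    # True
    print(f"installed lm-eval: {installed_lm_eval_version()}")
```

Padding with zeros mirrors how release segments of different lengths are compared, so `0.4.11` and `0.4.11.0` are treated as the same release.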
@@ -18,7 +18,7 @@ pip install llmcompressor
 Additionally, install `vllm` and `lm-evaluation-harness` for evaluation:
 ```bash
-pip install vllm "lm-eval[api]>=0.4.9.2"
+pip install vllm "lm-eval[api]>=0.4.11"
 ```
 ## Quantization Process

@@ -23,7 +23,7 @@ pip install llmcompressor
 Additionally, install `vllm` and `lm-evaluation-harness` for evaluation:
 ```bash
-pip install vllm "lm-eval[api]>=0.4.9.2"
+pip install vllm "lm-eval[api]>=0.4.11"
 ```
 ## Quantization Process

@@ -20,7 +20,7 @@ for more installation details.
 Additionally, install `vllm` and `lm-evaluation-harness` for evaluation:
 ```bash
-pip install vllm "lm-eval[api]>=0.4.9.2"
+pip install vllm "lm-eval[api]>=0.4.11"
 ```
 ## Quantization Process

@@ -27,7 +27,7 @@ mistral_common[image,audio] >= 1.9.1 # required for voxtral test
 num2words # required for smolvlm test
 opencv-python-headless >= 4.13.0 # required for video test
 datamodel_code_generator # required for minicpm3 test
-lm-eval[api]>=0.4.9.2 # required for model evaluation test
+lm-eval[api]>=0.4.11 # required for model evaluation test
 mteb>=1.38.11, <2 # required for mteb test
 transformers==4.57.5
 tokenizers==0.22.0

@@ -58,7 +58,7 @@ schemathesis==3.39.15
 # OpenAI schema test
 # Evaluation and benchmarking
-lm-eval[api]==0.4.9.2
+lm-eval[api]==0.4.11
 jiwer==4.0.0
 # Required for multiprocessed tests that use spawn method, Datasets and Evaluate Test

@@ -35,7 +35,7 @@ num2words # required for smolvlm test
 open_clip_torch==2.32.0 # Required for nemotron_vl test, Nemotron Parse in test_common.py
 opencv-python-headless >= 4.13.0 # required for video test
 datamodel_code_generator # required for minicpm3 test
-lm-eval[api]>=0.4.9.2 # required for model evaluation test
+lm-eval[api]>=0.4.11 # required for model evaluation test
 mteb[bm25s]>=2, <3 # required for mteb test
 transformers==4.57.5
 tokenizers==0.22.0

@@ -5,9 +5,7 @@ absl-py==2.1.0
 # rouge-score
 # tensorboard
 accelerate==1.0.1
-# via
-# lm-eval
-# peft
+# via peft
 aenum==3.1.16
 # via lightly
 affine==2.4.0

@@ -138,7 +136,6 @@ colorama==0.4.6
 # perceptron
 # sacrebleu
 # schemathesis
-# tqdm-multiprocess
 colorful==0.5.6
 # via ray
 colorlog==6.10.1

@@ -383,6 +380,7 @@ jinja2==3.1.6
 # via
 # datamodel-code-generator
 # genai-perf
+# lm-eval
 # torch
 jiwer==3.0.5
 # via -r requirements/test.in

@@ -448,7 +446,7 @@ lightning-utilities==0.14.3
 # torchmetrics
 llvmlite==0.44.0
 # via numba
-lm-eval==0.4.9.2
+lm-eval==0.4.11
 # via -r requirements/test.in
 lxml==5.3.0
 # via

@@ -513,8 +511,6 @@ numba==0.61.2
 # via
 # -r requirements/test.in
 # librosa
-numexpr==2.10.1
-# via lm-eval
 numpy==2.2.6
 # via
 # -r requirements/test.in

@@ -540,11 +536,11 @@ numpy==2.2.6
 # librosa
 # lightly
 # lightly-utils
+# lm-eval
 # matplotlib
 # mistral-common
 # mteb
 # numba
-# numexpr
 # opencv-python-headless
 # optuna
 # pandas

@@ -707,9 +703,7 @@ pathvalidate==3.2.1
 patsy==1.0.1
 # via statsmodels
 peft==0.16.0
-# via
-# -r requirements/test.in
-# lm-eval
+# via -r requirements/test.in
 perceptron==0.1.4
 # via -r requirements/test.in
 perf-analyzer==0.1.0

@@ -792,8 +786,6 @@ pyasn1==0.6.1
 # rsa
 pyasn1-modules==0.4.2
 # via google-auth
-pybind11==2.13.6
-# via lm-eval
 pycocotools==2.0.8
 # via terratorch
 pycountry==24.6.1

@@ -1171,7 +1163,6 @@ torch==2.10.0+cu129
 # kornia
 # lightly
 # lightning
-# lm-eval
 # mteb
 # open-clip-torch
 # peft

@@ -1229,15 +1220,11 @@ tqdm==4.67.3
 # sentence-transformers
 # tacoreader
 # terratorch
-# tqdm-multiprocess
 # transformers
-tqdm-multiprocess==0.0.11
-# via lm-eval
 transformers==4.57.5
 # via
 # -r requirements/test.in
 # genai-perf
-# lm-eval
 # peft
 # sentence-transformers
 # transformers-stream-generator

@@ -1272,6 +1259,7 @@ typing-extensions==4.15.0
 # librosa
 # lightning
 # lightning-utilities
+# lm-eval
 # mistral-common
 # mteb
 # opentelemetry-api
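Most of the lockfile churn is transitive. Judging by the hunks above, `lm-eval` 0.4.11 no longer pulls in `accelerate`, `peft`, `torch`, `transformers`, `numexpr`, `pybind11` or `tqdm-multiprocess`, so pip-compile drops the `# lm-eval` attribution from those entries, and packages nothing else needs (`numexpr`, `pybind11`, `tqdm-multiprocess`) vanish entirely. The sketch below shows one way to list what a given parent pulls in. It assumes the usual pip-compile layout (a `name==version` line followed by `# via` comments, either inline or one parent per line); the sample `OLD_LOCK` is an excerpt reconstructed from the pre-change side of this diff, and the function names are hypothetical.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Excerpt of the pre-change lockfile, reconstructed from the diff.
OLD_LOCK = """\
accelerate==1.0.1
    # via
    #   lm-eval
    #   peft
numexpr==2.10.1
    # via lm-eval
peft==0.16.0
    # via
    #   -r requirements/test.in
    #   lm-eval
pybind11==2.13.6
    # via lm-eval
tqdm-multiprocess==0.0.11
    # via lm-eval
"""

# A pinned requirement line such as 'torch==2.10.0+cu129'.
PIN = re.compile(r"^([A-Za-z0-9._-]+)==\S+")


def parents_by_package(lock_text: str) -> dict[str, set[str]]:
    """Map each pinned package to the names listed under its '# via'."""
    parents: dict[str, set[str]] = defaultdict(set)
    current = None
    for raw in lock_text.splitlines():
        line = raw.strip()
        match = PIN.match(line)
        if match:
            current = match.group(1)
            parents[current]  # register packages even if they have no parents
            continue
        if current is None or not line.startswith("#"):
            continue  # header comments before the first pin, blank lines
        comment = line.lstrip("#").strip()
        if comment == "via" or comment.startswith("via "):
            # Inline form '# via lm-eval'; a bare '# via' starts a list.
            comment = comment[len("via"):].strip()
        if comment:
            parents[current].add(comment)
    return dict(parents)


def pulled_in_by(lock_text: str, parent: str) -> list[str]:
    """Return the sorted packages that list `parent` as a '# via' source."""
    return sorted(
        pkg for pkg, sources in parents_by_package(lock_text).items()
        if parent in sources
    )


if __name__ == "__main__":
    print(pulled_in_by(OLD_LOCK, "lm-eval"))
    # ['accelerate', 'numexpr', 'peft', 'pybind11', 'tqdm-multiprocess']
```

Running the same function on the full old and new lockfiles and diffing the two lists would give a quick summary of which transitive dependencies changed with the version bump.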