"docs/zh_cn/vscode:/vscode.git/clone" did not exist on "87768b7f2df7391010aa1e21bc9fd682f1ccb481"
Unverified Commit d17dcea0 authored by Michael Feil, committed by GitHub

Support for Inf2 optimum class [WIP] (#1364)

* initial commit

* remove overwrite bs

* adding neuronx dependencies

* Update README.md

* update neuronx
parent 74119471
...@@ -45,6 +45,7 @@ git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .
```
We also provide a number of optional dependencies for extended functionality. A detailed table is available at the end of this document.
## Basic Usage
...@@ -174,6 +175,7 @@ Note that for externally hosted models, configs such as `--device` and `--batch_
| vLLM | :heavy_check_mark: | `vllm` | [Most HF Causal Language Models](https://docs.vllm.ai/en/latest/models/supported_models.html) | `generate_until`, `loglikelihood`, `loglikelihood_rolling` |
| Mamba | :heavy_check_mark: | `mamba_ssm` | [Mamba architecture Language Models via the `mamba_ssm` package](https://huggingface.co/state-spaces) | `generate_until`, `loglikelihood`, `loglikelihood_rolling` |
| Huggingface Optimum (Causal LMs) | ✔️ | `openvino` | Any decoder-only AutoModelForCausalLM converted with Huggingface Optimum into OpenVINO™ Intermediate Representation (IR) format | `generate_until`, `loglikelihood`, `loglikelihood_rolling` | ... |
| Neuron via AWS Inf2 (Causal LMs) | ✔️ | `neuronx` | Any decoder-only AutoModelForCausalLM supported to run on [huggingface-ami image for inferentia2](https://aws.amazon.com/marketplace/pp/prodview-gr3e6yiscria2) | `generate_until`, `loglikelihood`, `loglikelihood_rolling` | ... |
| Your local inference server! | :heavy_check_mark: | `local-completions` or `local-chat-completions` (using `openai-chat-completions` model type) | Any server address that accepts GET requests using HF models and mirrors OpenAI's Completions or ChatCompletions interface | `generate_until` | ... |
Models which do not supply logits or logprobs can be used with tasks of type `generate_until` only, while local models, or APIs that supply logprobs/logits of their prompts, can be run on all task types: `generate_until`, `loglikelihood`, `loglikelihood_rolling`, and `multiple_choice`.
...@@ -313,6 +315,7 @@ Extras dependencies can be installed via `pip install -e ".[NAME]"`
| dev | For linting PRs and contributions |
| gptq | For loading models with GPTQ |
| ifeval | For running the IFEval task |
| neuronx | For running on AWS inf2 instances |
| mamba | For loading Mamba SSM models |
| math | For running math task answer checking |
| multilingual | For multilingual tokenizers |
......
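Following the `pip install -e ".[NAME]"` pattern documented above, the new extra could be pulled in like this (a sketch, assuming an editable checkout of the repository root):

```shell
# From the root of the lm-evaluation-harness checkout:
# install the harness together with the neuronx extra (optimum[neuronx]).
pip install -e ".[neuronx]"

# Extras can also be combined, e.g. with the dev tooling:
pip install -e ".[neuronx,dev]"
```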
...@@ -7,5 +7,5 @@ from . import gguf
from . import vllm_causallms
from . import mamba_lm
from . import optimum_lm
from . import neuron_optimum
# TODO: implement __all__
...@@ -58,6 +58,7 @@ anthropic = ["anthropic"] ...@@ -58,6 +58,7 @@ anthropic = ["anthropic"]
dev = ["pytest", "pytest-cov", "pytest-xdist", "pre-commit", "mypy"] dev = ["pytest", "pytest-cov", "pytest-xdist", "pre-commit", "mypy"]
gptq = ["auto-gptq[triton]>=0.6.0"] gptq = ["auto-gptq[triton]>=0.6.0"]
ifeval = ["langdetect", "immutabledict"] ifeval = ["langdetect", "immutabledict"]
neuronx = ["optimum[neuronx]"]
mamba = ["mamba_ssm", "causal-conv1d==1.0.2"]
math = ["sympy>=1.12", "antlr4-python3-runtime==4.11"]
multilingual = ["nagisa>=0.2.7", "jieba>=0.42.1", "pycountry"]
......
import pytest
import torch

from lm_eval.models.neuron_optimum import wrap_constant_batch_size


def test_wrap_constant_batch_size():
    class Tester:
        def __init__(self, batch_size):
            self.batch_size = batch_size

        @wrap_constant_batch_size
        def test_constant_batch_size(self, inputs):
            assert len(inputs) == self.batch_size
            return inputs

    batch_size_test = 8
    for i in range(1, batch_size_test + 1):
        tensor = torch.ones([i, 2, 2])
        out = Tester(batch_size=batch_size_test).test_constant_batch_size(tensor)
        torch.testing.assert_allclose(out, tensor)

    with pytest.raises(ValueError):
        Tester(batch_size=batch_size_test).test_constant_batch_size(
            torch.ones([batch_size_test + 1, 2, 2])
        )
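The test above exercises padding around a fixed batch size: Neuron-compiled models run with a constant batch dimension, so smaller batches must be padded up and the output sliced back down, while oversized batches are rejected. A minimal re-implementation sketch of such a decorator (hypothetical, for illustration only; not the harness's actual `neuron_optimum` code) might look like:

```python
import torch


def wrap_constant_batch_size(func):
    """Hypothetical sketch: pad the leading (batch) dimension of `inputs`
    with zeros up to self.batch_size before calling `func`, then slice the
    result back to the caller's original batch length."""

    def wrapper(self, inputs: torch.Tensor) -> torch.Tensor:
        n = inputs.shape[0]
        if n > self.batch_size:
            # A batch larger than the compiled batch size cannot be padded down.
            raise ValueError(
                f"batch of size {n} exceeds fixed batch size {self.batch_size}"
            )
        if n < self.batch_size:
            pad = torch.zeros(
                (self.batch_size - n, *inputs.shape[1:]), dtype=inputs.dtype
            )
            inputs = torch.cat([inputs, pad], dim=0)
        # The wrapped function always sees exactly self.batch_size rows.
        return func(self, inputs)[:n]

    return wrapper
```

Under this assumed behavior, every call into the decorated method sees exactly `batch_size` rows, results keep the caller's original batch length, and oversized batches raise `ValueError`, matching what the unit test checks.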