"docs/zh_cn/vscode:/vscode.git/clone" did not exist on "87768b7f2df7391010aa1e21bc9fd682f1ccb481"
Unverified Commit d17dcea0 authored by Michael Feil, committed by GitHub

Support for Inf2 optimum class [WIP] (#1364)

* initial commit

* remove overwrite bs

* adding neuronx dependencies

* Update README.md

* update neuronx
parent 74119471
...@@ -45,6 +45,7 @@ git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .
```
We also provide a number of optional dependencies for extended functionality. A detailed table is available at the end of this document.
## Basic Usage
...@@ -174,6 +175,7 @@ Note that for externally hosted models, configs such as `--device` and `--batch_
| vLLM | :heavy_check_mark: | `vllm` | [Most HF Causal Language Models](https://docs.vllm.ai/en/latest/models/supported_models.html) | `generate_until`, `loglikelihood`, `loglikelihood_rolling` |
| Mamba | :heavy_check_mark: | `mamba_ssm` | [Mamba architecture Language Models via the `mamba_ssm` package](https://huggingface.co/state-spaces) | `generate_until`, `loglikelihood`, `loglikelihood_rolling` |
| Huggingface Optimum (Causal LMs) | ✔️ | `openvino` | Any decoder-only AutoModelForCausalLM converted with Huggingface Optimum into OpenVINO™ Intermediate Representation (IR) format | `generate_until`, `loglikelihood`, `loglikelihood_rolling` | ... |
| Neuron via AWS Inf2 (Causal LMs) | ✔️ | `neuronx` | Any decoder-only AutoModelForCausalLM supported to run on [huggingface-ami image for inferentia2](https://aws.amazon.com/marketplace/pp/prodview-gr3e6yiscria2) | `generate_until`, `loglikelihood`, `loglikelihood_rolling` | ... |
| Your local inference server! | :heavy_check_mark: | `local-completions` or `local-chat-completions` (using `openai-chat-completions` model type) | Any server address that accepts GET requests using HF models and mirrors OpenAI's Completions or ChatCompletions interface | `generate_until` | ... |
Models which do not supply logits or logprobs can be used with tasks of type `generate_until` only, while local models, or APIs that supply logprobs/logits of their prompts, can be run on all task types: `generate_until`, `loglikelihood`, `loglikelihood_rolling`, and `multiple_choice`.
...@@ -313,6 +315,7 @@ Extras dependencies can be installed via `pip install -e ".[NAME]"`
| dev | For linting PRs and contributions |
| gptq | For loading models with GPTQ |
| ifeval | For running the IFEval task |
| neuronx | For running on AWS inf2 instances |
| mamba | For loading Mamba SSM models |
| math | For running math task answer checking |
| multilingual | For multilingual tokenizers |
......
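Following the `pip install -e ".[NAME]"` pattern documented above, the new extra could be pulled in like this (a sketch, assuming an editable checkout of the repository root):

```shell
# From the root of the lm-evaluation-harness checkout:
# install the harness together with the neuronx extra (optimum[neuronx]).
pip install -e ".[neuronx]"

# Extras can also be combined, e.g. with the dev tooling:
pip install -e ".[neuronx,dev]"
```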
...@@ -7,5 +7,5 @@ from . import gguf
from . import vllm_causallms
from . import mamba_lm
from . import optimum_lm
from . import neuron_optimum
# TODO: implement __all__
...@@ -58,6 +58,7 @@ anthropic = ["anthropic"] ...@@ -58,6 +58,7 @@ anthropic = ["anthropic"]
dev = ["pytest", "pytest-cov", "pytest-xdist", "pre-commit", "mypy"] dev = ["pytest", "pytest-cov", "pytest-xdist", "pre-commit", "mypy"]
gptq = ["auto-gptq[triton]>=0.6.0"] gptq = ["auto-gptq[triton]>=0.6.0"]
ifeval = ["langdetect", "immutabledict"] ifeval = ["langdetect", "immutabledict"]
neuronx = ["optimum[neuronx]"]
mamba = ["mamba_ssm", "causal-conv1d==1.0.2"]
math = ["sympy>=1.12", "antlr4-python3-runtime==4.11"]
multilingual = ["nagisa>=0.2.7", "jieba>=0.42.1", "pycountry"]
......
import pytest
import torch

from lm_eval.models.neuron_optimum import wrap_constant_batch_size


def test_wrap_constant_batch_size():
    class Tester:
        def __init__(self, batch_size):
            self.batch_size = batch_size

        @wrap_constant_batch_size
        def test_constant_batch_size(self, inputs):
            assert len(inputs) == self.batch_size
            return inputs

    batch_size_test = 8
    for i in range(1, batch_size_test + 1):
        tensor = torch.ones([i, 2, 2])
        out = Tester(batch_size=batch_size_test).test_constant_batch_size(tensor)
        torch.testing.assert_allclose(out, tensor)

    with pytest.raises(ValueError):
        Tester(batch_size=batch_size_test).test_constant_batch_size(
            torch.ones([batch_size_test + 1, 2, 2])
        )
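The test above exercises padding around a fixed batch size: Neuron-compiled models run with a constant batch dimension, so smaller batches must be padded up and the output sliced back down, while oversized batches are rejected. A minimal re-implementation sketch of such a decorator (hypothetical, for illustration only; not the harness's actual `neuron_optimum` code) might look like:

```python
import torch


def wrap_constant_batch_size(func):
    """Hypothetical sketch: pad the leading (batch) dimension of `inputs`
    with zeros up to self.batch_size before calling `func`, then slice the
    result back to the caller's original batch length."""

    def wrapper(self, inputs: torch.Tensor) -> torch.Tensor:
        n = inputs.shape[0]
        if n > self.batch_size:
            # A batch larger than the compiled batch size cannot be padded down.
            raise ValueError(
                f"batch of size {n} exceeds fixed batch size {self.batch_size}"
            )
        if n < self.batch_size:
            pad = torch.zeros(
                (self.batch_size - n, *inputs.shape[1:]), dtype=inputs.dtype
            )
            inputs = torch.cat([inputs, pad], dim=0)
        # The wrapped function always sees exactly self.batch_size rows.
        return func(self, inputs)[:n]

    return wrapper
```

Under this assumed behavior, every call into the decorated method sees exactly `batch_size` rows, results keep the caller's original batch length, and oversized batches raise `ValueError`, matching what the unit test checks.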