Though we recommend only installing the extras you require, to install the package with all extras, run
```bash
pip install -e ".[all]"
```
...
To use `accelerate` with the `lm-eval` command, use
```bash
accelerate launch --no_python lm-eval --model ...
```
### Tensor Parallel + Optimized Inference with vLLM
We also support vLLM for faster inference on [supported model types](https://docs.vllm.ai/en/latest/models/supported_models.html).
To run with vLLM, first install the vLLM library, either externally or via the `lm_eval[vllm]` extra:
```bash
pip install -e ".[vllm]"
```
Then, you can run the harness as usual for single-GPU or tensor-parallel inference. For example:
```bash
python -m lm_eval \
    --model vllm \
    --model_args pretrained={model_name},tensor_parallel_size={number of GPUs to use},dtype=auto,gpu_memory_utilization=0.8 \
    --tasks lambada_openai \
    --batch_size auto
```
For a full list of supported vLLM configurations, please reference our vLLM integration and the vLLM documentation.
### Supported APIs and Inference Libraries
Our library also supports the evaluation of models served via several commercial APIs, and we hope to implement support for the most commonly used performant local/self-hosted inference servers.
A full accounting of the supported and planned libraries + APIs can be seen below:
| API or Inference Server | Implemented? | `--model <xxx>` name | Models supported: | Request Types: |
|-------------------------|--------------|----------------------|-------------------|----------------|
| Your inference server here! | ... | ... | ... | ... |
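For example, evaluating a model served through the OpenAI Completions API could look like the following. This is a hedged sketch: the `openai-completions` model name and the `--model_args` keys shown reflect one recent version of the harness, and the engine name is only illustrative.

```bash
export OPENAI_API_KEY=YOUR_KEY_HERE
python -m lm_eval \
    --model openai-completions \
    --model_args model=davinci \
    --tasks lambada_openai,hellaswag
```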
It is on our roadmap to create task variants designed to enable models which do not serve logprobs/loglikelihoods to be compared with the generation performance of open-source models.
Using this decorator results in the class being added to an accounting of the usable LM types maintained internally to the library at `lm_eval.api.registry.MODEL_REGISTRY`. See `lm_eval.api.registry` for more detail on what sorts of registries and decorators exist in the library!
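As a minimal sketch, registering a new LM subclass looks roughly like this. The registered string, class name, and method bodies below are placeholders, not part of the library, and the three core methods are named as in recent harness versions (`loglikelihood`, `loglikelihood_rolling`, `generate_until`):

```python
# Sketch only: "my-inference-server" and MyInferenceServerLM are hypothetical names.
from lm_eval.api.model import LM
from lm_eval.api.registry import register_model


@register_model("my-inference-server")
class MyInferenceServerLM(LM):
    """Stub LM backend; fill in calls to your server in each method."""

    def loglikelihood(self, requests):
        # one (log-probability, is-greedy) pair per request
        raise NotImplementedError

    def loglikelihood_rolling(self, requests):
        # full-text log-likelihoods, e.g. for perplexity-style tasks
        raise NotImplementedError

    def generate_until(self, requests):
        # one generated string per request
        raise NotImplementedError
```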
**Tip: be sure to import your model in `lm_eval/models/__init__.py`!**
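In other words, `lm_eval/models/__init__.py` needs an import along these lines (the module name is hypothetical) so that the `@register_model` decorator actually runs at import time:

```python
# in lm_eval/models/__init__.py -- the module name is a placeholder for yours
from . import my_inference_server
```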
## Testing
We also recommend that new model contributions be accompanied by short tests of their 3 core functionalities, at minimum. To see an example of such tests, look at https://github.com/EleutherAI/lm-evaluation-harness/blob/35bdecd379c0cefad6897e67db892f4a6026a128/tests/test_ggml.py.
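As a rough sketch of what such a test could look like: everything below is hypothetical (`my_inference_server`, `MyInferenceServerLM`, and the constructor arguments depend on your backend), and the `Instance` request objects follow the request format used by recent harness versions.

```python
# Hypothetical smoke test for a new model backend; adapt names and
# constructor arguments to your own LM subclass.
import pytest

from lm_eval.api.instance import Instance
from lm_eval.models.my_inference_server import MyInferenceServerLM  # hypothetical


@pytest.fixture(scope="module")
def lm():
    # Construct the backend once per test module; real tests would pass
    # whatever connection/config arguments your server needs.
    return MyInferenceServerLM()


def test_loglikelihood(lm):
    requests = [
        Instance(
            request_type="loglikelihood",
            doc={},
            arguments=("The capital of France is", " Paris"),
            idx=0,
        )
    ]
    results = lm.loglikelihood(requests)
    assert len(results) == 1
    logprob, is_greedy = results[0]
    assert isinstance(logprob, float)
    assert isinstance(is_greedy, bool)


def test_generate_until(lm):
    requests = [
        Instance(
            request_type="generate_until",
            doc={},
            arguments=("Q: 2 + 2 = ", {"until": ["\n"], "max_gen_toks": 16}),
            idx=0,
        )
    ]
    (generation,) = lm.generate_until(requests)
    assert isinstance(generation, str)
```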