Unverified Commit 1f9bc88f authored by Slawomir Strehlke's avatar Slawomir Strehlke Committed by GitHub

Support pipeline parallel with OpenVINO models (#2349)

* Handle pipeline_parallel parameter

* Add description of pipeline parallelism with OV models
parent 6824d39d
......@@ -205,6 +205,19 @@ Note that it is recommended to substitute the `python` command by `torchrun --np
Not supported yet: multi-node evaluation and combinations of data replication with tensor or pipeline parallelism.
#### Multi-GPU evaluation with OpenVINO models
Pipeline parallelism during evaluation is supported with OpenVINO models.
To enable pipeline parallelism, set `pipeline_parallel=True` in `model_args`. You also have to set `device` to `HETERO:<GPU index 1>,<GPU index 2>`, for example `HETERO:GPU.1,GPU.0`. For example, the command to run pipeline parallelism across 2 GPUs is:
```
lm_eval --model openvino \
--tasks wikitext \
--model_args pretrained=<path_to_ov_model>,pipeline_parallel=True \
--device HETERO:GPU.1,GPU.0
```
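Under the hood, the `pipeline_parallel` flag is translated into an OpenVINO `ov_config` entry before the model is loaded. A minimal sketch of that mapping (the `model_kwargs` dict here is illustrative; the keys mirror the ones this change adds to `OptimumLM`):

```python
# Illustrative model_kwargs as parsed from --model_args
model_kwargs = {"pipeline_parallel": True}

# lm_eval always seeds ov_config with an empty CACHE_DIR,
# then adds the distribution policy when pipeline_parallel is set.
ov_config = {"CACHE_DIR": ""}
if model_kwargs.get("pipeline_parallel"):
    ov_config["MODEL_DISTRIBUTION_POLICY"] = "PIPELINE_PARALLEL"

print(ov_config)
```

The resulting `ov_config` is what OpenVINO's HETERO device uses, together with the GPU list from `--device`, to split the model across the listed GPUs.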
### Tensor + Data Parallel and Optimized Inference with `vLLM`
We also support vLLM for faster inference on [supported model types](https://docs.vllm.ai/en/latest/models/supported_models.html), especially faster when splitting a model across multiple GPUs. For single-GPU or multi-GPU — tensor parallel, data parallel, or a combination of both — inference, for example:
......
......@@ -71,6 +71,9 @@ class OptimumLM(HFLM):
else:
model_kwargs["ov_config"] = {}
model_kwargs["ov_config"].setdefault("CACHE_DIR", "")
if model_kwargs.get("pipeline_parallel"):
    model_kwargs["ov_config"]["MODEL_DISTRIBUTION_POLICY"] = "PIPELINE_PARALLEL"
model_file = Path(pretrained) / "openvino_model.xml"
if model_file.exists():
export = False
......