Unverified Commit 1f9bc88f authored by Slawomir Strehlke's avatar Slawomir Strehlke Committed by GitHub

Support pipeline parallel with OpenVINO models (#2349)

* Handle pipeline_parallel parameter

* Add description of pipeline parallelism with OV models
parent 6824d39d
......@@ -205,6 +205,19 @@ Note that it is recommended to substitute the `python` command by `torchrun --np
Not supported yet: multi-node evaluation and combinations of data replication with tensor or pipeline parallelism.
#### Multi-GPU evaluation with OpenVINO models
Pipeline parallelism during evaluation is supported with OpenVINO models.
To enable pipeline parallelism, set `pipeline_parallel=True` in `model_args`. You also have to set `device` to `HETERO:<GPU index 1>,<GPU index 2>`, for example `HETERO:GPU.1,GPU.0`. For example, the command to run pipeline parallelism across 2 GPUs is:
```
lm_eval --model openvino \
--tasks wikitext \
--model_args pretrained=<path_to_ov_model>,pipeline_parallel=True \
--device HETERO:GPU.1,GPU.0
```
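Under the hood, the `pipeline_parallel` flag is translated into an OpenVINO `ov_config` entry before the model is loaded. A minimal sketch of that mapping (the `model_kwargs` dict here is illustrative; the keys mirror the ones this change adds to `OptimumLM`):

```python
# Illustrative model_kwargs as parsed from --model_args
model_kwargs = {"pipeline_parallel": True}

# lm_eval always seeds ov_config with an empty CACHE_DIR,
# then adds the distribution policy when pipeline_parallel is set.
ov_config = {"CACHE_DIR": ""}
if model_kwargs.get("pipeline_parallel"):
    ov_config["MODEL_DISTRIBUTION_POLICY"] = "PIPELINE_PARALLEL"

print(ov_config)
```

The resulting `ov_config` is what OpenVINO's HETERO device uses, together with the GPU list from `--device`, to split the model across the listed GPUs.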
### Tensor + Data Parallel and Optimized Inference with `vLLM`
We also support vLLM for faster inference on [supported model types](https://docs.vllm.ai/en/latest/models/supported_models.html), especially faster when splitting a model across multiple GPUs. For single-GPU or multi-GPU — tensor parallel, data parallel, or a combination of both — inference, for example:
......
......@@ -71,6 +71,9 @@ class OptimumLM(HFLM):
else:
model_kwargs["ov_config"] = {}
model_kwargs["ov_config"].setdefault("CACHE_DIR", "")
if model_kwargs.get("pipeline_parallel"):
    model_kwargs["ov_config"]["MODEL_DISTRIBUTION_POLICY"] = "PIPELINE_PARALLEL"
model_file = Path(pretrained) / "openvino_model.xml"
if model_file.exists():
export = False
......