# Mock transformers
This project is an application of mock [transformers](https://github.com/huggingface/transformers): it enables distributed inference in LiBai for models implemented in the transformers library.
**Supported Models**
- [BLOOM](#distributed-infer-bloom): tensor parallel
- [OPT](#distributed-infer-opt): tensor parallel
## Environment
Before running the scripts, make sure to install the library's dependencies:
### Install libai
To install LiBai, refer to the [Installation instructions](https://libai.readthedocs.io/en/latest/tutorials/get_started/Installation.html):
```bash
# create conda env
conda create -n libai python=3.8 -y
conda activate libai
# install oneflow nightly; [PLATFORM] can be cu117, cu102, or cpu (see the table below)
python3 -m pip install --pre oneflow -f https://staging.oneflow.info/branch/master/[PLATFORM]
# install libai
git clone https://github.com/Oneflow-Inc/libai.git
cd libai
pip install pybind11
pip install -e .
```
- All available `[PLATFORM]` options:

| Platform | Nvidia Driver Version | Supported GPUs |
| --- | --- | --- |
| cu117 | >= 450.80.02 | GTX 10xx, RTX 20xx, A100, RTX 30xx |
| cu102 | >= 440.33 | GTX 10xx, RTX 20xx |
| cpu | N/A | N/A |
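As an optional sanity check, you can confirm that OneFlow and LiBai import correctly before moving on. The snippet below is a minimal sketch that assumes OneFlow's PyTorch-style `cuda.device_count()` helper is available:

```python
# optional sanity check: confirm oneflow and libai import correctly
import oneflow as flow
import libai

print("oneflow:", flow.__version__)
print("libai:", libai.__file__)
# 0 devices is expected on the cpu platform
print("cuda devices:", flow.cuda.device_count())
```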
### Install transformers
Refer to the [transformers installation guide](https://github.com/huggingface/transformers#installation):
```bash
python3 -m pip install "transformers>=4.26"
```
**Notes**
- Register a Hugging Face account, create an access token, and install `huggingface_hub` so you can log in with `huggingface-cli login`:
```bash
python3 -m pip install huggingface_hub
```
- If the `huggingface-cli` command is not found in your `PATH`, it is likely installed under `$HOME/.local/bin`:
```bash
~/.local/bin/huggingface-cli login
```
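If you prefer to authenticate from Python instead of the CLI, `huggingface_hub` also provides a `login()` helper; the token string below is a placeholder for your own access token:

```python
# programmatic alternative to `huggingface-cli login`
from huggingface_hub import login

# paste your own access token from https://huggingface.co/settings/tokens
login(token="hf_xxx")
```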
## Distributed Inference Through Mock Transformers
| Models | Tensor Parallel | Pipeline Parallel |
| --- | --- | --- |
| BLOOM | ✔ | - |
| GPT2 | ✔ | - |
| OPT | ✔ | - |
## Examples
For `tensor_parallel=2`, run the following command from the LiBai root directory (`libai_root`):
```bash
bash tools/infer.sh projects/mock_transformers/dist_infer_opt.py 2
```
Modify the inference code in `dist_infer_opt.py` according to your needs:
```python
...

if __name__ == "__main__":
    # set dist config
    parallel_config = DictConfig(
        dict(
            data_parallel_size=1,
            tensor_parallel_size=2,  # modify it according to your own needs
            pipeline_parallel_size=1,  # keep 1; pipeline parallelism is not supported yet
            pipeline_num_layers=None,
        )
    )
    dist.setup_dist_util(parallel_config)
    ...
    # initialize and load the model; choose any OPT size from 125m to 66b
    model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
    model._apply(dist.convert_to_distributed_default_setting)
    # initialize the tokenizer with the same OPT size
    tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m", use_fast=False)
```
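After the model and tokenizer are set up, the script runs text generation. The snippet below is only an illustrative sketch in the plain transformers style; the prompt and generation parameters are placeholders and are not taken from `dist_infer_opt.py`, which may additionally handle global-tensor placement for distributed execution:

```python
# illustrative generation step (prompt and parameters are placeholders)
prompt = "Hello, my dog is cute"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
outputs = model.generate(input_ids, max_new_tokens=20)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```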