## Couplets in LiBai
Contributor: [Yulin Zhuang](https://github.com/ZylOo0)

This is the LiBai implementation of Couplets, a model that generates the second line of a Chinese couplet from a given first line.
## Supported parallel modes and tasks
Based on [libai.layers](https://libai.readthedocs.io/en/latest/modules/libai.layers.html), the Couplets model is automatically configured with the following parallelism modes:
| Model | Data Parallel | Tensor Parallel | Pipeline Parallel |
| :---: | :---: | :---: | :---: |
| Couplets training | ✔ | ✔ | ✔ |
| Couplets inference | - | ✔ | ✔ |
## Setup env
Install LiBai by following the [LiBai installation doc](https://libai.readthedocs.io/en/latest/tutorials/get_started/Installation.html).
## Prepare Datasets
Download the dataset and unzip it:
```shell
wget http://oneflow-static.oss-cn-beijing.aliyuncs.com/libai/couplets/couplets.zip
unzip couplets.zip
```
You will get a dataset with the following layout:
```
couplets
├── test
│ ├── in.txt
│ └── out.txt
├── train
│ ├── in.txt
│ └── out.txt
└── vocab.txt
```
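The two files in each split appear to be line-aligned: line *i* of `in.txt` is a first line (上联) and line *i* of `out.txt` is its matching second line (下联). A minimal sketch for inspecting the data, assuming whitespace-separated tokens and one vocabulary entry per line of `vocab.txt` (the helpers `load_couplet_pairs` and `load_vocab` are hypothetical, not part of LiBai):

```python
from pathlib import Path
from typing import Dict, List, Tuple


def load_couplet_pairs(split_dir: str) -> List[Tuple[List[str], List[str]]]:
    """Pair line i of in.txt with line i of out.txt (assumed to be line-aligned)."""
    split = Path(split_dir)
    firsts = (split / "in.txt").read_text(encoding="utf-8").splitlines()
    seconds = (split / "out.txt").read_text(encoding="utf-8").splitlines()
    if len(firsts) != len(seconds):
        raise ValueError(f"{split}: {len(firsts)} inputs vs {len(seconds)} outputs")
    # Assumption: tokens (characters) are separated by whitespace.
    return [(a.split(), b.split()) for a, b in zip(firsts, seconds)]


def load_vocab(vocab_path: str) -> Dict[str, int]:
    """Map each token to its line index in vocab.txt (one token per line assumed)."""
    tokens = Path(vocab_path).read_text(encoding="utf-8").splitlines()
    return {tok: idx for idx, tok in enumerate(tokens)}
```

For example, `load_couplet_pairs("couplets/train")[0]` would return the first training pair as two token lists.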
## Training
- Set the dataset path in `configs/config.py`:
```python
dataloader.train = LazyCall(build_nlp_train_loader)(
    dataset=[
        LazyCall(CoupletsDataset)(
            path="data_test/couplets",  # set to your data_path
            is_train=True,
            maxlen=64,
        )
    ],
    num_workers=4,
)
dataloader.test = [
    LazyCall(build_nlp_test_loader)(
        dataset=LazyCall(CoupletsDataset)(
            path="data_test/couplets",  # set to your data_path
            is_train=False,
            maxlen=64,
        ),
        num_workers=4,
    )
]
```
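`maxlen=64` bounds the sequence length fed to the model. The actual preprocessing inside `CoupletsDataset` is not shown here and may add special tokens; as a rough sketch of what a fixed `maxlen` typically implies, the hypothetical `encode` helper below truncates longer sequences, pads shorter ones, and returns an attention mask (the `pad_id` and `unk_id` values are assumptions):

```python
from typing import Dict, List, Sequence, Tuple


def encode(
    tokens: Sequence[str],
    vocab: Dict[str, int],
    maxlen: int = 64,
    pad_id: int = 0,  # hypothetical padding id
    unk_id: int = 1,  # hypothetical id for out-of-vocabulary tokens
) -> Tuple[List[int], List[int]]:
    """Map tokens to ids, truncate/pad to exactly maxlen, and build a 0/1 mask."""
    ids = [vocab.get(tok, unk_id) for tok in tokens][:maxlen]
    mask = [1] * len(ids)
    padding = maxlen - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding
```

With a toy vocabulary `{"<pad>": 0, "<unk>": 1, "天": 2, "风": 3}`, `encode(["天", "风", "雨"], vocab, maxlen=5)` gives `([2, 3, 1, 0, 0], [1, 1, 1, 0, 0])`.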
- Set the model config in `configs/config.py` according to your needs:
```python
transformer_cfg = dict(
    vocab_size=9027,
    max_position_embeddings=64,
    hidden_size=512,  # modify it according to your needs
    intermediate_size=512,  # modify it according to your needs
    hidden_layers=6,  # modify it according to your needs
    num_attention_heads=8,  # modify it according to your needs
    embedding_dropout_prob=0.1,
    hidden_dropout_prob=0.1,
    attention_dropout_prob=0.1,
    initializer_range=0.02,
    layernorm_epsilon=1e-5,
    bias_gelu_fusion=False,
    bias_dropout_fusion=False,
    scale_mask_softmax_fusion=False,
    apply_query_key_layer_scaling=True,
)
```
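A few constraints are worth checking when you change these values. The per-head size is `hidden_size // num_attention_heads` (512 // 8 = 64 with the values above), so `hidden_size` should be divisible by `num_attention_heads`; `max_position_embeddings` should be at least the dataset `maxlen` (both are 64 here); and, assuming Megatron-style tensor parallelism that splits attention heads across devices, `num_attention_heads` should be divisible by `tensor_parallel_size`. The hypothetical `check_transformer_cfg` below encodes these checks as a sketch; LiBai may enforce them differently:

```python
from typing import Dict, List


def check_transformer_cfg(cfg: Dict, maxlen: int, tensor_parallel_size: int = 1) -> List[str]:
    """Return human-readable problems found in a transformer_cfg-style dict."""
    problems = []
    heads = cfg["num_attention_heads"]
    if cfg["hidden_size"] % heads != 0:
        problems.append(f"hidden_size={cfg['hidden_size']} not divisible by num_attention_heads={heads}")
    # Assumption: tensor parallelism splits attention heads evenly across devices.
    if heads % tensor_parallel_size != 0:
        problems.append(f"num_attention_heads={heads} not divisible by tensor_parallel_size={tensor_parallel_size}")
    if cfg["max_position_embeddings"] < maxlen:
        problems.append(f"max_position_embeddings={cfg['max_position_embeddings']} < maxlen={maxlen}")
    return problems


transformer_cfg = dict(vocab_size=9027, max_position_embeddings=64, hidden_size=512,
                       intermediate_size=512, hidden_layers=6, num_attention_heads=8)
print(check_transformer_cfg(transformer_cfg, maxlen=64, tensor_parallel_size=2))  # []
print(transformer_cfg["hidden_size"] // transformer_cfg["num_attention_heads"])  # 64
```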
- Set the distributed config in `configs/config.py` according to your needs; refer to the [LiBai distributed configuration doc](https://libai.readthedocs.io/en/latest/tutorials/basics/Distributed_Configuration.html) for more details:
```python
dist=dict(
    data_parallel_size=1,  # modify it according to your needs
    tensor_parallel_size=1,  # modify it according to your needs
    pipeline_parallel_size=1,  # modify it according to your needs
    pipeline_stage_id=None,  # modify it according to your needs
    # hidden_layers encoder layers + hidden_layers decoder layers
    pipeline_num_layers=model.cfg.hidden_layers * 2,
),
```
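The parallel sizes have to fit the number of GPUs you launch with. Assuming the usual Megatron-style layout, where the world size factors as data × tensor × pipeline parallel size, the GPUs left for data parallelism are `world_size // (tensor_parallel_size * pipeline_parallel_size)`. A minimal sketch with a hypothetical helper (see the LiBai distributed doc for how `data_parallel_size` is actually resolved):

```python
def implied_data_parallel_size(world_size: int, tensor_parallel_size: int,
                               pipeline_parallel_size: int) -> int:
    """GPUs left for data parallelism once each model replica uses tp * pp devices."""
    model_parallel_size = tensor_parallel_size * pipeline_parallel_size
    if world_size % model_parallel_size != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by tp*pp={model_parallel_size}"
        )
    return world_size // model_parallel_size


# With 4 GPUs, as in the training command of this README:
for tp, pp in [(1, 1), (2, 1), (2, 2), (1, 4)]:
    print(f"tp={tp}, pp={pp} -> data parallel size {implied_data_parallel_size(4, tp, pp)}")
```

This prints data parallel sizes of 4, 2, 1, and 1 for the four combinations.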
- Following the LiBai [quick_run](https://libai.readthedocs.io/en/latest/tutorials/get_started/quick_run.html) guide, run the training command from the LiBai **root** directory:
```shell
# cd /path/to/libai
# the last argument is the number of GPUs to use
bash tools/train.sh tools/train_net.py projects/Couplets/configs/config.py 4
```
- After training finishes, the trained model is saved in `output/couplet`:
```
output/couplet
├── config.yaml
├── last_checkpoint
├── log.txt
├── log.txt.rank1
├── log.txt.rank2
├── log.txt.rank3
├── metrics.json
├── model_0004999
├── model_0009999
├── model_0014999
├── model_0019999
├── model_0024999
└── model_final
```
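Checkpoints are saved periodically (`model_0004999`, `model_0009999`, and so on) alongside `model_final`. Assuming `last_checkpoint` follows the detectron2-style convention of storing the name of the most recent checkpoint, a hypothetical helper for locating it could look like this sketch; it falls back to `model_final` and then to the highest-numbered `model_*` directory:

```python
from pathlib import Path
from typing import Optional


def resolve_latest_checkpoint(output_dir: str) -> Optional[Path]:
    """Best-effort lookup of the newest checkpoint under output_dir."""
    out = Path(output_dir)
    marker = out / "last_checkpoint"
    # Assumption: last_checkpoint holds a checkpoint name such as "model_final".
    if marker.is_file():
        name = marker.read_text(encoding="utf-8").strip()
        if name and (out / name).exists():
            return out / name
    final = out / "model_final"
    if final.exists():
        return final
    # Zero-padded iteration numbers sort correctly as strings.
    numbered = sorted(p for p in out.glob("model_*") if p.name != "model_final")
    return numbered[-1] if numbered else None
```

For the layout above, `resolve_latest_checkpoint("output/couplet")` would point at whatever `last_checkpoint` names, typically `model_final` after a completed run.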
## Inference
- For inference on a single GPU:
```shell
# set the following paths in projects/Couplets/infer.py:
# config_file = "output/couplet/config.yaml"
# checkpoint_file = "output/couplet/model_final"
# vocab_file = "data_test/couplets/vocab.txt"
python projects/Couplets/infer.py
```
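This README does not say which decoding strategy `infer.py` uses. Purely as an illustration of one simple option, the sketch below decodes greedily and, since the two lines of a couplet have the same length, emits exactly as many tokens as the first line. `score_next` stands in for the trained model and is hypothetical:

```python
from typing import Callable, List, Sequence

# score_next(first_line, prefix) -> one score per vocabulary entry (hypothetical model wrapper)
ScoreFn = Callable[[Sequence[str], Sequence[str]], Sequence[float]]


def greedy_second_line(first_line: Sequence[str], vocab: Sequence[str],
                       score_next: ScoreFn) -> List[str]:
    """Pick the highest-scoring token at each step, one step per first-line token."""
    second_line: List[str] = []
    for _ in range(len(first_line)):
        scores = score_next(first_line, second_line)
        best = max(range(len(vocab)), key=lambda i: scores[i])
        second_line.append(vocab[best])
    return second_line
```

Beam search is a common alternative that keeps several candidate prefixes per step instead of one.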
- For distributed inference, set your data path and model in `projects/Couplets/distribute_infer.py`:
```python
# line 46
self.cfg.vocab_file = "data_test/couplets/vocab.txt"

# lines 97-106
pipeline = CoupletPipeline(
    # you can also use output/couplet/config.yaml instead of config.py
    "projects/Couplets/configs/config.py",
    data_parallel=1,
    tensor_parallel=1,  # modify it according to your needs
    pipeline_parallel=4,  # modify it according to your needs
    pipeline_stage_id=None,  # modify it according to your needs
    pipeline_num_layers=12,  # hidden_layers * 2 for the default config
    model_path="output/couplet/model_final/model",  # modify it according to your needs
    mode="libai",
)
```
```shell
# the last argument is the number of GPUs to use
bash tools/infer.sh projects/Couplets/distribute_infer.py 4
```
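With `pipeline_parallel=4` and `pipeline_num_layers=12`, each pipeline stage holds a share of the 12 layers. Assuming a contiguous, as-even-as-possible split (LiBai's own placement logic may differ), a hypothetical helper shows which stage each layer lands on:

```python
from typing import List


def assign_layers_to_stages(num_layers: int, num_stages: int) -> List[int]:
    """Return the stage index of each layer, giving earlier stages any remainder."""
    if num_stages <= 0:
        raise ValueError("num_stages must be positive")
    base, extra = divmod(num_layers, num_stages)
    stage_of_layer: List[int] = []
    for stage in range(num_stages):
        count = base + (1 if stage < extra else 0)
        stage_of_layer.extend([stage] * count)
    return stage_of_layer


print(assign_layers_to_stages(12, 4))  # [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
```

Under this assumption, each of the 4 GPUs holds 3 consecutive layers.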
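## Results
Sample generations, where 上联 is the given first line of a couplet and 下联 is the second line generated by the model:
```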
上联:
天朗气清风和畅
下联:
水流花海月圆融
上联:
千秋月色君长看
下联:
一夜风流人在天
上联:
梦里不知身是客
下联:
此间何处是家乡
```