Following the `README.md`, preprocess the original corpus into h5py + numpy format.
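A minimal sketch of what the packed output might look like; the dataset names (`input_ids`, `attention_mask`), padding scheme, and `max_len` below are illustrative assumptions, not this repo's actual schema — consult the preprocessing scripts for the real layout.

```python
# Hypothetical sketch of the h5py + numpy packing step; field names and
# layout are assumptions, not this repo's actual preprocessing output.
import numpy as np
import h5py

def save_corpus(token_id_lists, path, max_len=128):
    """Pad/truncate tokenized sentences and store them as HDF5 datasets."""
    input_ids = np.zeros((len(token_id_lists), max_len), dtype=np.int32)
    attention_mask = np.zeros_like(input_ids)
    for i, ids in enumerate(token_id_lists):
        ids = ids[:max_len]
        input_ids[i, : len(ids)] = ids
        attention_mask[i, : len(ids)] = 1
    with h5py.File(path, "w") as f:
        f.create_dataset("input_ids", data=input_ids, compression="gzip")
        f.create_dataset("attention_mask", data=attention_mask, compression="gzip")
```

Fixed-width int32 arrays keep the file mmap-friendly for the pretraining step.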
## 2. Pretrain
Following the `README.md`, load the h5py file generated by the preprocessing in step 1 to pretrain the model.
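Loading for pretraining can be sketched as a streaming reader over the HDF5 file; the dataset name `input_ids` is an assumption carried over from the illustration above, and the repo's own data loader may differ.

```python
# Hypothetical sketch of streaming pretraining batches from the HDF5 file;
# the dataset name "input_ids" is an assumption, not the repo's actual schema.
import numpy as np
import h5py

def iter_batches(path, batch_size):
    """Yield contiguous numpy batches without loading the whole file into RAM."""
    with h5py.File(path, "r") as f:
        ids = f["input_ids"]
        for start in range(0, ids.shape[0], batch_size):
            yield np.asarray(ids[start : start + batch_size])
```

Slicing the `h5py` dataset reads only the requested rows from disk, which is why the preprocessing step materializes the corpus up front.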
## 3. Finetune
The checkpoint produced by this repo can directly replace `pytorch_model.bin` from [hfl/chinese-roberta-wwm-ext-large](https://huggingface.co/hfl/chinese-roberta-wwm-ext-large/tree/main). Then use the Transformers library from Hugging Face to fine-tune downstream applications.
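The swap can be sketched as a small helper; `ckpt_path` and `model_dir` are placeholders for your checkpoint from this repo and the downloaded Hugging Face model directory.

```python
# Hypothetical helper for replacing pytorch_model.bin with this repo's
# checkpoint; ckpt_path and model_dir are placeholder paths.
import pathlib
import shutil

def install_checkpoint(ckpt_path, model_dir):
    """Back up the stock pytorch_model.bin, then drop in the new checkpoint."""
    target = pathlib.Path(model_dir) / "pytorch_model.bin"
    if target.exists():
        # Keep the original weights around in case you want to roll back.
        shutil.copy2(target, target.with_name("pytorch_model.bin.bak"))
    shutil.copy2(ckpt_path, target)
```

After the swap, `from_pretrained(model_dir)` in Transformers picks up the new weights while reusing the original config and vocab files.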
## Contributors
This repo is contributed by the AI team at [Moore Threads](https://www.mthreads.com/). If you run into any problems with pretraining, please file an issue or send an email to yehua.zhang@mthreads.com. Finally, any form of contribution is welcome!