------------------------------------
TinyBert Model Summary
------------------------------------

The table below summarizes the TinyBert models currently supported by PaddleNLP and their corresponding pretrained weights. For details of each model, please refer to the corresponding links.

+--------------------------+----------+----------------------------------------+
| Pretrained Weight        | Language | Details of the model                   |
+==========================+==========+========================================+
| ``tinybert-4l-312d``     | English  | 4-layer, 312-hidden,                   |
|                          |          | 12-heads, 14.5M parameters.            |
|                          |          | The TinyBert model distilled from      |
|                          |          | the BERT model ``bert-base-uncased``.  |
+--------------------------+----------+----------------------------------------+
| ``tinybert-6l-768d``     | English  | 6-layer, 768-hidden,                   |
|                          |          | 12-heads, 67M parameters.              |
|                          |          | The TinyBert model distilled from      |
|                          |          | the BERT model ``bert-base-uncased``.  |
+--------------------------+----------+----------------------------------------+
| ``tinybert-4l-312d-v2``  | English  | 4-layer, 312-hidden,                   |
|                          |          | 12-heads, 14.5M parameters.            |
|                          |          | The TinyBert model distilled from      |
|                          |          | the BERT model ``bert-base-uncased``.  |
+--------------------------+----------+----------------------------------------+
| ``tinybert-6l-768d-v2``  | English  | 6-layer, 768-hidden,                   |
|                          |          | 12-heads, 67M parameters.              |
|                          |          | The TinyBert model distilled from      |
|                          |          | the BERT model ``bert-base-uncased``.  |
+--------------------------+----------+----------------------------------------+
| ``tinybert-4l-312d-zh``  | Chinese  | 4-layer, 312-hidden,                   |
|                          |          | 12-heads, 14.5M parameters.            |
|                          |          | The TinyBert model distilled from      |
|                          |          | the BERT model ``bert-base-uncased``.  |
+--------------------------+----------+----------------------------------------+
| ``tinybert-6l-768d-zh``  | Chinese  | 6-layer, 768-hidden,                   |
|                          |          | 12-heads, 67M parameters.              |
|                          |          | The TinyBert model distilled from      |
|                          |          | the BERT model ``bert-base-uncased``.  |
+--------------------------+----------+----------------------------------------+
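The parameter counts in the table follow directly from the layer count and hidden size. The sketch below is a rough back-of-the-envelope estimate, assuming the standard BERT WordPiece vocabulary of 30522 tokens and the feed-forward (intermediate) sizes reported in the TinyBERT paper, 1200 for the 312-hidden models and 3072 for the 768-hidden models; it is an illustration, not the exact count.

.. code-block:: python

    def estimate_params(layers, hidden, intermediate, vocab=30522):
        # Token embeddings dominate the embedding block; position and
        # segment embeddings add comparatively little and are ignored here.
        embeddings = vocab * hidden
        # Per transformer layer: Q/K/V/output projections (4 square
        # matrices) plus the two feed-forward matrices.
        attention = 4 * hidden * hidden
        ffn = 2 * hidden * intermediate
        return embeddings + layers * (attention + ffn)

    # tinybert-4l-312d / -v2 / -zh: table lists 14.5M parameters
    print(round(estimate_params(4, 312, 1200) / 1e6, 1))   # -> 14.1
    # tinybert-6l-768d / -v2 / -zh: table lists 67M parameters
    print(round(estimate_params(6, 768, 3072) / 1e6, 1))   # -> 65.9

Both estimates land close to the table's figures; the small gap comes from the position/segment embeddings, biases, and layer-norm weights the sketch omits.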