------------------------------------
ALBERT Model Summary
------------------------------------

The following table summarizes the pretrained weights of the ALBERT models currently supported by PaddleNLP. For details about each model, please refer to the corresponding links.

+------------------------------+----------+------------------------------------------+
| Pretrained Weight            | Language | Details of the model                     |
+==============================+==========+==========================================+
| ``albert-base-v1``           | English  | 12 repeating layers, 128 embedding,      |
|                              |          | 768-hidden, 12-heads, 11M parameters.    |
|                              |          | ALBERT base model                        |
+------------------------------+----------+------------------------------------------+
| ``albert-large-v1``          | English  | 24 repeating layers, 128 embedding,      |
|                              |          | 1024-hidden, 16-heads, 17M parameters.   |
|                              |          | ALBERT large model                       |
+------------------------------+----------+------------------------------------------+
| ``albert-xlarge-v1``         | English  | 24 repeating layers, 128 embedding,      |
|                              |          | 2048-hidden, 16-heads, 58M parameters.   |
|                              |          | ALBERT xlarge model                      |
+------------------------------+----------+------------------------------------------+
| ``albert-xxlarge-v1``        | English  | 12 repeating layers, 128 embedding,      |
|                              |          | 4096-hidden, 64-heads, 223M parameters.  |
|                              |          | ALBERT xxlarge model                     |
+------------------------------+----------+------------------------------------------+
| ``albert-base-v2``           | English  | 12 repeating layers, 128 embedding,      |
|                              |          | 768-hidden, 12-heads, 11M parameters.    |
|                              |          | ALBERT base model (version2)             |
+------------------------------+----------+------------------------------------------+
| ``albert-large-v2``          | English  | 24 repeating layers, 128 embedding,      |
|                              |          | 1024-hidden, 16-heads, 17M parameters.   |
|                              |          | ALBERT large model (version2)            |
+------------------------------+----------+------------------------------------------+
| ``albert-xlarge-v2``         | English  | 24 repeating layers, 128 embedding,      |
|                              |          | 2048-hidden, 16-heads, 58M parameters.   |
|                              |          | ALBERT xlarge model (version2)           |
+------------------------------+----------+------------------------------------------+
| ``albert-xxlarge-v2``        | English  | 12 repeating layers, 128 embedding,      |
|                              |          | 4096-hidden, 64-heads, 223M parameters.  |
|                              |          | ALBERT xxlarge model (version2)          |
+------------------------------+----------+------------------------------------------+
| ``albert-chinese-tiny``      | Chinese  | 4 repeating layers, 128 embedding,       |
|                              |          | 312-hidden, 12-heads, 4M parameters.     |
|                              |          | ALBERT tiny model (Chinese)              |
+------------------------------+----------+------------------------------------------+
| ``albert-chinese-small``     | Chinese  | 6 repeating layers, 128 embedding,       |
|                              |          | 384-hidden, 12-heads, _M parameters.     |
|                              |          | ALBERT small model (Chinese)             |
+------------------------------+----------+------------------------------------------+
| ``albert-chinese-base``      | Chinese  | 12 repeating layers, 128 embedding,      |
|                              |          | 768-hidden, 12-heads, 12M parameters.    |
|                              |          | ALBERT base model (Chinese)              |
+------------------------------+----------+------------------------------------------+
| ``albert-chinese-large``     | Chinese  | 24 repeating layers, 128 embedding,      |
|                              |          | 1024-hidden, 16-heads, 18M parameters.   |
|                              |          | ALBERT large model (Chinese)             |
+------------------------------+----------+------------------------------------------+
| ``albert-chinese-xlarge``    | Chinese  | 24 repeating layers, 128 embedding,      |
|                              |          | 2048-hidden, 16-heads, 60M parameters.   |
|                              |          | ALBERT xlarge model (Chinese)            |
+------------------------------+----------+------------------------------------------+
| ``albert-chinese-xxlarge``   | Chinese  | 12 repeating layers, 128 embedding,      |
|                              |          | 4096-hidden, 16-heads, 235M parameters.  |
|                              |          | ALBERT xxlarge model (Chinese)           |
+------------------------------+----------+------------------------------------------+
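Any of the pretrained weight names listed above can be passed to ``from_pretrained``. The snippet below is a minimal sketch, assuming ``paddle`` and ``paddlenlp`` are installed; the choice of ``albert-chinese-tiny`` and the sample sentence are illustrative only.

.. code-block:: python

    import paddle
    from paddlenlp.transformers import AlbertModel, AlbertTokenizer

    # Load the tokenizer and model by one of the pretrained weight names listed above.
    tokenizer = AlbertTokenizer.from_pretrained("albert-chinese-tiny")
    model = AlbertModel.from_pretrained("albert-chinese-tiny")

    # Tokenize a sample sentence and wrap the ids into batched paddle tensors.
    encoded = tokenizer("欢迎使用PaddleNLP!")
    inputs = {name: paddle.to_tensor([ids]) for name, ids in encoded.items()}

    # Run a forward pass; the outputs contain the encoded sequence representation.
    outputs = model(**inputs)

The same pattern applies to the English weights; only the weight name changes.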