The `gpt2-345M-bahasa-cased` model was pretrained on ~0.9 billion words. We trained on standard language structure only, and below is the list of data we trained on:
The preprocessing steps can be reproduced from [Malaya/pretrained-model/preprocess](https://github.com/huseinzol05/Malaya/tree/master/pretrained-model/preprocess).
## Pretraining details
- This model was trained using the GPT-2 GitHub [repository](https://github.com/openai/gpt-2) on a TPU v3-8.
- All steps can be reproduced from [Malaya/pretrained-model/gpt2](https://github.com/huseinzol05/Malaya/tree/master/pretrained-model/gpt2).
## Load Pretrained Model
You can use this model by installing `torch` or `tensorflow` and the Hugging Face `transformers` library, then loading it directly as shown below.
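A minimal loading and generation sketch, assuming the checkpoint is published on the Hugging Face Hub under the identifier `huseinzol05/gpt2-345M-bahasa-cased` (adjust the name if the repository differs):

```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Assumed Hub identifier; change it if the model lives under a different name.
model_name = 'huseinzol05/gpt2-345M-bahasa-cased'

tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

# Generate a short continuation for a Malay prompt.
inputs = tokenizer('Tun Dr Mahathir Mohamad', return_tensors='pt')
outputs = model.generate(**inputs, max_length=50, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same checkpoint can also be loaded in TensorFlow via `TFGPT2LMHeadModel` if `tensorflow` is installed instead of `torch`.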