# RoBERTa Masked Language Model Trained on a French Corpus :robot:
This is a masked language model based on [RoBERTa](https://huggingface.co/transformers/model_doc/roberta.html) and trained on a small French news corpus (Leipzig Corpora).
The model is built with Hugging Face Transformers.
The model is available on the Hugging Face Hub as [French-Roberta](https://huggingface.co/abhilash1910/french-roberta).
## Training Data
The training corpus is the French News portion of the Leipzig Corpora, and the model is trained on a small subset of it (300K).
## Model Specification
The model uses the [RoBERTa](https://arxiv.org/abs/1907.11692) architecture with the following configuration:
1. vocab_size=32000
2. max_position_embeddings=514
3. num_attention_heads=12
4. num_hidden_layers=6
5. type_vocab_size=1

These values are passed to `RobertaConfig` from the `transformers` package. The model has 68,124,416 parameters in total.
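As a sanity check, the reported total of 68124416 parameters can be reproduced by hand. The configuration list does not give `hidden_size` or `intermediate_size`, so this sketch assumes the `RobertaConfig` defaults (768 and 3072). It also assumes the Hugging Face `RobertaForMaskedLM` layout, where there is no pooler and the output projection shares its weight matrix with the word embeddings. The function name `roberta_mlm_param_count` is hypothetical.

```python
# Hypothetical helper: counts RobertaForMaskedLM parameters from config values.
# Assumes hidden_size=768 and intermediate_size=3072 (RobertaConfig defaults,
# not stated in this README) and a decoder weight tied to the word embeddings.
def roberta_mlm_param_count(vocab_size=32000, max_position_embeddings=514,
                            type_vocab_size=1, num_hidden_layers=6,
                            hidden_size=768, intermediate_size=3072):
    h, i = hidden_size, intermediate_size

    def linear(n_in, n_out):
        # weight matrix + bias vector
        return n_in * n_out + n_out

    layer_norm = 2 * h  # scale + shift

    # Embeddings: word, position and token-type tables, plus one LayerNorm.
    embeddings = (vocab_size + max_position_embeddings + type_vocab_size) * h + layer_norm

    # One encoder layer: Q, K, V and output projections with a LayerNorm,
    # then the feed-forward block (h -> i -> h) with a second LayerNorm.
    # The number of attention heads does not matter: heads split h, they
    # do not change the projection sizes.
    attention = 4 * linear(h, h) + layer_norm
    feed_forward = linear(h, i) + linear(i, h) + layer_norm
    encoder = num_hidden_layers * (attention + feed_forward)

    # MLM head: dense + LayerNorm + output bias. The decoder weight is tied
    # to the word embeddings, so it is not counted a second time.
    lm_head = linear(h, h) + layer_norm + vocab_size

    return embeddings + encoder + lm_head


print(roberta_mlm_param_count())  # 68124416
```

Under these assumptions the arithmetic lands exactly on the reported figure, which suggests the model kept the default hidden and feed-forward sizes. With `transformers` installed, calling `num_parameters()` on a `RobertaForMaskedLM` built from the same config should report the same number.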
The model is trained for 100 epochs with a per-GPU batch size of 64.
More details on building custom models can be found in the [Hugging Face blog post](https://huggingface.co/blog/how-to-train).
## Usage Specifications
To use this model, first import the `AutoTokenizer` and `AutoModelWithLMHead` classes from `transformers`.
Then load both the tokenizer and the model from the pre-trained checkpoint, which in this case is `abhilash1910/french-roberta`.
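Here is a minimal sketch of loading the checkpoint and querying it through the `fill-mask` pipeline. It uses `AutoModelForMaskedLM`. `AutoModelWithLMHead`, named above, is deprecated in recent `transformers` releases in favor of task-specific classes such as this one, but both load the same weights. The example sentence is arbitrary, and the predictions depend on the checkpoint, so no output is shown. The first run needs network access to the Hugging Face Hub.

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

MODEL_ID = "abhilash1910/french-roberta"

# Download (or load from the local cache) the tokenizer and the MLM weights.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForMaskedLM.from_pretrained(MODEL_ID)

# The fill-mask pipeline ranks candidate tokens for the masked position.
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)

# Use tokenizer.mask_token instead of hard-coding the mask string.
sentence = f"Le gouvernement a annoncé une nouvelle {tokenizer.mask_token} économique."
predictions = fill_mask(sentence)

# Each prediction is a dict with "score", "token", "token_str" and "sequence".
for p in predictions:
    print(f"{p['token_str']!r}: {p['score']:.3f}")
```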