Multilingual RoBERTa-like model trained on Wikipedia articles in the Hindi, Sanskrit, and Gujarati languages. The tokenizer was trained on the combined text of all three languages. However, only the Hindi text was used to pre-train the model, which was then fine-tuned on the combined Sanskrit and Gujarati text, in the hope that pre-training on Hindi would transfer to the two related languages.
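A minimal usage sketch with the Hugging Face transformers library; the repository id below is a placeholder, not the model's actual name:

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

# Hypothetical repository id -- replace with the model's actual name on the Hub.
model_id = "your-username/roberta-hi-sa-gu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Fill-mask example in Hindi (Devanagari script); the mask token is taken
# from the tokenizer so this works regardless of the exact special tokens.
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
print(fill_mask(f"मुझे उनसे बात {tokenizer.mask_token} करनी है"))
```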