# XNLI

### Paper

Title: `XNLI: Evaluating Cross-lingual Sentence Representations`

Abstract: https://arxiv.org/abs/1809.05053

Based on the implementation of @yongzx (see https://github.com/EleutherAI/lm-evaluation-harness/pull/258).

Prompt format (same as XGLM and mGPT):

sentence1 + ", right? " + mask = (Yes|Also|No) + ", " + sentence2

The prediction is the label whose full sequence has the highest likelihood (see the scoring sketch at the end of this README).

Language-specific prompts are translated word-by-word with Google Translate and may differ from the ones used by mGPT and XGLM (they do not provide their prompts).

Homepage: https://github.com/facebookresearch/XNLI

### Citation

"""
@InProceedings{conneau2018xnli,
  author    = "Conneau, Alexis and Rinott, Ruty and Lample, Guillaume and Williams, Adina and Bowman, Samuel R. and Schwenk, Holger and Stoyanov, Veselin",
  title     = "XNLI: Evaluating Cross-lingual Sentence Representations",
  booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
  year      = "2018",
  publisher = "Association for Computational Linguistics",
  location  = "Brussels, Belgium",
}
"""

### Groups and Tasks

#### Groups

* `xnli`

#### Tasks

* `xnli_ar`: Arabic
* `xnli_bg`: Bulgarian
* `xnli_de`: German
* `xnli_el`: Greek
* `xnli_en`: English
* `xnli_es`: Spanish
* `xnli_fr`: French
* `xnli_hi`: Hindi
* `xnli_ru`: Russian
* `xnli_sw`: Swahili
* `xnli_th`: Thai
* `xnli_tr`: Turkish
* `xnli_ur`: Urdu
* `xnli_vi`: Vietnamese
* `xnli_zh`: Chinese

### Checklist

For adding novel benchmarks/datasets to the library:
* [ ] Is the task an existing benchmark in the literature?
  * [ ] Have you referenced the original paper that introduced the task?
  * [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?

If other tasks on this dataset are already supported:
* [ ] Is the "Main" variant of this task clearly denoted?
* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?
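### Scoring sketch

As a reading aid for the prompt format above, here is a minimal, self-contained sketch of the ranking scheme: each candidate label word is substituted into the template, and the label whose full sequence has the highest log-likelihood is predicted. This is not the harness's internal implementation; the Yes/Also/No to entailment/neutral/contradiction mapping assumed here follows the XGLM/mGPT convention, and the `gpt2` checkpoint is only a placeholder model.

```python
# Sketch of the XNLI prompt-ranking scheme described above (illustrative only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()


def sequence_log_likelihood(text: str) -> float:
    """Sum of token log-probabilities of `text` under the model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Shift so that logits at position t score the token at position t+1.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = ids[0, 1:]
    return log_probs.gather(1, targets.unsqueeze(1)).sum().item()


def predict_xnli(premise: str, hypothesis: str) -> str:
    # Assumed XGLM/mGPT label-word mapping.
    label_words = {"entailment": "Yes", "neutral": "Also", "contradiction": "No"}
    # Build one full sequence per candidate label and keep the most likely one.
    scores = {
        label: sequence_log_likelihood(f"{premise}, right? {word}, {hypothesis}")
        for label, word in label_words.items()
    }
    return max(scores, key=scores.get)


print(predict_xnli("A man is playing a guitar.", "A person is making music."))
```

In practice the tasks are run through the harness CLI, e.g. `lm_eval --model hf --model_args pretrained=<model> --tasks xnli_en,xnli_fr`.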