Update with task list.

Unverified Commit 1768f118 authored Aug 14, 2023 by Lintang Sutawika Committed by GitHub Aug 14, 2023

Showing with 61 additions and 0 deletions

lm_eval/tasks/xnli/README.md +61 -0

--- a/lm_eval/tasks/xnli/README.md
+++ b/lm_eval/tasks/xnli/README.md
+# XNLI
+### Paper
+Title: `XNLI: Evaluating Cross-lingual Sentence Representations`
+Abstract: https://arxiv.org/abs/1809.05053
+Based on the implementation of @yongzx (see https://github.com/EleutherAI/lm-evaluation-harness/pull/258)
+Prompt format (same as XGLM and mGPT):
+sentence1 + ", right? " + mask = (Yes|Also|No) + ", " + sentence2
+Prediction is the full sequence with the highest likelihood.
+Language-specific prompts are translated word-by-word with Google Translate
+and may differ from the ones used by mGPT and XGLM (they do not provide their prompts).
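The prompt format described above can be sketched in Python as follows. This is a minimal illustration and not the harness's actual implementation: `build_candidates`, `predict`, and `toy_loglikelihood` are hypothetical names, the mapping from mask words to NLI labels is an assumption (the README only lists the mask words), and the scorer is a stand-in for a real language model's log-likelihood.

```python
from typing import Callable, Dict

# Mask words come from the prompt format above.
# The label each word stands for is an assumption made for this sketch.
MASK_TO_LABEL: Dict[str, str] = {
    "Yes": "entailment",
    "Also": "neutral",
    "No": "contradiction",
}


def build_candidates(sentence1: str, sentence2: str) -> Dict[str, str]:
    """Build one full sequence per mask word, following the prompt format."""
    return {
        mask: sentence1 + ", right? " + mask + ", " + sentence2
        for mask in MASK_TO_LABEL
    }


def predict(
    sentence1: str,
    sentence2: str,
    loglikelihood: Callable[[str], float],
) -> str:
    """Return the label whose full sequence scores highest under the scorer."""
    candidates = build_candidates(sentence1, sentence2)
    best_mask = max(candidates, key=lambda mask: loglikelihood(candidates[mask]))
    return MASK_TO_LABEL[best_mask]


def toy_loglikelihood(text: str) -> float:
    """Stand-in scorer (hypothetical): prefers shorter text. Not a real model."""
    return -float(len(text))


if __name__ == "__main__":
    premise = "A man is eating"
    hypothesis = "He is hungry"
    for mask, sequence in build_candidates(premise, hypothesis).items():
        print(mask, "->", sequence)
    print(predict(premise, hypothesis, toy_loglikelihood))
```

In practice `loglikelihood` would be replaced by a call that scores the whole sequence with the model under evaluation; the argmax over the three candidates is the only part this sketch takes directly from the description.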
+Homepage: https://github.com/facebookresearch/XNLI
+### Citation
 """
 @InProceedings{conneau2018xnli,
  author = "Conneau, Alexis
@@ -15,3 +39,40 @@
  location = "Brussels, Belgium",
 }
 """
+### Groups and Tasks
+#### Groups
+* `xnli`
+#### Tasks
+* `xnli_ar`: Arabic
+* `xnli_bg`: Bulgarian
+* `xnli_de`: German
+* `xnli_el`: Greek
+* `xnli_en`: English
+* `xnli_es`: Spanish
+* `xnli_fr`: French
+* `xnli_hi`: Hindi
+* `xnli_ru`: Russian
+* `xnli_sw`: Swahili
+* `xnli_th`: Thai
+* `xnli_tr`: Turkish
+* `xnli_ur`: Urdu
+* `xnli_vi`: Vietnamese
+* `xnli_zh`: Chinese
+### Checklist
+For adding novel benchmarks/datasets to the library:
+* [ ] Is the task an existing benchmark in the literature?
+  * [ ] Have you referenced the original paper that introduced the task?
+  * [ ] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?
+If other tasks on this dataset are already supported:
+* [ ] Is the "Main" variant of this task clearly denoted?
+* [ ] Have you provided a short sentence in a README on what each new variant adds / evaluates?
+* [ ] Have you noted which, if any, published evaluation setups are matched by this variant?