# MultiBLiMP: A Massively Multilingual Benchmark of Linguistic Minimal Pairs

## Task Description
MultiBLiMP is a massively multilingual benchmark of linguistic minimal pairs. It covers 101 languages and 6 linguistic phenomena, and contains more than 125,000 minimal pairs.

* Paper: https://arxiv.org/abs/2504.02768
* GitHub Repo: https://github.com/jumelet/multiblimp/
* Hugging Face Dataset Repo: https://huggingface.co/datasets/jumelet/multiblimp

## Implementation

* `multiblimp_{lang}` runs MultiBLiMP for a single language, where `{lang}` must be replaced by the language's ISO 639-3 code (e.g., `eng` for English, `abk` for Abkhazian, `wbp` for Warlpiri).
* The `multiblimp` tag runs MultiBLiMP for all languages.
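
The naming scheme above can be sketched in Python as follows. This is a minimal illustration, not part of the task itself: `multiblimp_task` and `run_multiblimp` are hypothetical helper names, and the `pretrained=gpt2` model argument is only a placeholder assumption. The call to `lm_eval.simple_evaluate` requires the Language Model Evaluation Harness to be installed.

```python
from typing import Optional, Sequence


def multiblimp_task(lang: str) -> str:
    """Build the per-language task name from an ISO 639-3 code (hypothetical helper)."""
    code = lang.strip().lower()
    # ISO 639-3 codes are exactly three ASCII letters.
    if len(code) != 3 or not (code.isascii() and code.isalpha()):
        raise ValueError(f"expected a three-letter ISO 639-3 code, got {lang!r}")
    return f"multiblimp_{code}"


def run_multiblimp(langs: Optional[Sequence[str]] = None,
                   model_args: str = "pretrained=gpt2"):
    """Evaluate the given languages, or every language via the `multiblimp` tag.

    The model and its arguments are placeholder assumptions; substitute your own.
    """
    # Imported lazily so that multiblimp_task works without the harness installed.
    import lm_eval

    tasks = [multiblimp_task(lang) for lang in langs] if langs else ["multiblimp"]
    return lm_eval.simple_evaluate(model="hf", model_args=model_args, tasks=tasks)
```

For example, `run_multiblimp(["eng", "wbp"])` would request the `multiblimp_eng` and `multiblimp_wbp` tasks, while `run_multiblimp()` would request the `multiblimp` tag.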

Note: The original implementation is available [here](https://github.com/jumelet/multiblimp), and the [dataset repository](https://huggingface.co/datasets/jumelet/multiblimp) links to a more flexible version of the implementation [here](https://github.com/catherinearnett/multiblimp). This implementation follows both as closely as possible. However, the original implementations normalize sentence log-probability by the number of tokens, which the Language Model Evaluation Harness does not support (see [[1](https://blog.eleuther.ai/multiple-choice-normalization/)], [[2](https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/task_guide.md)], [[3](https://github.com/EleutherAI/lm-evaluation-harness/issues/1396)]). The implementation provided here therefore reports two metrics: `acc`, the accuracy obtained by comparing the unnormalized log-probabilities of the correct and incorrect versions of each sentence, and `acc_norm`, the same comparison with each sentence's log-probability divided by its number of bytes.
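
The difference between the two metrics can be illustrated with a small sketch. It mirrors the description above rather than the harness source code: `score_pair` is a hypothetical helper, and the sentences and log-probabilities are made-up values chosen so that the two metrics disagree.

```python
def score_pair(logp_good: float, logp_bad: float, good: str, bad: str):
    """Score one minimal pair; returns (acc, acc_norm) as 0.0 or 1.0 each.

    logp_good / logp_bad are the total log-probabilities a model assigns to
    the grammatical and ungrammatical sentence, respectively.
    """
    # acc: compare the raw sentence log-probabilities.
    acc = float(logp_good > logp_bad)

    # acc_norm: divide each log-probability by the sentence length in UTF-8 bytes.
    norm_good = logp_good / len(good.encode("utf-8"))
    norm_bad = logp_bad / len(bad.encode("utf-8"))
    acc_norm = float(norm_good > norm_bad)
    return acc, acc_norm


# Made-up log-probabilities for illustration only.
acc, acc_norm = score_pair(-21.0, -21.5, "The dogs bark.", "The dogs barks.")
print(acc, acc_norm)  # 1.0 0.0
```

Here the grammatical sentence wins on raw log-probability (-21.0 vs. -21.5) but loses after byte normalization (-21.0 / 14 = -1.5 vs. -21.5 / 15 ≈ -1.43), because the ungrammatical sentence is one byte longer. This is why the two metrics are reported separately.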

## Dataset Details

This table (from the [Hugging Face Dataset Repo](https://huggingface.co/datasets/jumelet/multiblimp)) lists the languages covered in MultiBLiMP and the number of items for each language.

| ISO Code |      Language      |   n  |
|:--------:|:------------------:|:----:|
| abk      | Abkhazian          | 40   |
| aqz      | Akuntsu            | 14   |
| sqi      | Albanian           | 243  |
| amh      | Amharic            | 112  |
| grc      | Ancient Greek      | 3695 |
| hbo      | Ancient Hebrew     | 983  |
| apu      | Apurinã            | 28   |
| hye      | Armenian           | 1415 |
| eus      | Basque             | 273  |
| bel      | Belarusian         | 2570 |
| ben      | Bengali            | 21   |
| bho      | Bhojpuri           | 34   |
| bor      | Borôro             | 241  |
| bre      | Breton             | 260  |
| bul      | Bulgarian          | 2458 |
| bua      | Buriat             | 103  |
| cat      | Catalan            | 2284 |
| chu      | Church Slavonic    | 4166 |
| xcl      | Classical Armenian | 1623 |
| ces      | Czech              | 4256 |
| dan      | Danish             | 50   |
| nld      | Dutch              | 2331 |
| egy      | Egyptian (Ancient) | 22   |
| eng      | English            | 770  |
| myv      | Erzya              | 464  |
| est      | Estonian           | 2575 |
| fao      | Faroese            | 232  |
| fin      | Finnish            | 2570 |
| fra      | French             | 2548 |
| glg      | Galician           | 753  |
| kat      | Georgian           | 204  |
| deu      | German             | 2298 |
| aln      | Gheg Albanian      | 677  |
| got      | Gothic             | 1579 |
| guj      | Gujarati           | 7    |
| heb      | Hebrew             | 2330 |
| azz      | H-P Nahuatl        | 207  |
| hin      | Hindi              | 1447 |
| hit      | Hittite            | 50   |
| hun      | Hungarian          | 845  |
| isl      | Icelandic          | 2801 |
| gle      | Irish              | 28   |
| ita      | Italian            | 2999 |
| quc      | K'iche'            | 131  |
| xnr      | Kangri             | 86   |
| krl      | Karelian           | 260  |
| kxh      | Karo (Ethiopia)    | 120  |
| kaz      | Kazakh             | 173  |
| kir      | Kirghiz            | 185  |
| koi      | Komi-Permyak       | 43   |
| kpv      | Komi-Zyrian        | 320  |
| lat      | Latin              | 3149 |
| lav      | Latvian            | 3032 |
| lij      | Ligurian           | 254  |
| lit      | Lithuanian         | 1180 |
| olo      | Livvi              | 190  |
| nds      | Low German         | 1774 |
| mkd      | Macedonian         | 39   |
| mar      | Marathi            | 460  |
| frm      | Middle French      | 294  |
| ell      | Modern Greek       | 1096 |
| mdf      | Moksha             | 82   |
| yrl      | Nhengatu           | 720  |
| pcm      | Nigerian Pidgin    | 26   |
| kmr      | Northern Kurdish   | 544  |
| sme      | Northern Sami      | 2536 |
| fro      | Old French         | 1976 |
| orv      | Old Russian        | 4615 |
| ota      | Ottoman Turkish    | 99   |
| fas      | Persian            | 2553 |
| xpg      | Phrygian           | 50   |
| pol      | Polish             | 3272 |
| por      | Portuguese         | 3048 |
| ron      | Romanian           | 2056 |
| rus      | Russian            | 3832 |
| san      | Sanskrit           | 4442 |
| gla      | Scottish Gaelic    | 66   |
| hbs      | Serbo-Croatian     | 3286 |
| sms      | Skolt Sami         | 263  |
| slk      | Slovak             | 4145 |
| slv      | Slovenian          | 4483 |
| spa      | Spanish            | 2541 |
| arb      | Standard Arabic    | 1215 |
| swe      | Swedish            | 201  |
| tam      | Tamil              | 382  |
| ttc      | Tektiteko          | 69   |
| tpn      | Tupinambá          | 9    |
| tur      | Turkish            | 1742 |
| uig      | Uighur             | 758  |
| ukr      | Ukrainian          | 2744 |
| hsb      | Upper Sorbian      | 186  |
| urd      | Urdu               | 550  |
| urb      | Urubú-Kaapor       | 13   |
| uzb      | Uzbek              | 50   |
| vep      | Veps               | 187  |
| wbp      | Warlpiri           | 12   |
| cym      | Welsh              | 1120 |
| hyw      | Western Armenian   | 1153 |
| wol      | Wolof              | 705  |
| sah      | Yakut              | 144  |
| nhi      | Tenango Nahuatl    | 38   |


## Citation
```
@misc{jumelet2025multiblimp10massivelymultilingual,
      title={MultiBLiMP 1.0: A Massively Multilingual Benchmark of Linguistic Minimal Pairs},
      author={Jaap Jumelet and Leonie Weissweiler and Arianna Bisazza},
      year={2025},
      eprint={2504.02768},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2504.02768},
}
```

## New Task Checklist

- [x] Is the task an existing benchmark in the literature?
  - [x] Have you referenced the original paper that introduced the task?
  - [x] If yes, does the original paper provide a reference implementation? If so, have you checked against the reference implementation and documented how to run such a test?