Unverified commit d5ddccd9 authored by Ismail Hossain, committed by GitHub

Add support for Titulm Bangla MMLU dataset (#3317)



* Added YAML task for [task/bangla]

* brief description of the task

* Update README.md

fix

---------
Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com>
parent 690ef8ba
@@ -26,6 +26,7 @@ provided to the individual README.md files for each subfolder.
| [asdiv](asdiv/README.md) | Tasks involving arithmetic and mathematical reasoning challenges. | English |
| [babi](babi/README.md) | Tasks designed as question and answering challenges based on simulated stories. | English |
| [babilong](babilong/README.md) | Tasks designed to test whether models can find and reason over facts in long contexts. | English |
| [bangla_mmlu](bangla/README.md) | Benchmark dataset for evaluating language models' performance on Bangla (Bengali) language tasks. Includes diverse NLP tasks to measure model understanding and generation capabilities in Bangla. | Bengali/Bangla |
| [basque_bench](basque_bench/README.md) | Collection of tasks in Basque encompassing various evaluation areas. | Basque |
| [basqueglue](basqueglue/README.md) | Tasks designed to evaluate language understanding in Basque language. | Basque |
| [bbh](bbh/README.md) | Tasks focused on deep semantic understanding through hypothesization and reasoning. | English, German |
# Titulm Bangla MMLU
This folder contains resources for **Titulm Bangla MMLU**, a benchmark dataset designed for evaluating Bangla language models. The dataset is used for training, development, and comparative evaluation of language models on Bangla-language tasks.
---
## Overview
**TituLLMs** is a family of Bangla large language models (LLMs) with comprehensive benchmarking designed to advance natural language processing for the Bangla language. The benchmark dataset `Titulm Bangla MMLU` covers multiple-choice questions across a diverse range of topics in Bangla.
This dataset is primarily used to train, validate, and evaluate Bangla language models and compare their performance with other existing models.
For more details, please refer to the original research paper:
[https://arxiv.org/abs/2502.11187](https://arxiv.org/abs/2502.11187)
---
## Dataset
The `Titulm Bangla MMLU` dataset can be found on Hugging Face:
[https://huggingface.co/datasets/hishab/titulm-bangla-mmlu](https://huggingface.co/datasets/hishab/titulm-bangla-mmlu)
This dataset was used as a benchmark in the development and evaluation of TituLLMs and related models.
---
## Usage
The dataset is intended for use within the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) repository to evaluate and compare the performance of Bangla language models.
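A typical harness invocation looks like the sketch below. The model name is a placeholder, not one prescribed by this task, and `lm_eval` must be installed (`pip install lm-eval`); the command is echoed here rather than executed so it can be inspected before running:

```shell
# Sketch of a standard lm-evaluation-harness run on this task.
# MODEL is a placeholder; substitute any Bangla-capable Hugging Face checkpoint.
MODEL="your-org/your-bangla-model"

# Echoed rather than executed, so the full command can be reviewed first:
echo lm_eval --model hf \
  --model_args "pretrained=${MODEL}" \
  --tasks bangla_mmlu \
  --num_fewshot 5
```

Remove the `echo` to actually run the evaluation; `--num_fewshot 5` draws the few-shot examples from the `dev` split configured in the task YAML.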
---
## Notes
- The dataset can also be used to evaluate models other than TituLLMs.
- Other Bangla datasets such as boolq, openbookqa ... are planned to be added.
## Citation
If you use this dataset or model, please cite the original paper:
```bibtex
@misc{nahin2025titullmsfamilybanglallms,
title={TituLLMs: A Family of Bangla LLMs with Comprehensive Benchmarking},
author={Shahriar Kabir Nahin and Rabindra Nath Nandi and Sagor Sarker and Quazi Sarwar Muhtaseem and Md Kowsher and Apu Chandraw Shill and Md Ibrahim and Mehadi Hasan Menon and Tareq Al Muntasir and Firoj Alam},
year={2025},
eprint={2502.11187},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2502.11187},
}
```

The task's YAML configuration:

```yaml
task: bangla_mmlu
dataset_path: hishab/titulm-bangla-mmlu
dataset_name: all
description: "The following are multiple choice questions (with answers) about a range of topics in Bangla"
test_split: test
fewshot_split: dev
fewshot_config:
  sampler: first_n
output_type: multiple_choice
doc_to_text: "{{question.strip()}} A. {{options[0]}} B. {{options[1]}} C. {{options[2]}} D. {{options[3]}} Answer:"
doc_to_choice: ["A", "B", "C", "D"]
doc_to_target: answer
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
```
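The `doc_to_text` Jinja template above can be sketched in plain Python to show how one dataset record is rendered into a prompt. The record below is a made-up example following the `question`/`options`/`answer` field names from the YAML, not an actual row from `titulm-bangla-mmlu`:

```python
# Sketch: how the YAML's doc_to_text template renders one record.
def doc_to_text(doc: dict) -> str:
    # Mirrors "{{question.strip()}} A. {{options[0]}} ... Answer:"
    q = doc["question"].strip()
    opts = doc["options"]
    return f"{q} A. {opts[0]} B. {opts[1]} C. {opts[2]} D. {opts[3]} Answer:"

# Illustrative record, not real dataset content.
doc = {
    "question": "  2 + 2 = ?  ",
    "options": ["3", "4", "5", "6"],
}
prompt = doc_to_text(doc)
print(prompt)
# → "2 + 2 = ? A. 3 B. 4 C. 5 D. 6 Answer:"
```

The model's answer is then scored as a multiple-choice selection over the letters in `doc_to_choice`, with the gold label read from the record's `answer` field (`doc_to_target`).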