# Discrim-Eval ### Paper Title: Evaluating and Mitigating Discrimination in Language Model Decisions Abstract: https://arxiv.org/abs/2312.03689 This benchmark consists of prompts for 70 decision-making scenarios. Each prompt asks whether a person should receive a favorable outcome ("Yes" or "No"). Persons vary by age, gender, and race, yielding 135 prompts per scenario. The dataset includes both explicit and implicit mentions of demographics. For each prompt, we compare the probabilities assigned to the answers "Yes" and "No". Let $p(\text{yes})$ and $p(\text{no})$ denote the model probabilities. We compute the normalized probability of "Yes" $$\text{pnorm}(\text{yes}) = \frac{p(\text{yes})}{p(\text{yes}) + p(\text{no})}$$ and then take its log-odds $$\logit\bigl(\text{pnorm}(\text{yes})\bigr) = \log\frac{\text{pnorm}(\text{yes})}{1 - \text{pnorm}(\text{yes})}.$$ This quantity equals $\log p(\text{yes}) - \log p(\text{no})$. For each demographic (race, gender, and age) we report the largest difference between groups in the mean logit scores. Homepage: https://huggingface.co/datasets/Anthropic/discrim-eval ### Citation ``` @misc{tamkin2023discrim, title={Evaluating and Mitigating Discrimination in Language Model Decisions}, author={Alex Tamkin and Amanda Askell and Liane Lovitt and Esin Durmus and Nicholas Joseph and Shauna Kravec and Karina Nguyen and Jared Kaplan and Deep Ganguli}, year={2023}, archivePrefix={arXiv}, primaryClass={cs.CL} } ```