- 22 Jul, 2025 3 commits
- 21 Jul, 2025 18 commits
- Baber authored: feat: implement check_gold_index_error utility and refactor process_results for improved error handling; remove generate_until multiple-choice
- Baber authored
- Baber authored
- Baber authored
- Baber authored
- Baber authored
- Baber authored
- Baber authored
- Baber authored
  Conflicts: lm_eval/api/filter.py, lm_eval/api/metrics.py, lm_eval/api/task.py, lm_eval/filters/extraction.py
- Baber authored
- Baber authored
- Baber authored
- Baber authored
- Baber authored
- Baber authored
- Baber authored
- Baber authored
- Baber authored
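The log does not show what the check_gold_index_error utility from the commit above actually does. As a rough sketch only, a gold-index check for a multiple-choice task might validate that the recorded correct-answer index is usable before scoring; the function name matches the commit, but the signature and behavior here are assumptions, not the harness's actual API:

```python
def check_gold_index_error(gold, choices):
    """Return an error message if a gold (correct-answer) index cannot
    be used to score against the given choices, otherwise None.

    Illustrative sketch; not the harness's real implementation.
    """
    if not isinstance(gold, int):
        return f"gold index must be an int, got {type(gold).__name__}"
    # allow negative indexing, as Python lists do
    if not -len(choices) <= gold < len(choices):
        return f"gold index {gold} out of range for {len(choices)} choices"
    return None
```

A process_results refactor along these lines would call the check first and surface the message instead of raising an opaque IndexError mid-evaluation.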
- 19 Jul, 2025 5 commits
- Avelina Asada Hadji-Kyriacou authored:
  * Added missing fixture in test_unitxt_tasks.py
  * pacify pre-commit
  Co-authored-by: Baber Abbasi <92168766+baberabb@users.noreply.github.com>
- Baber Abbasi authored
- James A. Michaelov authored:
  * add multiblimp
  * run linter
- Avelina Asada Hadji-Kyriacou authored:
  * Update default.yaml
- Baber authored
- 18 Jul, 2025 6 commits
- Baber authored
- Baber authored
- Baber authored
  Conflicts: tests/test_tasks.py
- Ramiro R. C. authored:
  * added headers and custom model name; fixed bug with trust_remote_code param
  * linting
  * removed custom model name; changed headers override
  * add `header` to base TemplateAPI
  * nit
  Co-authored-by: Baber <baber@hey.com>
- mans authored:
  * fix request hanging when calling the request API
  * pre-commit
  Co-authored-by: qinyidao <qinyidao@moonshot.cn>
- Idan Tene authored:
  * Update utils.py
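The "headers override" commit above layers user-supplied headers on top of an API template's defaults. The helper below is hypothetical (name and signature are assumptions, not the real TemplateAPI code), but sketches the usual pattern: defaults first, then user overrides win:

```python
def build_headers(api_key=None, extra_headers=None):
    """Compose default API request headers with user overrides.

    Hypothetical helper illustrating a headers-override design;
    user-supplied values in extra_headers replace the defaults.
    """
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    if extra_headers:
        headers.update(extra_headers)  # overrides applied last, so they win
    return headers
```

Applying overrides last is what makes the behavior predictable: a user can replace even the Content-Type default without the template needing a special case.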
- 16 Jul, 2025 2 commits
-
- philipdoldo authored:
  * Removed the "Let's think step by step." text from the start of the target entry in each of the samples, to prevent this phrase from being repeated twice in the few-shot prompts and to match the behavior of the original bbh repository. Worth noting that this applied to only 26 out of 27 subtasks; the only one it did not apply to is boolean_expressions.yaml. In boolean_expressions.yaml there is arguably a further error: the "Remember that (i) ..." text does not appear after the final "A: Let's think step by step." in the prompt. Models like EleutherAI/gpt-neo-125m seem to always begin answers with this string anyway (copying what was done in the few-shot prompts), and it really should have been part of the prompt, much like "A: Let's think step by step." is included in the prompt for all of the cot tasks. However, the original bbh repo has the same issue, so it is kept this way for consistency.
  * feat: remove extra space from answers; add changelog
  Co-authored-by: Baber <baber@hey.com>
- Baber Abbasi authored:
  * feat: add postprocessing for generated text to strip stop sequences and thinking tokens
  * nit
  * fix: trim leading whitespace after stripping thinking tokens from generation
  * feat: add think_end_token to model_args
  * nit
  * nit
  * nit
  * add to readme
  * nit
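The postprocessing commits above describe two steps: drop any reasoning segment up to a configurable think_end_token (trimming the leading whitespace left behind, per the follow-up fix), then truncate at the first stop sequence. A minimal sketch of that pipeline, with an illustrative name and signature rather than the harness's actual function:

```python
def postprocess_generation(text, stop_sequences=(), think_end_token=None):
    """Strip a model's thinking segment and trailing stop-sequence text.

    Illustrative sketch of the two-step cleanup described in the log;
    not the harness's exact implementation.
    """
    # 1. Discard everything up to and including the thinking-end token
    #    (e.g. "</think>"), then trim the leading whitespace it leaves.
    if think_end_token and think_end_token in text:
        text = text.split(think_end_token, 1)[1].lstrip()
    # 2. Truncate at the first stop sequence that appears.
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            text = text[:idx]
    return text
```

The order matters: stripping the thinking block first means a stop sequence inside the reasoning segment cannot truncate the real answer.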
- 15 Jul, 2025 1 commit
- MaYongQing authored
- 14 Jul, 2025 3 commits
- Ankit Gola authored
- Avelina Asada Hadji-Kyriacou authored
- Atou Houdaifa authored:
  * add egy mmlu hellaswag
  * add egymmlu egyhellaswag to tasks readme
  * fix egymmlu config generation
  * fix _generate_configs formatting
- 10 Jul, 2025 2 commits
- Baber Abbasi authored
- Baber Abbasi authored:
  * check for chat for warning
  * add test
  * remove yaml extension from some evalita configs
  * move unitxt to own test script
  * fix CI test