Commits · 7d09b24c626edb1d4898ca73a5b53fe6adb3bc08 · gaoqiong / lm-evaluation-harness

03 Jul, 2024 1 commit
- pre-commit format · 96dfe976
  lintangsutawika authored Jul 03, 2024
02 Jul, 2024 2 commits
- clean up diff · c1d9e625
  haileyschoelkopf authored Jul 02, 2024
- giving this a try · 3b7e6cc6
  haileyschoelkopf authored Jul 02, 2024
25 Jun, 2024 2 commits
- Remove `LM` dependency from `build_all_requests` (#2011) · 9b6179b2
  Baber Abbasi authored Jun 25, 2024
```
* refactored `lm.apply_chat_template`

* nit

* fix weird type error

* fixed!

* skip failing test

* pre-commit run all

* add type hints

* nit

* nit

* fixup
```
- add more error msgs, agg_metric -> agg_metric_list · 93c17c57
  haileyschoelkopf authored Jun 25, 2024
24 Jun, 2024 2 commits

- Hotfix breaking import (#2015) · 0ae3d3eb
  Stella Biderman authored Jun 24, 2024

- add tokenizer logs info (#1731) · 536691da
  achervyakov authored Jun 24, 2024



* add tokenizer logs info

* add no tokenizer case

* Update lm_eval/logging_utils.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* Update lm_eval/logging_utils.py
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* add updates

* fix conflict

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>


19 Jun, 2024 1 commit

- Log `fewshot_as_multiturn` in results files (#1995) · 78a54e14
  Hailey Schoelkopf authored Jun 19, 2024



* log fewshot_as_multiturn in general tracker args

* Update evaluator.py

---------
Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>


10 Jun, 2024 3 commits
- update format · 5b527a71
  lintangsutawika authored Jun 10, 2024
- change how aggregate_metric is loaded · 80d0f412
  lintangsutawika authored Jun 10, 2024
- removed " " in make_table · e184c501
  lintangsutawika authored Jun 10, 2024
07 Jun, 2024 1 commit
- fix indentation · 5032ebaf
  lintangsutawika authored Jun 07, 2024
04 Jun, 2024 1 commit
- format fix · 4eeb8715
  lintangsutawika authored Jun 04, 2024
03 Jun, 2024 2 commits

- Add chat template (#1873) · 070d31df
  KonradSzafer authored Jun 03, 2024



* initial chat template

* tokenizer attribute check

* variable rename

* interface update

* system instruction

* system inst default update

* fewshot as multiturn

* typing update

* indent update

* added comments

* Adding a fewshot in a more readable way

* linting

* Moved apply chat template to LM

* multiturn alternation fix

* cache key update

* apply chat template method fix

* add system prompt hash to cache_key

* tokenizer name property for cache_key

* property name fix

* linting backward compatibility fix

* docs and errors update

* add documentation on adding chat template compatibility to model_guide

* fewshot as multiturn check fix

* saving system inst and chat template in results

* eval tracker update

* docs update

* Apply suggestions from code review
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
Co-authored-by: Clémentine Fourrier <22726840+clefourrier@users.noreply.github.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>


- Fix fewshot seed only set when overriding num_fewshot (#1914) · 42d5c4bf
  LSinev authored Jun 03, 2024
```
Fix #1906
```

30 May, 2024 1 commit

- `higher_is_better` tickers in output table (#1893) · 14221c84
  Zafir Stojanovski authored May 30, 2024



* Higher is better tickers in output table

* add extra check for `higher_is_better` not being None already

* Update lm_eval/evaluator.py

* fixup format I messed up

* add comment (and retrigger tests)

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>


26 May, 2024 1 commit
- Rename `lm_eval.logging -> lm_eval.loggers` (#1858) · 0ff6ab99
  Hailey Schoelkopf authored May 26, 2024
```
* rename lm_eval.logging module

* fix evaluation tracker args
```
24 May, 2024 1 commit
- Fix for bootstrap_iters = 0 case (#1715) (#1789) · b043b050
  Hailey Schoelkopf authored May 24, 2024
```
* add handling for bootstrap_iters=0 case

* add more detail to docstring

* run precommit
```
16 May, 2024 1 commit
- add get_subtask_list function to get proper subtask list · 88fea8ad
  lintangsutawika authored May 16, 2024
10 May, 2024 2 commits
- use task_id · 39c40277
  lintangsutawika authored May 10, 2024
- reformat with pre-commit · 13203943
  lintangsutawika authored May 10, 2024
08 May, 2024 2 commits
- fixed conditional statement · a88886a3
  lintangsutawika authored May 08, 2024
- update additional condition when loading a group in a group yaml · 44d70398
  lintangsutawika authored May 08, 2024
07 May, 2024 8 commits
- add version for groups · 2a817535
  lintangsutawika authored May 07, 2024
- sort set to False by default, fix predict_only arg · 9f698c20
  lintangsutawika authored May 07, 2024
- move prepare_print_tasks to evaluator_utils · 110e5a28
  lintangsutawika authored May 07, 2024
- readd group_agg · 03982e03
  lintangsutawika authored May 07, 2024
- update to work with new group and task configuration · ad70d206
  lintangsutawika authored May 07, 2024
- Fix Caching Tests ; Remove `pretrained=gpt2` default (#1775) · 7fe2b93c
  Hailey Schoelkopf authored May 07, 2024
- update mmlu · c23c9305
  lintangsutawika authored May 07, 2024
- adjust group scoring with using ConfigurableGroup · 62572f05
  lintangsutawika authored May 07, 2024
06 May, 2024 1 commit

- Provide ability for custom sampler for ConfigurableTask (#1616) · ae72cebc
  LSinev authored May 06, 2024

* Added fewshot sampling seeds to evaluator.simple_evaluate signature

Way to control seed of fewshot sampling
may help with #1591

* Added ability for custom sampler for ConfigurableTask

May be set in config like
```
fewshot_config:
  sampler: !function utils.MyFewshotSampler
```

* explicitly set fewshot random generator seed for HFLM generate_until_task test

* add backward compatibility for three args seed setup

* save seeds info to logs/reports
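
A custom sampler class referenced from `fewshot_config` might look roughly like the sketch below. It is a minimal, standalone illustration and not the harness's actual sampler base class: only the name `MyFewshotSampler` comes from the commit message, while the constructor signature, the `seed` parameter, and the `sample(n)` method are assumptions made for this example.

```python
import random


class MyFewshotSampler:
    """Hypothetical few-shot sampler with its own seeded random generator.

    The interface (constructor arguments and `sample`) is assumed for
    illustration; it is not copied from lm-evaluation-harness.
    """

    def __init__(self, docs, seed=1234):
        # Materialize the candidate few-shot documents once.
        self.docs = list(docs)
        # A private Random instance keeps sampling reproducible and
        # independent of the global random state.
        self.rnd = random.Random(seed)

    def sample(self, n):
        # Draw n distinct documents; random.Random.sample raises
        # ValueError if n exceeds the number of available documents.
        return self.rnd.sample(self.docs, n)
```

Two samplers built with the same seed over the same documents return the same few-shot examples, which is the kind of reproducibility the seed-related changes in this commit are concerned with.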


05 May, 2024 1 commit
- limit fix (#1785) · cee785e0
  KonradSzafer authored May 05, 2024
03 May, 2024 1 commit

- evaluation tracker implementation (#1766) · 59cf408a
  KonradSzafer authored May 03, 2024

* evaluation tracker implementation

* OVModelForCausalLM test fix

* typo fix

* moved methods args

* multiple args in one flag

* loggers moved to dedicated dir

* improved filename sanitization


25 Apr, 2024 1 commit
- fixed args input in aggregate_subtask_metrics · 9551bbf2
  lintangsutawika authored Apr 25, 2024
24 Apr, 2024 1 commit
- fixed size configuration · 5a98162d
  lintangsutawika authored Apr 24, 2024
23 Apr, 2024 1 commit
- add a group config that allows disabling table for group score and group aggregate in general · 6da6d187
  lintangsutawika authored Apr 23, 2024
22 Mar, 2024 1 commit
- add logging of model args (#1619) · cffc1bd3
  Baber Abbasi authored Mar 22, 2024
```
* add logging of model args

* nit

* Add warnings.

* nit

* add warning

* nit
```
18 Mar, 2024 1 commit

- Cleanup for v0.4.2 release (#1573) · 5627e819
  Hailey Schoelkopf authored Mar 18, 2024

* Update interface.md

* fix: make caching reqs always work with accelerate launch

* remove stale task migration checklist

* remove deprecation warnings

* make informative TypeErrors for get_task_dict

* bump version metadata

* fix num_fewshot printing bug

* add fewshot value to cache key


17 Mar, 2024 1 commit
- Add start date in results.json (#1592) · 6fae67a6
  kwrobel.eth authored Mar 17, 2024