Commits · 7d09b24c626edb1d4898ca73a5b53fe6adb3bc08 · gaoqiong / lm-evaluation-harness

03 Jul, 2024 2 commits
- move group api to separate file · 94673d40
  haileyschoelkopf authored Jul 03, 2024
  
  94673d40
- pre-commit format · 96dfe976
  lintangsutawika authored Jul 03, 2024
  
  96dfe976
02 Jul, 2024 2 commits
- clean up diff · c1d9e625
  haileyschoelkopf authored Jul 02, 2024
  
  c1d9e625
- fix merge conflicts? · d13b1f56
  haileyschoelkopf authored Jul 02, 2024
  
  d13b1f56
01 Jul, 2024 1 commit
- ship with exact_match function already used ; don't call evaluate.load() on import (#2045) · a8ac0446
  Hailey Schoelkopf authored Jul 01, 2024
  
  a8ac0446
28 Jun, 2024 1 commit
- fix cache (#2037) · e922cceb
  Baber Abbasi authored Jun 28, 2024
  
  e922cceb
27 Jun, 2024 1 commit
- update task_id to be updateable and uses group:task format · 43765669
  lintangsutawika authored Jun 27, 2024
  
  43765669
25 Jun, 2024 2 commits
- Remove `LM` dependency from `build_all_requests` (#2011) · 9b6179b2
  Baber Abbasi authored Jun 25, 2024
```
* refactored `lm.apply_chat_template`

* nit

* fix weird type error

* fixed!

* skip failing test

* pre-commit run all

* add type hints

* nit

* nit

* fixup
```
  9b6179b2
- add more error msgs, agg_metric -> agg_metric_list · 93c17c57
  haileyschoelkopf authored Jun 25, 2024
  
  93c17c57
13 Jun, 2024 1 commit

`samples` is newline delimited (#1930) · 3850e21a

Baber Abbasi authored Jun 13, 2024



* `samples` is newline delimited

* updated git and pre-commit

* appease pre-commit

* nit

* Revert back for now

* Revert for now

---------
Co-authored-by: Lintang Sutawika <lintang@eleuther.ai>

3850e21a

10 Jun, 2024 5 commits
- update format · 5b527a71
  lintangsutawika authored Jun 10, 2024
  
  5b527a71
- change how aggregate_metric is loaded · 9fa3b3f4
  lintangsutawika authored Jun 10, 2024
  
  9fa3b3f4
- change how aggregate_metric is loaded · 80d0f412
  lintangsutawika authored Jun 10, 2024
  
  80d0f412
- remove version for metadata · 1848d664
  lintangsutawika authored Jun 10, 2024
  
  1848d664
- remove group_alias · e8f49184
  lintangsutawika authored Jun 10, 2024
  
  e8f49184
07 Jun, 2024 1 commit
- formatting · e6b1581f
  lintangsutawika authored Jun 07, 2024
  
  e6b1581f
06 Jun, 2024 1 commit
- add documentation · be8b547b
  lintangsutawika authored Jun 06, 2024
  
  be8b547b
03 Jun, 2024 1 commit

Add chat template (#1873) · 070d31df

KonradSzafer authored Jun 03, 2024



* initial chat template

* tokenizer attribute check

* variable rename

* interface update

* system instruction

* system inst default update

* fewshot as multiturn

* typing update

* indent update

* added comments

* Adding a fewshot in a more readable way

* linting

* Moved apply chat template to LM

* multiturn alternation fix

* cache key update

* apply chat template method fix

* add system prompt hash to cache_key

* tokenizer name property for cache_key

* property name fix

* linting backward compatibility fix

* docs and errors update

* add documentation on adding chat template compatibility to model_guide

* fewshot as multiturn check fix

* saving system inst and chat template in results

* eval tracker update

* docs update

* Apply suggestions from code review
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

---------
Co-authored-by: haileyschoelkopf <hailey@eleuther.ai>
Co-authored-by: Clémentine Fourrier <22726840+clefourrier@users.noreply.github.com>
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

070d31df

31 May, 2024 1 commit

Making hardcoded few shots compatible with the chat template mechanism (#1895) · 4902aaaf

Clémentine Fourrier authored May 31, 2024



* init test 1

* fix

* this format seems to be working - need to update all other tasks with the new format

* bbh with few shot format

* fix fewshot bbh

* add mmlu flan cot

* samples of cot

* kmmlu

* fix gsm8k

* update keys for mmlu

* minerva math

* bbh

* fix

* fix samples

* small fixes to templates

* last prompt format change

* fixing prompt

* fixed minerva math format

* rm accidental commited file

* added doc for few shot samples

* Update lm_eval/loggers/evaluation_tracker.py

* Update lm_eval/loggers/evaluation_tracker.py

* Update docs/new_task_guide.md
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* added check in sampler per code review

* added the system from a function, plus an example in minerva math

* style

* Apply suggestions from code review
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

* fix unit tests 1

* forcing use of test split

---------
Co-authored-by: Hailey Schoelkopf <65563625+haileyschoelkopf@users.noreply.github.com>

4902aaaf

24 May, 2024 2 commits

Fix for bootstrap_iters = 0 case (#1715) (#1789) · b043b050
Hailey Schoelkopf authored May 24, 2024
```
* add handling for bootstrap_iters=0 case

* add more detail to docstring

* run precommit
```
b043b050

Fix Brier Score (#1847) · 7d747ea9

Lintang Sutawika authored May 25, 2024

`gold_one_hot` needs to follow the dimension of predictions so that it still works when `--limit` is used and the indexes in gold does not cover all gold indexes.

7d747ea9

16 May, 2024 1 commit
- group config to_dict update · e66c5f57
  lintangsutawika authored May 16, 2024
  
  e66c5f57
15 May, 2024 1 commit
- adjust task_id · c90655d5
  lintangsutawika authored May 15, 2024
  
  c90655d5
11 May, 2024 3 commits
- new group config parameter `tag_to_task` · 41e64b2e
  lintangsutawika authored May 11, 2024
  
  41e64b2e
- add task_id for python tasks as well · f4d2e6e0
  lintangsutawika authored May 11, 2024
  
  f4d2e6e0
- add task_id for python tasks as well · 1ef5b0bf
  lintangsutawika authored May 11, 2024
  
  1ef5b0bf
10 May, 2024 3 commits
- use task_id · 39c40277
  lintangsutawika authored May 10, 2024
  
  39c40277
- fixed info log · 7350958b
  lintangsutawika authored May 10, 2024
  
  7350958b
- remove warning · 1fae7283
  lintangsutawika authored May 10, 2024
  
  1fae7283
08 May, 2024 2 commits
- replace group to tag · 2f2322b9
  lintangsutawika authored May 08, 2024
  
  2f2322b9
- add description regarding tags replacing group · 3f770bb6
  lintangsutawika authored May 08, 2024
  
  3f770bb6
07 May, 2024 2 commits
- update to work with new group and task configuration · ad70d206
  lintangsutawika authored May 07, 2024
  
  ad70d206
- add configurable group · 86039e85
  lintangsutawika authored May 07, 2024
  
  86039e85
06 May, 2024 1 commit

Provide ability for custom sampler for ConfigurableTask (#1616) · ae72cebc

LSinev authored May 06, 2024

* Added fewshot sampling seeds to evaluator.simple_evaluate signature

Way to control seed of fewshot sampling
may help with #1591

* Added ability for custom sampler for ConfigurableTask

May be set in config like
```
fewshot_config:
  sampler: !function utils.MyFewshotSampler
```

* explicitly set fewshot random generator seed for HFLM generate_until_task test

* add backward compatibility for three args seed setup

* save seeds info to logs/reports

ae72cebc

26 Apr, 2024 1 commit
- Add filter registry decorator (#1750) · f64e72f5
  Nikita Lozhnikov authored Apr 26, 2024
```
* Add register_filter decorator

* Add register_filter docs
```
  f64e72f5
25 Apr, 2024 1 commit
- Fix Parameter Propagation for Tasks that have `include` (#1749) · 0bafcef0
  Lintang Sutawika authored Apr 26, 2024
```
* Update task.py

* Update __init__.py
```
  0bafcef0
24 Apr, 2024 1 commit
- add group config · eb9c6a57
  lintangsutawika authored Apr 24, 2024
  
  eb9c6a57
23 Apr, 2024 1 commit
- add greoup_config arg · 2a2566e6
  lintangsutawika authored Apr 23, 2024
  
  2a2566e6
25 Mar, 2024 1 commit

Seq2seq fix (#1604) · 262f879a

Lintang Sutawika authored Mar 25, 2024



* fix on --task list

* add fixes to tokeniation

* differentiate encoding for seq2seq and decoder

* return token setting

* format for pre-commit

* Seq2seq fix, pt2 (#1630)

* getting model class only when defined

* encode_pair handles None, add_special_tokens turned into dict with default value

---------
Co-authored-by: achervyakov <77295913+artemorloff@users.noreply.github.com>

262f879a

20 Mar, 2024 1 commit

Fixes to Loglikelihood prefix token / VLLM (#1611) · c7b03ad4

Hailey Schoelkopf authored Mar 20, 2024

* make vllm use prefix_token_id ; have prefix_token_id be optional method to define

* custom_prefix_token_id wasn't set if not passed

c7b03ad4