1. 04 Feb, 2021 · 2 commits
    • Massive refactor · 778e0f91
      Leo Gao authored
      - Extract evaluator (still needs work to clean up)
      - Add tests for evaluator
      - Fix all the things that break on the new tests
      - Misc cleanup
    • Fix lambada · b57d059a
      Leo Gao authored
  2. 03 Feb, 2021 · 2 commits
  3. 30 Jan, 2021 · 1 commit
  4. 29 Jan, 2021 · 1 commit
  5. 24 Jan, 2021 · 1 commit
  6. 21 Jan, 2021 · 1 commit
  7. 16 Jan, 2021 · 1 commit
  8. 12 Jan, 2021 · 1 commit
  9. 10 Jan, 2021 · 1 commit
  10. 09 Jan, 2021 · 2 commits
  11. 05 Jan, 2021 · 1 commit
  12. 28 Dec, 2020 · 2 commits
  13. 30 Nov, 2020 · 4 commits
    • Update docstring · 75db3899
      Leo Gao authored
    • Remove num_tokens · e3031e84
      Leo Gao authored
    • Refactor to remove generate and fix some bad tokenization · 90e50b4c
      Leo Gao authored
      In particular, the following assumptions are FALSE in general:
      tokenize(context + continuation) = tokenize(context) + tokenize(continuation)
      len(tokenize(context + continuation)) = len(tokenize(context)) + len(tokenize(continuation))
      tokenize(context + continuation)[:len(tokenize(context))] = tokenize(context)
      
      So we need to work around this by being careful about how tokenization is done at the context/continuation boundary.
      
      In particular, using Fast is not just for performance: while the behaviour of GPT2Tokenizer differs across Transformers 2 and 3, GPT2TokenizerFast's does not.
    • Make fewshot_examples fast · 6de520af
      Leo Gao authored
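The broken assumptions called out in 90e50b4c above can be shown with a minimal sketch. The greedy longest-match tokenizer and its tiny vocabulary here are invented for illustration (they are not the harness's or GPT-2's actual tokenizer), but real BPE merges produce the same boundary effect: a merge can span the context/continuation seam.

```python
def tokenize(text, vocab=("ab", "a", "b")):
    """Greedy longest-match tokenizer over a toy vocabulary (longest piece first).

    A hypothetical stand-in for a subword tokenizer: "ab" is a merged piece,
    so it gets picked over the two single characters whenever both appear.
    """
    tokens = []
    i = 0
    while i < len(text):
        for piece in vocab:  # vocab is ordered longest-first
            if text.startswith(piece, i):
                tokens.append(piece)
                i += len(piece)
                break
        else:
            raise ValueError(f"cannot tokenize {text[i:]!r}")
    return tokens

context, continuation = "a", "b"

# tokenize(context + continuation) != tokenize(context) + tokenize(continuation):
# the merged piece "ab" spans the boundary between context and continuation.
assert tokenize(context + continuation) == ["ab"]
assert tokenize(context) + tokenize(continuation) == ["a", "b"]

# ...so the token counts differ as well,
assert len(tokenize(context + continuation)) != len(tokenize(context)) + len(tokenize(continuation))

# ...and tokenize(context) is not a prefix of tokenize(context + continuation).
assert tokenize(context + continuation)[: len(tokenize(context))] != tokenize(context)
```

This is why splitting a request into context and continuation strings, tokenizing each separately, and concatenating the results is unsafe in general.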
  14. 06 Oct, 2020 · 1 commit
  15. 05 Oct, 2020 · 2 commits
  16. 17 Sep, 2020 · 1 commit
  17. 14 Sep, 2020 · 1 commit
  18. 07 Sep, 2020 · 9 commits
  19. 06 Sep, 2020 · 1 commit
  20. 28 Aug, 2020 · 1 commit