Commits · 3eaa493e4bcffcd63b3b38d513b5be4f25a320eb · gaoqiong / lm-evaluation-harness

11 Apr, 2021 2 commits
- Make acc_norm a separate metric · 3eaa493e
  Leo Gao authored Apr 10, 2021
  
  3eaa493e
- do per character loss aggregation for multiple choice tasks (similar to OAI's... · fae5fe66
  Ben Wang authored Apr 11, 2021
```
do per character loss aggregation for multiple choice tasks (similar to OAI's per token aggregation)
```
  fae5fe66
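As a rough illustration of what per-character aggregation means for a multiple choice task, the sketch below divides each choice's summed log-likelihood by its length in characters before picking the best one. The function name `pick_choice` and its parameters are hypothetical and are not taken from the repository; this is only a minimal sketch of the general idea.

```python
from typing import Sequence


def pick_choice(
    loglikelihoods: Sequence[float],
    choices: Sequence[str],
    normalize: bool = True,
) -> int:
    """Return the index of the best-scoring answer choice.

    `loglikelihoods[i]` is assumed to be the summed log-likelihood the model
    assigns to `choices[i]` given the question context (hypothetical inputs).
    """
    if len(loglikelihoods) != len(choices):
        raise ValueError("need exactly one log-likelihood per choice")

    scores = []
    for ll, choice in zip(loglikelihoods, choices):
        if normalize:
            # Per-character aggregation: long answers are no longer penalized
            # just for containing more characters.
            scores.append(ll / max(len(choice), 1))
        else:
            scores.append(ll)

    # Index of the highest score (ties resolve to the first occurrence).
    return max(range(len(scores)), key=scores.__getitem__)
```

With raw sums a short answer tends to win because it accumulates fewer negative terms; dividing by length removes that bias, which is presumably why the normalized accuracy is reported as its own metric.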
05 Apr, 2021 1 commit
- Implement partial caching · efbe6e7f
  Leo Gao authored Apr 04, 2021
```
Now, if a run gets interrupted halfway, you can easily resume
```
  efbe6e7f
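One way such resumable caching could work is to persist every result as soon as it is computed, so a restarted run only pays for the requests it has not seen yet. The sketch below uses the standard-library `sqlite3` module; the class `CachedScorer`, its `score_fn` argument, and the table layout are all assumptions made for illustration and do not describe the repository's actual implementation.

```python
import hashlib
import json
import sqlite3
from typing import Any, Callable, List, Sequence


class CachedScorer:
    """Wrap a scoring function with an on-disk cache (hypothetical helper)."""

    def __init__(self, score_fn: Callable[[Any], Any], db_path: str) -> None:
        self.score_fn = score_fn
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT)"
        )
        self.db.commit()

    @staticmethod
    def _key(request: Any) -> str:
        # Hash the JSON form of the request so any JSON-serializable request
        # maps to a stable, fixed-length key.
        payload = json.dumps(request, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def score(self, requests: Sequence[Any]) -> List[Any]:
        results = []
        for request in requests:
            key = self._key(request)
            row = self.db.execute(
                "SELECT value FROM cache WHERE key = ?", (key,)
            ).fetchone()
            if row is None:
                value = self.score_fn(request)
                self.db.execute(
                    "INSERT INTO cache (key, value) VALUES (?, ?)",
                    (key, json.dumps(value)),
                )
                # Commit per request: an interruption loses at most the
                # request that was in flight.
                self.db.commit()
            else:
                value = json.loads(row[0])
            results.append(value)
        return results
```

Committing after every request trades some throughput for the guarantee that finished work survives an interruption; batching commits would be faster but would lose more on a crash.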
27 Mar, 2021 3 commits
- Move fewshot stuff back into the right block · 0966e7b6
  Leo Gao authored Mar 26, 2021
  
  0966e7b6
- Pass random state around to fewshot_examples and fewshot_context · 3b2d5f6c
  Leo Gao authored Mar 26, 2021
  
  3b2d5f6c
- Fix fewshot for tasks without train docs · 0c402690
  Leo Gao authored Mar 26, 2021
  
  0c402690
25 Mar, 2021 1 commit
- Deterministic fewshot_experiments · 6a78fdaf
  Leo Gao authored Mar 25, 2021
  
  6a78fdaf
19 Feb, 2021 1 commit
- Add gpt2/3 tokenizer sanity check · 77b44470
  Leo Gao authored Feb 18, 2021
  
  77b44470
12 Feb, 2021 1 commit
- metrics file · 1fb90b91
  & authored Feb 12, 2021
  
  1fb90b91
11 Feb, 2021 3 commits
- Fixes to make greedy_until work · 432bd44c
  Leo Gao authored Feb 10, 2021
```
# Conflicts:
#	lm_eval/models/gpt2.py
#	lm_eval/tasks/squad.py
```
  432bd44c
- Fixes to make greedy_until work · 7b649ded
  Leo Gao authored Feb 10, 2021
  
  7b649ded
- Implement GPT2 greedy_until · e8f9dc71
  Leo Gao authored Feb 10, 2021
  
  e8f9dc71
08 Feb, 2021 2 commits
- LM: handle empty context · 359114fd
  Leo Gao authored Feb 07, 2021
  
  359114fd
- Fix caching · 77d4b087
  Leo Gao authored Feb 07, 2021
  
  77d4b087
05 Feb, 2021 4 commits
- Add test to make sure right space conventions are used · 049dfa34
  Leo Gao authored Feb 04, 2021
  
  049dfa34
- Add MultipleChoiceTest target · 2f5f42c6
  Leo Gao authored Feb 04, 2021
  
  2f5f42c6
- Add MultipleChoiceTask · 706cb53a
  Leo Gao authored Feb 04, 2021
  
  706cb53a
- Implement caching · d5cd9655
  Leo Gao authored Feb 04, 2021
  
  d5cd9655
04 Feb, 2021 2 commits

Leo Gao authored Feb 03, 2021

- Extract evaluator (still needs work to clean up)
- Add tests for evaluator
- Fix all the things that break on the new tests
- Misc cleanup

778e0f91

- Fix lambada · b57d059a
  Leo Gao authored Feb 03, 2021
  
  b57d059a
03 Feb, 2021 2 commits
- Fix naming convention to avoid `pytest` name mangling invocation · 5cfb7308
  Jonathan Tow authored Feb 02, 2021
  
  5cfb7308
- Refactor `Dataset` naming and `HFTask` properties · a60ef6fa
  Jonathan Tow authored Feb 02, 2021
  
  a60ef6fa
30 Jan, 2021 1 commit
- Make *_docs not abstract · 1bf97c9e
  Leo Gao authored Jan 30, 2021
  
  1bf97c9e
29 Jan, 2021 1 commit
- Implement PiQA · 63854c10
  Leo Gao authored Jan 29, 2021
  
  63854c10
24 Jan, 2021 1 commit
- superglue ex wsc · 2e707b87
  Jason Phang authored Jan 23, 2021
  
  2e707b87
21 Jan, 2021 1 commit
- Adopt new framework for `glue` · 36467c0e
  Jonathan Tow authored Jan 21, 2021
  
  36467c0e
16 Jan, 2021 1 commit
- Refactor iteration, indentation fix · b19dff50
  thefazzer authored Jan 16, 2021
  
  b19dff50
12 Jan, 2021 1 commit
- Add MultiRC Implementation · a538a1ad
  thefazzer authored Jan 12, 2021
  
  a538a1ad
10 Jan, 2021 1 commit
- Added F1_score metric · 5165bd38
  thefazzer authored Jan 10, 2021
  
  5165bd38
09 Jan, 2021 2 commits
- Move higher_is_better and aggregation into their own functions · a18104a4
  Leo Gao authored Jan 08, 2021
  
  a18104a4
- Refactor and implement SAT evaluation · 0f9c1624
  Leo Gao authored Jan 08, 2021
  
  0f9c1624
05 Jan, 2021 1 commit
- Update interface · a9fe09e5
  Leo Gao authored Jan 05, 2021
  
  a9fe09e5
28 Dec, 2020 2 commits
- Update · e41a082c
  Leo Gao authored Dec 27, 2020
  
  e41a082c
- Update interfaces · 76e65788
  Leo Gao authored Dec 27, 2020
  
  76e65788
30 Nov, 2020 4 commits
- Update docstring · 75db3899
  Leo Gao authored Nov 30, 2020
  
  75db3899
- Remove num_tokens · e3031e84
  Leo Gao authored Nov 30, 2020
  
  e3031e84
- Refactor to remove generate and fix some bad tokenization · 90e50b4c
  Leo Gao authored Nov 30, 2020

In particular, the following assumptions are FALSE in general:
tokenize(context + continuation) = tokenize(context) + tokenize(continuation)
len(tokenize(context + continuation)) = len(tokenize(context)) + len(tokenize(continuation))
tokenize(context + continuation)[:len(tokenize(context))] = tokenize(context)

So we need to tip-toe around the problem by being careful with how we do it.

In particular, using GPT2TokenizerFast is not just for performance: the behavior of GPT2Tokenizer differs across Transformers 2 and 3, while that of GPT2TokenizerFast doesn't.

90e50b4c
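The three false assumptions listed in this commit message can be reproduced with a toy tokenizer. The sketch below uses a greedy longest-match tokenizer over a tiny made-up vocabulary as a stand-in for GPT-2's BPE (it is not the real algorithm), and `encode_pair` is a hypothetical helper showing one possible way to keep the context/continuation boundary exact: tokenize the two parts separately and concatenate the tokens, rather than slicing the tokenization of the joined string.

```python
from typing import List, Tuple

# Toy vocabulary (an assumption for illustration): "ab" is a merged token.
VOCAB = {"a", "b", "ab", "c", " "}
MAX_TOKEN_LEN = max(len(token) for token in VOCAB)


def tokenize(text: str) -> List[str]:
    """Greedy longest-match tokenizer over VOCAB (toy stand-in for BPE)."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for size in range(min(MAX_TOKEN_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in VOCAB:
                tokens.append(piece)
                i += size
                break
        else:
            raise ValueError(f"cannot tokenize character {text[i]!r}")
    return tokens


def encode_pair(context: str, continuation: str) -> Tuple[List[str], int]:
    """Tokenize both parts separately and return (tokens, boundary index).

    Everything from `boundary` onward belongs to the continuation, so the
    split never depends on how the joined string would have been tokenized.
    """
    context_tokens = tokenize(context)
    continuation_tokens = tokenize(continuation)
    return context_tokens + continuation_tokens, len(context_tokens)


joined = tokenize("a" + "b")           # the merged token swallows the boundary
separate = tokenize("a") + tokenize("b")
print(joined, separate)
print(encode_pair("a", "b"))
```

Here `tokenize("ab")` yields a single merged token, so its length differs from the sum of the parts and its prefix is not `tokenize("a")`. Tokenizing separately avoids that, at the cost that the model may see a token sequence it would not have produced for the joined text.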

- Make fewshot_examples fast · 6de520af
  Leo Gao authored Nov 30, 2020
  
  6de520af
06 Oct, 2020 1 commit
- Add abstract constructor · 988a400f
  Leo Gao authored Oct 05, 2020
  
  988a400f
05 Oct, 2020 1 commit
- Don't make Dataset.download abstract · b0585de4
  Leo Gao authored Oct 05, 2020
```
It's ok if subclasses don't implement download, no-op default is ok
```
  b0585de4