- 05 Jan, 2021 (7 commits)
  - Leo Gao authored
  - Leo Gao authored
    # Conflicts: batch_eval/main.py
  - Stella Biderman authored
  - Stella Biderman authored
  - Leo Gao authored
  - Leo Gao authored
  - Leo Gao authored
- 03 Jan, 2021 (1 commit)
  - Leo Gao authored
- 02 Jan, 2021 (4 commits)
- 30 Dec, 2020 (1 commit)
  - Stella Biderman authored: Fix eval script to normalize loglikelihoods
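The normalization in the commit above is presumably length normalization: a raw summed log-likelihood shrinks with every extra token, so it systematically favours shorter continuations. A minimal sketch of the idea, not the repo's actual code (the function name and numbers are illustrative):

```python
def normalized_loglikelihood(token_logprobs):
    """Average per-token log-probability of a continuation.

    Summing alone penalizes longer continuations, since each extra token
    contributes another negative term; dividing by length removes that bias.
    """
    return sum(token_logprobs) / len(token_logprobs)

# Two candidate continuations of different lengths:
choice_a = [-0.5, -1.2, -0.3]        # sum = -2.0, mean ~ -0.67
choice_b = [-0.6, -0.6, -0.6, -0.6]  # sum = -2.4, mean = -0.6
# Comparing raw sums would pick choice_a; per-token normalization picks choice_b.
best = max([choice_a, choice_b], key=normalized_loglikelihood)
```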
- 29 Dec, 2020 (1 commit)
  - Stella Biderman authored: Batch model inputs to speed things up
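Batching here is the standard trick of scoring several inputs per forward pass instead of looping one at a time, which amortizes per-call overhead. A generic sketch; `model_forward` is a hypothetical stand-in for whatever scoring call the eval script makes:

```python
def batched(items, batch_size=8):
    """Yield successive fixed-size batches from a list of inputs."""
    for i in range(0, len(items), batch_size):
        yield items[i : i + batch_size]

# Before: one model call per input, dominated by per-call overhead.
#   results = [model_forward([x]) for x in inputs]
# After: one model call per batch.
#   results = []
#   for batch in batched(inputs):
#       results.extend(model_forward(batch))  # hypothetical scoring call
```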
- 28 Dec, 2020 (2 commits)
- 27 Dec, 2020 (4 commits)
- 25 Dec, 2020 (5 commits)
  - Stella Biderman authored: Create CODEOWNERS
  - Stella Biderman authored: Tweak StoryCloze script to be agnostic to tokenization
  - uyhcire authored
  - Stella Biderman authored: Add naive eval script for StoryCloze (see the sketch after this date's commits)
  - uyhcire authored
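StoryCloze presents a four-sentence story with two candidate endings, so the naive evaluation is to score each ending's log-likelihood under the model and pick the higher one. A sketch of that idea, assuming a recent transformers release and GPT-2; this is illustrative, not the script from the commit, and it deliberately shows the naive token-count slicing that the 30 Nov commit below warns about:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def continuation_loglikelihood(context, continuation):
    """Sum of log P(token | prefix) over the continuation's tokens."""
    ctx_len = len(tokenizer.encode(context))        # naive: assumes the prefix
    ids = tokenizer.encode(context + continuation)  # tokenizes identically
    with torch.no_grad():
        logits = model(torch.tensor([ids])).logits
    logprobs = torch.log_softmax(logits, dim=-1)
    # logits at position t-1 give the distribution over the token at position t
    return sum(logprobs[0, t - 1, ids[t]].item() for t in range(ctx_len, len(ids)))

def predict_ending(story, ending1, ending2):
    """Return 1 or 2 for whichever ending the model finds more likely."""
    scores = [continuation_loglikelihood(story, " " + e) for e in (ending1, ending2)]
    return 1 if scores[0] >= scores[1] else 2
```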
- 24 Dec, 2020 (1 commit)
  - uyhcire authored
- 23 Dec, 2020 (2 commits)
  - Leo Gao authored
  - Stella Biderman authored
- 01 Dec, 2020 (1 commit)
  - Stella Biderman authored: Refactor to remove generate and fix some bad tokenization.
- 30 Nov, 2020 (11 commits)
  - Leo Gao authored
  - Leo Gao authored
  - Leo Gao authored
  - Leo Gao authored: In particular, the following assumptions are FALSE in general:
    - tokenize(context + continuation) = tokenize(context) + tokenize(continuation)
    - len(tokenize(context + continuation)) = len(tokenize(context)) + len(tokenize(continuation))
    - tokenize(context + continuation)[:len(tokenize(context))] = tokenize(context)

    So we need to tip-toe around the problem by being careful with how we do it. In particular, using Fast is not just for performance; while the behaviour of GPT2Tokenizer differs across Transformers 2 and 3, GPT2TokenizerFast doesn't. (See the sketch at the end of this log.)
  - Leo Gao authored
  - Leo Gao authored
  - Leo Gao authored
    # Conflicts: write_out.py
  - Leo Gao authored
  - Leo Gao authored
  - Leo Gao authored
  - Leo Gao authored
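The failure modes listed in the 30 Nov commit message are easy to reproduce with GPT2TokenizerFast. A sketch; a context with a trailing space is the simplest trigger, because GPT-2's byte-level BPE attaches spaces to the word that follows:

```python
from transformers import GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")

context = "Hello "      # note the trailing space
continuation = "world"

whole = tok.encode(context + continuation)  # "Hello world": the space merges into " world"
ctx = tok.encode(context)                   # "Hello ": the trailing space stays a separate token
cont = tok.encode(continuation)

print(whole == ctx + cont)                 # False: tokens differ across the boundary
print(len(whole) == len(ctx) + len(cont))  # False: the lengths differ too
print(whole[:len(ctx)] == ctx)             # False: the context prefix is not preserved
```

Hence the commit's point: code that slices model inputs or outputs at len(tokenize(context)) tokens can silently mis-score the continuation.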