- 05 Jan, 2021 (7 commits)
  - Leo Gao authored
  - Leo Gao authored
    # Conflicts: batch_eval/main.py
  - Stella Biderman authored
  - Stella Biderman authored
  - Leo Gao authored
  - Leo Gao authored
  - Leo Gao authored
- 03 Jan, 2021 (1 commit)
  - Leo Gao authored
- 02 Jan, 2021 (4 commits)
- 30 Dec, 2020 (1 commit)
  - Stella Biderman authored: Fix eval script to normalize loglikelihoods
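The normalization in the commit above is presumably length normalization: a raw summed log-likelihood shrinks with every extra token, so it systematically favours shorter continuations. A minimal sketch of the idea, not the repo's actual code (the function name and numbers are illustrative):

```python
def normalized_loglikelihood(token_logprobs):
    """Average per-token log-probability of a continuation.

    Summing alone penalizes longer continuations, since each extra token
    contributes another negative term; dividing by length removes that bias.
    """
    return sum(token_logprobs) / len(token_logprobs)

# Two candidate continuations of different lengths:
choice_a = [-0.5, -1.2, -0.3]        # sum = -2.0, mean ~ -0.67
choice_b = [-0.6, -0.6, -0.6, -0.6]  # sum = -2.4, mean = -0.6
# Comparing raw sums would pick choice_a; per-token normalization picks choice_b.
best = max([choice_a, choice_b], key=normalized_loglikelihood)
```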
- 29 Dec, 2020 (1 commit)
  - Stella Biderman authored: Batch model inputs to speed things up
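Batching here is the standard trick of scoring several inputs per forward pass instead of looping one at a time, which amortizes per-call overhead. A generic sketch; `model_forward` is a hypothetical stand-in for whatever scoring call the eval script makes:

```python
def batched(items, batch_size=8):
    """Yield successive fixed-size batches from a list of inputs."""
    for i in range(0, len(items), batch_size):
        yield items[i : i + batch_size]

# Before: one model call per input, dominated by per-call overhead.
#   results = [model_forward([x]) for x in inputs]
# After: one model call per batch.
#   results = []
#   for batch in batched(inputs):
#       results.extend(model_forward(batch))  # hypothetical scoring call
```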
- 28 Dec, 2020 (2 commits)
- 27 Dec, 2020 (4 commits)
- 25 Dec, 2020 (5 commits)
  - Stella Biderman authored: Create CODEOWNERS
  - Stella Biderman authored: Tweak StoryCloze script to be agnostic to tokenization
  - uyhcire authored
  - Stella Biderman authored: Add naive eval script for StoryCloze (see the sketch after this date's commits)
  - uyhcire authored
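StoryCloze presents a four-sentence story with two candidate endings, so the naive evaluation is to score each ending's log-likelihood under the model and pick the higher one. A sketch of that idea, assuming a recent transformers release and GPT-2; this is illustrative, not the script from the commit, and it deliberately shows the naive token-count slicing that the 30 Nov commit below warns about:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def continuation_loglikelihood(context, continuation):
    """Sum of log P(token | prefix) over the continuation's tokens."""
    ctx_len = len(tokenizer.encode(context))        # naive: assumes the prefix
    ids = tokenizer.encode(context + continuation)  # tokenizes identically
    with torch.no_grad():
        logits = model(torch.tensor([ids])).logits
    logprobs = torch.log_softmax(logits, dim=-1)
    # logits at position t-1 give the distribution over the token at position t
    return sum(logprobs[0, t - 1, ids[t]].item() for t in range(ctx_len, len(ids)))

def predict_ending(story, ending1, ending2):
    """Return 1 or 2 for whichever ending the model finds more likely."""
    scores = [continuation_loglikelihood(story, " " + e) for e in (ending1, ending2)]
    return 1 if scores[0] >= scores[1] else 2
```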
- 24 Dec, 2020 (1 commit)
  - uyhcire authored
- 23 Dec, 2020 (2 commits)
  - Leo Gao authored
  - Stella Biderman authored
- 01 Dec, 2020 (1 commit)
  - Stella Biderman authored: Refactor to remove generate and fix some bad tokenization.
- 30 Nov, 2020 (11 commits)
  - Leo Gao authored
  - Leo Gao authored
  - Leo Gao authored
  - Leo Gao authored: In particular, the following assumptions are FALSE in general:
    - tokenize(context + continuation) = tokenize(context) + tokenize(continuation)
    - len(tokenize(context + continuation)) = len(tokenize(context)) + len(tokenize(continuation))
    - tokenize(context + continuation)[:len(tokenize(context))] = tokenize(context)

    So we need to tip-toe around the problem by being careful with how we do it. In particular, using Fast is not just for performance; while the behaviour of GPT2Tokenizer differs across Transformers 2 and 3, GPT2TokenizerFast doesn't. (See the sketch at the end of this log.)
  - Leo Gao authored
  - Leo Gao authored
  - Leo Gao authored
    # Conflicts: write_out.py
  - Leo Gao authored
  - Leo Gao authored
  - Leo Gao authored
  - Leo Gao authored
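The failure modes listed in the 30 Nov commit message are easy to reproduce with GPT2TokenizerFast. A sketch; a context with a trailing space is the simplest trigger, because GPT-2's byte-level BPE attaches spaces to the word that follows:

```python
from transformers import GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")

context = "Hello "      # note the trailing space
continuation = "world"

whole = tok.encode(context + continuation)  # "Hello world": the space merges into " world"
ctx = tok.encode(context)                   # "Hello ": the trailing space stays a separate token
cont = tok.encode(continuation)

print(whole == ctx + cont)                 # False: tokens differ across the boundary
print(len(whole) == len(ctx) + len(cont))  # False: the lengths differ too
print(whole[:len(ctx)] == ctx)             # False: the context prefix is not preserved
```

Hence the commit's point: code that slices model inputs or outputs at len(tokenize(context)) tokens can silently mis-score the continuation.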