- 28 Dec, 2020 2 commits
- 27 Dec, 2020 2 commits
- 25 Dec, 2020 5 commits
-
-
Stella Biderman authored
Create CODEOWNERS
-
Stella Biderman authored
Tweak StoryCloze script to be agnostic to tokenization
-
uyhcire authored
-
Stella Biderman authored
Add naive eval script for StoryCloze
-
uyhcire authored
-
- 24 Dec, 2020 1 commit
-
-
uyhcire authored
-
- 23 Dec, 2020 2 commits
-
-
Leo Gao authored
-
Stella Biderman authored
-
- 01 Dec, 2020 1 commit
-
-
Stella Biderman authored
Refactor to remove generate and fix some bad tokenization.
-
- 30 Nov, 2020 12 commits
-
-
Leo Gao authored
-
Leo Gao authored
-
Leo Gao authored
-
Leo Gao authored
In particular, the following assumptions are FALSE in general:
    tokenize(context + continuation) = tokenize(context) + tokenize(continuation)
    len(tokenize(context + continuation)) = len(tokenize(context)) + len(tokenize(continuation))
    tokenize(context + continuation)[:len(tokenize(context))] = tokenize(context)
So we need to tiptoe around the problem by being careful with how we tokenize. In particular, using the Fast tokenizer is not just for performance: while the behavior of GPT2Tokenizer differs across Transformers 2 and 3, the behavior of GPT2TokenizerFast doesn't.
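The three false assumptions above can be reproduced without any model download. The following is a minimal sketch that uses a toy greedy longest-match tokenizer; the vocabulary, the `tokenize` function, and the `encode_pair` helper are all hypothetical stand-ins for illustration, not the GPT-2 tokenizer or the code from this commit. It mimics one real effect (a leading space merging into the following word) and shows one possible way to keep the context/continuation boundary explicit.

```python
# Toy vocabulary (an assumption for illustration): a space can merge
# with the word that follows it, as in many subword vocabularies.
VOCAB = ["the", " cat", "cat", " "]


def tokenize(text, vocab=VOCAB):
    """Greedy longest-match tokenizer; unknown characters become single tokens."""
    tokens = []
    i = 0
    while i < len(text):
        # Longest vocabulary entry starting at position i, else one character.
        match = max(
            (v for v in vocab if text.startswith(v, i)),
            key=len,
            default=text[i],
        )
        tokens.append(match)
        i += len(match)
    return tokens


def encode_pair(context, continuation):
    """Hypothetical helper: tokenize the two pieces separately so that the
    boundary between them is known exactly."""
    return tokenize(context), tokenize(continuation)


context, continuation = "the ", "cat"
whole = tokenize(context + continuation)          # ["the", " cat"]
ctx_tokens, cont_tokens = encode_pair(context, continuation)
# ctx_tokens == ["the", " "], cont_tokens == ["cat"]

# Assumption 1: tokenization distributes over concatenation. False here.
print(whole == ctx_tokens + cont_tokens)
# Assumption 2: token counts add up. False here (2 vs. 3).
print(len(whole) == len(ctx_tokens) + len(cont_tokens))
# Assumption 3: the context tokens are a prefix of the joint tokens. False here.
print(whole[: len(ctx_tokens)] == ctx_tokens)
```

All three checks print `False` for this input. The same checks can be repeated against a real tokenizer by swapping its encode method in for `tokenize`; whether a given input triggers the mismatch depends on that tokenizer's vocabulary.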
-
Leo Gao authored
-
Leo Gao authored
-
Leo Gao authored
# Conflicts:
#	write_out.py
-
Leo Gao authored
-
Leo Gao authored
-
Leo Gao authored
-
Leo Gao authored
-
Leo Gao authored
-
- 29 Nov, 2020 1 commit
-
-
Leo Gao authored
-
- 23 Nov, 2020 6 commits
-
-
Stella Biderman authored
Changing implementation of the --limit flag
-
Stella Biderman authored
-
Stella Biderman authored
-
Stella Biderman authored
-
Stella Biderman authored
This should allow the user to only do the first few eval tasks.
-
Leo Gao authored
-
- 31 Oct, 2020 1 commit
-
-
Leo Gao authored
-
- 26 Oct, 2020 1 commit
-
-
Stella Biderman authored
Add SAT Analogy dataset
-
- 25 Oct, 2020 2 commits
-
-
Charles Foster authored
Add SAT analogies dataset. Manual download needed. Checksums currently not verified, but hash is included as a comment.
-
Charles Foster authored
Sync up to EAI
-
- 24 Oct, 2020 4 commits
-
-
Stella Biderman authored
add rte text
-
Anish Thite authored
-
Anish Thite authored
-
Stella Biderman authored
Add Natural Questions
-