- 28 Dec, 2020 2 commits
- 30 Nov, 2020 10 commits
  - Leo Gao authored
  - Leo Gao authored
  - Leo Gao authored
  - Leo Gao authored
    In particular, the following assumptions are FALSE in general:
    - tokenize(context + continuation) == tokenize(context) + tokenize(continuation)
    - len(tokenize(context + continuation)) == len(tokenize(context)) + len(tokenize(continuation))
    - tokenize(context + continuation)[:len(tokenize(context))] == tokenize(context)
    So we need to tip-toe around the problem by being careful with how we tokenize. In particular, using the Fast tokenizer is not just for performance: the behaviour of GPT2Tokenizer differs between Transformers 2 and 3, while GPT2TokenizerFast's does not.
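The failure mode described in this commit message can be illustrated with a toy greedy longest-match tokenizer (a simplified stand-in for BPE; the vocab and function below are hypothetical, not part of the harness). Because a merge can span the context/continuation boundary, tokenizing the concatenation produces different tokens than concatenating the tokenizations:

```python
# Toy greedy longest-match tokenizer (a simplified stand-in for BPE).
# VOCAB and tokenize() are hypothetical, for illustration only.
VOCAB = ("hello", "hel", "lo")

def tokenize(text, vocab=VOCAB):
    tokens, i = [], 0
    pieces = sorted(vocab, key=len, reverse=True)  # prefer the longest match
    while i < len(text):
        for piece in pieces:
            if text.startswith(piece, i):
                tokens.append(piece)
                i += len(piece)
                break
        else:
            tokens.append(text[i])  # fall back to single characters
            i += 1
    return tokens

context, continuation = "hel", "lo"
whole = tokenize(context + continuation)            # ["hello"]
parts = tokenize(context) + tokenize(continuation)  # ["hel", "lo"]

# All three assumptions from the commit message fail here:
assert whole != parts
assert len(whole) != len(parts)
assert whole[:len(tokenize(context))] != tokenize(context)
```

Real BPE tokenizers merge across boundaries the same way, which is why context and continuation token counts cannot simply be added.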
  - Leo Gao authored
  - Leo Gao authored
  - Leo Gao authored
  - Leo Gao authored
  - Leo Gao authored
  - Leo Gao authored
- 31 Oct, 2020 1 commit
  - Leo Gao authored
- 25 Oct, 2020 1 commit
  - Charles Foster authored
    Add SAT analogies dataset. Manual download required; checksums are not currently verified, but the hash is included as a comment.
- 24 Oct, 2020 21 commits
  - Anish Thite authored
  - Anish Thite authored
  - Anish Thite authored
  - Anish Thite authored
  - Anish Thite authored
  - Anish Thite authored
  - Charles Foster authored
  - Charles Foster authored
  - Charles Foster authored
  - Charles Foster authored
  - Charles Foster authored
  - Anish Thite authored
  - Jason Phang authored
  - Anish Thite authored
  - Anish Thite authored
  - Anish Thite authored
  - Anish Thite authored
  - Anish Thite authored
  - Anish Thite authored
  - Charles Foster authored
  - Anish Thite authored
- 23 Oct, 2020 1 commit
  - Charles Foster authored
    Renamed WSC to make the distinction between the SuperGLUE Winograd Schema Challenge (SGWinogradSchemaChallenge) and WSC273 (WinogradSchemaChallenge273) clearer. Also added WSC273.
- 22 Oct, 2020 4 commits
  - Charles Foster authored
  - Charles Foster authored
  - Charles Foster authored
  - Charles Foster authored