chenpangpang/transformers, commit e279a312 (unverified)
Authored Mar 24, 2020 by Mohamed El-Geish; committed by GitHub on Mar 24, 2020
Parent: 7372e62b

Model cards for CS224n SQuAD2.0 models (#3406)
* Model cards for CS224n SQuAD2.0 models
* consistent spacing
Showing 5 changed files with 370 additions and 0 deletions.
* model_cards/elgeish/cs224n-squad2.0-albert-base-v2/README.md (+74, -0)
* model_cards/elgeish/cs224n-squad2.0-albert-large-v2/README.md (+74, -0)
* model_cards/elgeish/cs224n-squad2.0-albert-xxlarge-v1/README.md (+74, -0)
* model_cards/elgeish/cs224n-squad2.0-distilbert-base-uncased/README.md (+74, -0)
* model_cards/elgeish/cs224n-squad2.0-roberta-base/README.md (+74, -0)
model_cards/elgeish/cs224n-squad2.0-albert-base-v2/README.md (new file, 0 → 100644)
## CS224n SQuAD2.0 Project Dataset
The goal of this model is to save CS224n students GPU time when establishing
baselines to beat for the [Default Final Project](http://web.stanford.edu/class/cs224n/project/default-final-project-handout.pdf).
The training set used to fine-tune this model is the same as
the [official one](https://rajpurkar.github.io/SQuAD-explorer/); however,
evaluation and model selection were performed using roughly half of the official
dev set, 6078 examples, picked at random. The data files can be found at
<https://github.com/elgeish/squad/tree/master/data> — this is the Winter 2020
version. Given that the official SQuAD2.0 dev set contains the project's test
set, students must make sure not to use the official SQuAD2.0 dev set in any way
— including the use of models fine-tuned on the official SQuAD2.0 dataset, since
those models were selected using the official SQuAD2.0 dev set.
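
As a quick start, the fine-tuned checkpoint can be loaded by name from the Hugging Face model hub; the snippet below is a minimal usage sketch with the `question-answering` pipeline (the question and context strings are made up for illustration):

```python
from transformers import pipeline

# Minimal sketch: load this checkpoint and run extractive QA over a toy context.
model_id = "elgeish/cs224n-squad2.0-albert-base-v2"
qa = pipeline("question-answering", model=model_id, tokenizer=model_id)

result = qa(
    question="How many examples were used for evaluation?",
    context="Evaluation and model selection were performed using roughly half "
            "of the official dev set, 6078 examples, picked at random.",
)
print(result)  # {'score': ..., 'start': ..., 'end': ..., 'answer': ...}
```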
## Results
```json
{
  "exact": 78.94044093451794,
  "f1": 81.7724930324639,
  "total": 6078,
  "HasAns_exact": 76.28865979381443,
  "HasAns_f1": 82.20385314478195,
  "HasAns_total": 2910,
  "NoAns_exact": 81.37626262626263,
  "NoAns_f1": 81.37626262626263,
  "NoAns_total": 3168,
  "best_exact": 78.95689371503784,
  "best_exact_thresh": 0.0,
  "best_f1": 81.78894581298378,
  "best_f1_thresh": 0.0
}
```
## Notable Arguments
```json
{
  "do_lower_case": true,
  "doc_stride": 128,
  "fp16": false,
  "fp16_opt_level": "O1",
  "gradient_accumulation_steps": 24,
  "learning_rate": 3e-05,
  "max_answer_length": 30,
  "max_grad_norm": 1,
  "max_query_length": 64,
  "max_seq_length": 384,
  "model_name_or_path": "albert-base-v2",
  "model_type": "albert",
  "num_train_epochs": 3,
  "per_gpu_train_batch_size": 8,
  "save_steps": 5000,
  "seed": 42,
  "train_batch_size": 8,
  "version_2_with_negative": true,
  "warmup_steps": 0,
  "weight_decay": 0
}
```
## Environment Setup
```json
{
  "transformers": "2.5.1",
  "pytorch": "1.4.0=py3.6_cuda10.1.243_cudnn7.6.3_0",
  "python": "3.6.5=hc3d631a_2",
  "os": "Linux 4.15.0-1060-aws #62-Ubuntu SMP Tue Feb 11 21:23:22 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux",
  "gpu": "Tesla V100-SXM2-16GB"
}
```
## Related Models
* [elgeish/cs224n-squad2.0-albert-large-v2](https://huggingface.co/elgeish/cs224n-squad2.0-albert-large-v2)
* [elgeish/cs224n-squad2.0-albert-xxlarge-v1](https://huggingface.co/elgeish/cs224n-squad2.0-albert-xxlarge-v1)
* [elgeish/cs224n-squad2.0-distilbert-base-uncased](https://huggingface.co/elgeish/cs224n-squad2.0-distilbert-base-uncased)
* [elgeish/cs224n-squad2.0-roberta-base](https://huggingface.co/elgeish/cs224n-squad2.0-roberta-base)
model_cards/elgeish/cs224n-squad2.0-albert-large-v2/README.md (new file, 0 → 100644)
## CS224n SQuAD2.0 Project Dataset
The goal of this model is to save CS224n students GPU time when establishing
baselines to beat for the [Default Final Project](http://web.stanford.edu/class/cs224n/project/default-final-project-handout.pdf).
The training set used to fine-tune this model is the same as
the [official one](https://rajpurkar.github.io/SQuAD-explorer/); however,
evaluation and model selection were performed using roughly half of the official
dev set, 6078 examples, picked at random. The data files can be found at
<https://github.com/elgeish/squad/tree/master/data> — this is the Winter 2020
version. Given that the official SQuAD2.0 dev set contains the project's test
set, students must make sure not to use the official SQuAD2.0 dev set in any way
— including the use of models fine-tuned on the official SQuAD2.0 dataset, since
those models were selected using the official SQuAD2.0 dev set.
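
As with the other models in this series, the checkpoint can be loaded by name from the Hugging Face model hub; a minimal usage sketch with the `question-answering` pipeline (illustrative question and context):

```python
from transformers import pipeline

# Minimal sketch: load this checkpoint and run extractive QA over a toy context.
model_id = "elgeish/cs224n-squad2.0-albert-large-v2"
qa = pipeline("question-answering", model=model_id, tokenizer=model_id)

result = qa(
    question="How many examples were used for evaluation?",
    context="Evaluation and model selection were performed using roughly half "
            "of the official dev set, 6078 examples, picked at random.",
)
print(result)  # {'score': ..., 'start': ..., 'end': ..., 'answer': ...}
```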
## Results
```json
{
  "exact": 79.2694965449161,
  "f1": 82.50844352970152,
  "total": 6078,
  "HasAns_exact": 74.87972508591065,
  "HasAns_f1": 81.64478342732858,
  "HasAns_total": 2910,
  "NoAns_exact": 83.30176767676768,
  "NoAns_f1": 83.30176767676768,
  "NoAns_total": 3168,
  "best_exact": 79.2694965449161,
  "best_exact_thresh": 0.0,
  "best_f1": 82.50844352970155,
  "best_f1_thresh": 0.0
}
```
## Notable Arguments
```json
{
  "do_lower_case": true,
  "doc_stride": 128,
  "fp16": false,
  "fp16_opt_level": "O1",
  "gradient_accumulation_steps": 1,
  "learning_rate": 3e-05,
  "max_answer_length": 30,
  "max_grad_norm": 1,
  "max_query_length": 64,
  "max_seq_length": 384,
  "model_name_or_path": "albert-large-v2",
  "model_type": "albert",
  "num_train_epochs": 5,
  "per_gpu_train_batch_size": 8,
  "save_steps": 5000,
  "seed": 42,
  "train_batch_size": 8,
  "version_2_with_negative": true,
  "warmup_steps": 0,
  "weight_decay": 0
}
```
## Environment Setup
```json
{
  "transformers": "2.5.1",
  "pytorch": "1.4.0=py3.6_cuda10.1.243_cudnn7.6.3_0",
  "python": "3.6.5=hc3d631a_2",
  "os": "Linux 4.15.0-1060-aws #62-Ubuntu SMP Tue Feb 11 21:23:22 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux",
  "gpu": "Tesla V100-SXM2-16GB"
}
```
## Related Models
* [elgeish/cs224n-squad2.0-albert-base-v2](https://huggingface.co/elgeish/cs224n-squad2.0-albert-base-v2)
* [elgeish/cs224n-squad2.0-albert-xxlarge-v1](https://huggingface.co/elgeish/cs224n-squad2.0-albert-xxlarge-v1)
* [elgeish/cs224n-squad2.0-distilbert-base-uncased](https://huggingface.co/elgeish/cs224n-squad2.0-distilbert-base-uncased)
* [elgeish/cs224n-squad2.0-roberta-base](https://huggingface.co/elgeish/cs224n-squad2.0-roberta-base)
model_cards/elgeish/cs224n-squad2.0-albert-xxlarge-v1/README.md (new file, 0 → 100644)
## CS224n SQuAD2.0 Project Dataset
The goal of this model is to save CS224n students GPU time when establishing
baselines to beat for the [Default Final Project](http://web.stanford.edu/class/cs224n/project/default-final-project-handout.pdf).
The training set used to fine-tune this model is the same as
the [official one](https://rajpurkar.github.io/SQuAD-explorer/); however,
evaluation and model selection were performed using roughly half of the official
dev set, 6078 examples, picked at random. The data files can be found at
<https://github.com/elgeish/squad/tree/master/data> — this is the Winter 2020
version. Given that the official SQuAD2.0 dev set contains the project's test
set, students must make sure not to use the official SQuAD2.0 dev set in any way
— including the use of models fine-tuned on the official SQuAD2.0 dataset, since
those models were selected using the official SQuAD2.0 dev set.
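
The checkpoint can be loaded by name from the Hugging Face model hub; a minimal usage sketch with the `question-answering` pipeline (illustrative question and context):

```python
from transformers import pipeline

# Minimal sketch: load this checkpoint and run extractive QA over a toy context.
model_id = "elgeish/cs224n-squad2.0-albert-xxlarge-v1"
qa = pipeline("question-answering", model=model_id, tokenizer=model_id)

result = qa(
    question="How many examples were used for evaluation?",
    context="Evaluation and model selection were performed using roughly half "
            "of the official dev set, 6078 examples, picked at random.",
)
print(result)  # {'score': ..., 'start': ..., 'end': ..., 'answer': ...}
```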
## Results
```json
{
  "exact": 85.93287265547877,
  "f1": 88.91258331187983,
  "total": 6078,
  "HasAns_exact": 84.36426116838489,
  "HasAns_f1": 90.58786301361013,
  "HasAns_total": 2910,
  "NoAns_exact": 87.37373737373737,
  "NoAns_f1": 87.37373737373737,
  "NoAns_total": 3168,
  "best_exact": 85.93287265547877,
  "best_exact_thresh": 0.0,
  "best_f1": 88.91258331187993,
  "best_f1_thresh": 0.0
}
```
## Notable Arguments
```json
{
  "do_lower_case": true,
  "doc_stride": 128,
  "fp16": false,
  "fp16_opt_level": "O1",
  "gradient_accumulation_steps": 24,
  "learning_rate": 3e-05,
  "max_answer_length": 30,
  "max_grad_norm": 1,
  "max_query_length": 64,
  "max_seq_length": 512,
  "model_name_or_path": "albert-xxlarge-v1",
  "model_type": "albert",
  "num_train_epochs": 4,
  "per_gpu_train_batch_size": 1,
  "save_steps": 1000,
  "seed": 42,
  "train_batch_size": 1,
  "version_2_with_negative": true,
  "warmup_steps": 814,
  "weight_decay": 0
}
```
## Environment Setup
```json
{
  "transformers": "2.5.1",
  "pytorch": "1.4.0=py3.6_cuda10.1.243_cudnn7.6.3_0",
  "python": "3.6.5=hc3d631a_2",
  "os": "Linux 4.15.0-1060-aws #62-Ubuntu SMP Tue Feb 11 21:23:22 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux",
  "gpu": "Tesla V100-SXM2-16GB"
}
```
## Related Models
* [elgeish/cs224n-squad2.0-albert-base-v2](https://huggingface.co/elgeish/cs224n-squad2.0-albert-base-v2)
* [elgeish/cs224n-squad2.0-albert-large-v2](https://huggingface.co/elgeish/cs224n-squad2.0-albert-large-v2)
* [elgeish/cs224n-squad2.0-distilbert-base-uncased](https://huggingface.co/elgeish/cs224n-squad2.0-distilbert-base-uncased)
* [elgeish/cs224n-squad2.0-roberta-base](https://huggingface.co/elgeish/cs224n-squad2.0-roberta-base)
model_cards/elgeish/cs224n-squad2.0-distilbert-base-uncased/README.md (new file, 0 → 100644)
## CS224n SQuAD2.0 Project Dataset
The goal of this model is to save CS224n students GPU time when establishing
baselines to beat for the [Default Final Project](http://web.stanford.edu/class/cs224n/project/default-final-project-handout.pdf).
The training set used to fine-tune this model is the same as
the [official one](https://rajpurkar.github.io/SQuAD-explorer/); however,
evaluation and model selection were performed using roughly half of the official
dev set, 6078 examples, picked at random. The data files can be found at
<https://github.com/elgeish/squad/tree/master/data> — this is the Winter 2020
version. Given that the official SQuAD2.0 dev set contains the project's test
set, students must make sure not to use the official SQuAD2.0 dev set in any way
— including the use of models fine-tuned on the official SQuAD2.0 dataset, since
those models were selected using the official SQuAD2.0 dev set.
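
The checkpoint can be loaded by name from the Hugging Face model hub; a minimal usage sketch with the `question-answering` pipeline (illustrative question and context):

```python
from transformers import pipeline

# Minimal sketch: load this checkpoint and run extractive QA over a toy context.
model_id = "elgeish/cs224n-squad2.0-distilbert-base-uncased"
qa = pipeline("question-answering", model=model_id, tokenizer=model_id)

result = qa(
    question="How many examples were used for evaluation?",
    context="Evaluation and model selection were performed using roughly half "
            "of the official dev set, 6078 examples, picked at random.",
)
print(result)  # {'score': ..., 'start': ..., 'end': ..., 'answer': ...}
```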
## Results
```json
{
  "exact": 65.16946363935504,
  "f1": 67.87348075352251,
  "total": 6078,
  "HasAns_exact": 69.51890034364261,
  "HasAns_f1": 75.16667217179045,
  "HasAns_total": 2910,
  "NoAns_exact": 61.17424242424242,
  "NoAns_f1": 61.17424242424242,
  "NoAns_total": 3168,
  "best_exact": 65.16946363935504,
  "best_exact_thresh": 0.0,
  "best_f1": 67.87348075352243,
  "best_f1_thresh": 0.0
}
```
## Notable Arguments
```json
{
  "do_lower_case": true,
  "doc_stride": 128,
  "fp16": false,
  "fp16_opt_level": "O1",
  "gradient_accumulation_steps": 24,
  "learning_rate": 3e-05,
  "max_answer_length": 30,
  "max_grad_norm": 1,
  "max_query_length": 64,
  "max_seq_length": 384,
  "model_name_or_path": "distilbert-base-uncased-distilled-squad",
  "model_type": "distilbert",
  "num_train_epochs": 4,
  "per_gpu_train_batch_size": 32,
  "save_steps": 5000,
  "seed": 42,
  "train_batch_size": 32,
  "version_2_with_negative": true,
  "warmup_steps": 0,
  "weight_decay": 0
}
```
## Environment Setup
```json
{
  "transformers": "2.5.1",
  "pytorch": "1.4.0=py3.6_cuda10.1.243_cudnn7.6.3_0",
  "python": "3.6.5=hc3d631a_2",
  "os": "Linux 4.15.0-1060-aws #62-Ubuntu SMP Tue Feb 11 21:23:22 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux",
  "gpu": "Tesla V100-SXM2-16GB"
}
```
## Related Models
* [elgeish/cs224n-squad2.0-albert-base-v2](https://huggingface.co/elgeish/cs224n-squad2.0-albert-base-v2)
* [elgeish/cs224n-squad2.0-albert-large-v2](https://huggingface.co/elgeish/cs224n-squad2.0-albert-large-v2)
* [elgeish/cs224n-squad2.0-albert-xxlarge-v1](https://huggingface.co/elgeish/cs224n-squad2.0-albert-xxlarge-v1)
* [elgeish/cs224n-squad2.0-roberta-base](https://huggingface.co/elgeish/cs224n-squad2.0-roberta-base)
model_cards/elgeish/cs224n-squad2.0-roberta-base/README.md (new file, 0 → 100644)
## CS224n SQuAD2.0 Project Dataset
The goal of this model is to save CS224n students GPU time when establishing
baselines to beat for the [Default Final Project](http://web.stanford.edu/class/cs224n/project/default-final-project-handout.pdf).
The training set used to fine-tune this model is the same as
the [official one](https://rajpurkar.github.io/SQuAD-explorer/); however,
evaluation and model selection were performed using roughly half of the official
dev set, 6078 examples, picked at random. The data files can be found at
<https://github.com/elgeish/squad/tree/master/data> — this is the Winter 2020
version. Given that the official SQuAD2.0 dev set contains the project's test
set, students must make sure not to use the official SQuAD2.0 dev set in any way
— including the use of models fine-tuned on the official SQuAD2.0 dataset, since
those models were selected using the official SQuAD2.0 dev set.
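
The checkpoint can be loaded by name from the Hugging Face model hub; a minimal usage sketch with the `question-answering` pipeline (illustrative question and context):

```python
from transformers import pipeline

# Minimal sketch: load this checkpoint and run extractive QA over a toy context.
model_id = "elgeish/cs224n-squad2.0-roberta-base"
qa = pipeline("question-answering", model=model_id, tokenizer=model_id)

result = qa(
    question="How many examples were used for evaluation?",
    context="Evaluation and model selection were performed using roughly half "
            "of the official dev set, 6078 examples, picked at random.",
)
print(result)  # {'score': ..., 'start': ..., 'end': ..., 'answer': ...}
```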
## Results
```json
{
  "exact": 75.32082922013821,
  "f1": 78.66699523704254,
  "total": 6078,
  "HasAns_exact": 74.84536082474227,
  "HasAns_f1": 81.83436324767868,
  "HasAns_total": 2910,
  "NoAns_exact": 75.75757575757575,
  "NoAns_f1": 75.75757575757575,
  "NoAns_total": 3168,
  "best_exact": 75.32082922013821,
  "best_exact_thresh": 0.0,
  "best_f1": 78.66699523704266,
  "best_f1_thresh": 0.0
}
```
## Notable Arguments
```json
{
  "do_lower_case": true,
  "doc_stride": 128,
  "fp16": false,
  "fp16_opt_level": "O1",
  "gradient_accumulation_steps": 24,
  "learning_rate": 3e-05,
  "max_answer_length": 30,
  "max_grad_norm": 1,
  "max_query_length": 64,
  "max_seq_length": 384,
  "model_name_or_path": "roberta-base",
  "model_type": "roberta",
  "num_train_epochs": 4,
  "per_gpu_train_batch_size": 16,
  "save_steps": 5000,
  "seed": 42,
  "train_batch_size": 16,
  "version_2_with_negative": true,
  "warmup_steps": 0,
  "weight_decay": 0
}
```
## Environment Setup
```json
{
  "transformers": "2.5.1",
  "pytorch": "1.4.0=py3.6_cuda10.1.243_cudnn7.6.3_0",
  "python": "3.6.5=hc3d631a_2",
  "os": "Linux 4.15.0-1060-aws #62-Ubuntu SMP Tue Feb 11 21:23:22 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux",
  "gpu": "Tesla V100-SXM2-16GB"
}
```
## Related Models
* [elgeish/cs224n-squad2.0-albert-base-v2](https://huggingface.co/elgeish/cs224n-squad2.0-albert-base-v2)
* [elgeish/cs224n-squad2.0-albert-large-v2](https://huggingface.co/elgeish/cs224n-squad2.0-albert-large-v2)
* [elgeish/cs224n-squad2.0-albert-xxlarge-v1](https://huggingface.co/elgeish/cs224n-squad2.0-albert-xxlarge-v1)
* [elgeish/cs224n-squad2.0-distilbert-base-uncased](https://huggingface.co/elgeish/cs224n-squad2.0-distilbert-base-uncased)