Added scibert-nli model card

Commit 28424906, authored Mar 22, 2020 by Gabriele Sarti, committed Mar 23, 2020 by Julien Chaumond

Showing with 32 additions and 0 deletions

model_cards/gsarti/scibert-nli/README.md +32 -0

--- a/model_cards/gsarti/scibert-nli/README.md
+++ b/model_cards/gsarti/scibert-nli/README.md
+# SciBERT-NLI
+This is the [SciBERT](https://github.com/allenai/scibert) model [1], fine-tuned on the [SNLI](https://nlp.stanford.edu/projects/snli/) and [MultiNLI](https://www.nyu.edu/projects/bowman/multinli/) datasets using the [`sentence-transformers` library](https://github.com/UKPLab/sentence-transformers/) to produce universal sentence embeddings [2].
+The model uses the original `scivocab` wordpiece vocabulary and was trained with the **average pooling strategy** and a **softmax loss**.
+**Base model**: `allenai/scibert-scivocab-cased`, loaded through HuggingFace's `AutoModel`
+**Parameters**:
+| Parameter      | Value |
+|----------------|-------|
+| Batch size     | 64    |
+| Training steps | 20000 |
+| Warmup steps   | 1450  |
+**Performance**: The model was evaluated on the test portion of the [STS benchmark](http://ixa2.si.ehu.es/stswiki/index.php/STSbenchmark) using Spearman rank correlation. It was compared with a general-domain BERT base model trained with the same procedure, to check that the two perform comparably.
+| Model                       | Score       |
+|-----------------------------|-------------|
+| `scibert-nli` (ours)        | 74.50       |
+| `bert-base-nli-mean-tokens` | 77.12       |
+An example of using the model for similarity-based scientific paper retrieval is provided in the [Covid Papers Browser](https://github.com/gsarti/covid-papers-browser) repository.
+**References:**
+[1] I. Beltagy et al., [SciBERT: A Pretrained Language Model for Scientific Text](https://www.aclweb.org/anthology/D19-1371/)
+[2] A. Conneau et al., [Supervised Learning of Universal Sentence Representations from Natural Language Inference Data](https://www.aclweb.org/anthology/D17-1070/)
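
The model card names an average pooling strategy and a similarity-based retrieval use case without showing either. The following is a minimal sketch of both ideas in plain Python, not the `sentence-transformers` implementation. The function names (`mean_pool`, `cosine`, `rank_by_similarity`) are hypothetical, the vectors are toy values rather than real SciBERT outputs, and cosine similarity is an assumption, since the card does not name the similarity measure.

```python
import math
from typing import List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]], mask: Sequence[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1, skipping padding."""
    kept = [vec for vec, keep in zip(token_vectors, mask) if keep]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(query: Sequence[float], candidates: Sequence[Sequence[float]]) -> List[int]:
    """Return candidate indices ordered from most to least similar to the query."""
    scores = [cosine(query, cand) for cand in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


# Toy "token embeddings" for one sentence: two real tokens and one padding token.
tokens = [[1.0, 0.0], [3.0, 0.0], [9.0, 9.0]]
attention_mask = [1, 1, 0]

# Sentence embedding = mean of the unmasked token vectors.
query_embedding = mean_pool(tokens, attention_mask)

# Toy sentence embeddings standing in for an indexed paper collection.
paper_embeddings = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
ranking = rank_by_similarity(query_embedding, paper_embeddings)

print(query_embedding)  # [2.0, 0.0]
print(ranking)          # [1, 2, 0]
```

In practice the token vectors would come from the fine-tuned model and the attention mask from its tokenizer; the pooling and ranking steps keep the same shape.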