"examples/vscode:/vscode.git/clone" did not exist on "175cd45e13b2e33d1efec9e2ac217cba99f6ae58"
Unverified commit 9865e1fe authored by Philip May, committed by GitHub

model card for German Sentence Embeddings V2 (#7952)

* model card German Sentence Embeddings V2

- for German RoBERTa for Sentence Embeddings V2
- marked old as outdated

* small correction

* small improvement in description

* small spelling fix

* spelling fix

* add evaluation results

* spearman explanation

* add number of trials
parent d39da5a2
@@ -4,44 +4,4 @@ license: mit
---
# bert-german-dbmdz-uncased-sentence-stsb
## How to use
**The usage description above - provided by Hugging Face - is wrong! Please use this:**
Install the `sentence-transformers` package. See here: <https://github.com/UKPLab/sentence-transformers>
```python
from sentence_transformers import models
from sentence_transformers import SentenceTransformer
# load BERT model from Hugging Face
word_embedding_model = models.Transformer(
    'T-Systems-onsite/bert-german-dbmdz-uncased-sentence-stsb')
# apply mean pooling to get one fixed-size sentence vector
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(),
                               pooling_mode_mean_tokens=True,
                               pooling_mode_cls_token=False,
                               pooling_mode_max_tokens=False)
# join BERT model and pooling to get the sentence transformer
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
```
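With the assembled `model`, sentences can be encoded and compared directly. Below is a minimal sketch assuming the `model` from the snippet above; the German example sentences are purely illustrative:
```python
from sentence_transformers import util

sentences = ['Das ist ein schönes Auto.', 'Dieser Wagen gefällt mir sehr.']

# encode both sentences into fixed-size vectors (assumes `model` from above)
embeddings = model.encode(sentences, convert_to_tensor=True)

# cosine similarity close to 1 indicates a similar meaning
similarity = util.pytorch_cos_sim(embeddings[0], embeddings[1])
print(similarity.item())
```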
## Model description
This is a German [sentence embedding](https://github.com/UKPLab/sentence-transformers) model trained on the [German STSbenchmark Dataset](https://github.com/t-systems-on-site-services-gmbh/german-STSbenchmark). It was trained by [Philip May](https://eniak.de/) and open-sourced by [T-Systems-onsite](https://www.t-systems-onsite.de/). The base language model is [dbmdz/bert-base-german-uncased](https://huggingface.co/dbmdz/bert-base-german-uncased) from the [Bayerische Staatsbibliothek](https://huggingface.co/dbmdz).
## Intended uses
> Sentence-BERT (SBERT) is a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine-similarity. This reduces the effort for finding the most similar pair from 65 hours with BERT / RoBERTa to about 5 seconds with SBERT, while maintaining the accuracy from BERT.
Source: [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/abs/1908.10084)
## Training procedure
We did an automatic hyperparameter optimization with [Optuna](https://github.com/optuna/optuna) and found the following hyperparameters:
- batch_size = 5
- num_epochs = 11
- lr = 2.637549780860126e-05
- eps = 5.0696075038683e-06
- weight_decay = 0.02817210102940054
- warmup_steps = 27.342745941760147 % of total steps
The final model was trained on the combination of all three datasets: `sts_de_dev.csv`, `sts_de_test.csv` and `sts_de_train.csv`.
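A rough sketch of how these hyperparameters could be plugged into a sentence-transformers training run follows. Here `train_examples` stands in for the STS pairs loaded from the three CSV files (loading not shown), `model` is the sentence transformer assembled above, and `CosineSimilarityLoss` is the usual choice for STS-style data, assumed rather than documented here:
```python
from torch.utils.data import DataLoader
from sentence_transformers import InputExample, losses

# train_examples: list of InputExample(texts=[sent1, sent2], label=similarity)
# built from sts_de_dev.csv, sts_de_test.csv and sts_de_train.csv (not shown)
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=5)
train_loss = losses.CosineSimilarityLoss(model)

# warmup_steps = 27.34... % of all training steps
total_steps = len(train_dataloader) * 11
warmup_steps = int(0.27342745941760147 * total_steps)

model.fit(train_objectives=[(train_dataloader, train_loss)],
          epochs=11,
          warmup_steps=warmup_steps,
          optimizer_params={'lr': 2.637549780860126e-05,
                            'eps': 5.0696075038683e-06},
          weight_decay=0.02817210102940054)
```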
**This model is outdated! Please use this improved version: <https://huggingface.co/T-Systems-onsite/german-roberta-sentence-transformer-v2>**
---
language: de
license: mit
---
# German RoBERTa for Sentence Embeddings V2
This model is intended to [compute sentence (text) embeddings](https://www.sbert.net/docs/usage/computing_sentence_embeddings.html) for German text. These embeddings can then be compared with [cosine-similarity](https://en.wikipedia.org/wiki/Cosine_similarity) to find sentences with a similar semantic meaning. For example, this can be useful for [semantic textual similarity](https://www.sbert.net/docs/usage/semantic_textual_similarity.html), [semantic search](https://www.sbert.net/docs/usage/semantic_search.html), or [paraphrase mining](https://www.sbert.net/docs/usage/paraphrase_mining.html). To do this, you have to use the [Sentence Transformers Python framework](https://github.com/UKPLab/sentence-transformers).
> Sentence-BERT (SBERT) is a modification of the pretrained BERT network that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings that can be compared using cosine-similarity. This reduces the effort for finding the most similar pair from 65 hours with BERT / RoBERTa to about 5 seconds with SBERT, while maintaining the accuracy from BERT.
Source: [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/abs/1908.10084)
This model was fine-tuned by [Philip May](https://eniak.de/) and open-sourced by [T-Systems-onsite](https://www.t-systems-onsite.de/). Special thanks to [Nils Reimers](https://www.nils-reimers.de/) for his awesome open-source work, the Sentence Transformers, the models, and all his help on GitHub.
## How to use
**The usage description above - provided by Hugging Face - is wrong for sentence embeddings! Please use this:**
To use this model install the `sentence-transformers` package (see here: <https://github.com/UKPLab/sentence-transformers>).
```python
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('T-Systems-onsite/german-roberta-sentence-transformer-v2')
```
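For example, the embeddings can back a small semantic search. The following is a minimal sketch; the German corpus and query are made up for illustration:
```python
from sentence_transformers import util

# illustrative corpus and query; any German sentences work
corpus = ['Der Hund spielt im Garten.',
          'Ich koche heute Abend Nudeln.',
          'Die Katze schläft auf dem Sofa.']
query = 'Ein Hund tobt draußen im Grünen.'

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode([query], convert_to_tensor=True)

# returns the top_k most similar corpus entries for the query
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)
for hit in hits[0]:
    print(corpus[hit['corpus_id']], hit['score'])
```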
For details of usage and examples see here:
- [Computing Sentence Embeddings](https://www.sbert.net/docs/usage/computing_sentence_embeddings.html)
- [Semantic Textual Similarity](https://www.sbert.net/docs/usage/semantic_textual_similarity.html)
- [Paraphrase Mining](https://www.sbert.net/docs/usage/paraphrase_mining.html)
- [Semantic Search](https://www.sbert.net/docs/usage/semantic_search.html)
- [Cross-Encoders](https://www.sbert.net/docs/usage/cross-encoder.html)
- [Examples on GitHub](https://github.com/UKPLab/sentence-transformers/tree/master/examples/applications)
## Training
The base model is [xlm-roberta-base](https://huggingface.co/xlm-roberta-base). This model has been further trained by [Nils Reimers](https://www.nils-reimers.de/) on a large-scale paraphrase dataset for 50+ languages. [Nils Reimers](https://www.nils-reimers.de/) writes about this [on GitHub](https://github.com/UKPLab/sentence-transformers/issues/509#issuecomment-712243280):
>A paper is upcoming for the paraphrase models.
>
>These models were trained on various datasets with Millions of examples for paraphrases, mainly derived from Wikipedia edit logs, paraphrases mined from Wikipedia and SimpleWiki, paraphrases from news reports, AllNLI-entailment pairs with in-batch-negative loss etc.
>
>In internal tests, they perform much better than the NLI+STSb models as they have seen more and broader types of training data. NLI+STSb has the issue that those datasets are rather narrow in their domain and do not contain any domain-specific words / sentences (like from chemistry, computer science, math etc.). The paraphrase models have seen plenty of sentences from various domains.
>
>More details with the setup, all the datasets, and a wider evaluation will follow soon.
The resulting model, called `xlm-r-distilroberta-base-paraphrase-v1`, has been released here: <https://github.com/UKPLab/sentence-transformers/releases/tag/v0.3.8>
Building on this cross-language model, we fine-tuned it for German on the deepl.com part of our [German STSbenchmark dataset](https://github.com/t-systems-on-site-services-gmbh/german-STSbenchmark).
We did an automatic hyperparameter search for 102 trials with [Optuna](https://github.com/optuna/optuna). Using cross-validation on the deepl.com test and dev datasets, we found the following best hyperparameters:
- batch_size = 15
- num_epochs = 4
- lr = 2.2995320905210864e-05
- eps = 1.8979875906303792e-06
- weight_decay = 0.003314045812507563
- warmup_steps_proportion = 0.46141685205829014
The final model was trained with these hyperparameters on the combination of `sts_de_train.csv` and `sts_de_dev.csv`. The `sts_de_test.csv` was left for testing. The AWS dataset has not been used.
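A rough sketch of what such an Optuna search could look like is given below. The search ranges are illustrative assumptions, and `cross_validated_spearman` is a hypothetical helper that fine-tunes the model with the sampled hyperparameters and returns the mean Spearman rank correlation across the cross-validation folds:
```python
import optuna

def objective(trial):
    # search ranges are illustrative assumptions, not the original search space
    params = {
        'batch_size': trial.suggest_int('batch_size', 4, 32),
        'num_epochs': trial.suggest_int('num_epochs', 1, 10),
        'lr': trial.suggest_float('lr', 1e-6, 1e-4, log=True),
        'eps': trial.suggest_float('eps', 1e-7, 1e-5, log=True),
        'weight_decay': trial.suggest_float('weight_decay', 0.0, 0.1),
        'warmup_steps_proportion': trial.suggest_float('warmup_steps_proportion', 0.0, 0.5),
    }
    # hypothetical helper: fine-tune on the deepl.com train/dev splits and
    # return the cross-validated Spearman rank correlation
    return cross_validated_spearman(params)

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=102)
print(study.best_params)
```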
## Evaluation
The evaluation has been done on the test set of our [German STSbenchmark dataset](https://github.com/t-systems-on-site-services-gmbh/german-STSbenchmark). The code is available on [Colab](https://colab.research.google.com/drive/1aCWOqDQx953kEnQ5k4Qn7uiixokocOHv?usp=sharing). As the evaluation metric, we use Spearman's rank correlation between the cosine-similarity of the sentence embeddings and the STSbenchmark labels.
| Model Name | Spearman rank correlation |
|--------------------------------------|-----------------------------------|
| xlm-r-distilroberta-base-paraphrase-v1 | 0.8079 |
| xlm-r-100langs-bert-base-nli-stsb-mean-tokens | 0.8194 |
| xlm-r-bert-base-nli-stsb-mean-tokens | 0.8194 |
| **T-Systems-onsite/german-roberta-sentence-transformer-v2** | **0.8529** |
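The metric itself is straightforward to reproduce. A minimal sketch, assuming `sts_de_test.csv` has `sentence1`, `sentence2` and `score` columns (the actual column layout may differ):
```python
import pandas as pd
from scipy.stats import spearmanr
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('T-Systems-onsite/german-roberta-sentence-transformer-v2')

# column names are an assumption; adjust them to the actual layout of the CSV
test = pd.read_csv('sts_de_test.csv')
emb1 = model.encode(test['sentence1'].tolist(), convert_to_tensor=True)
emb2 = model.encode(test['sentence2'].tolist(), convert_to_tensor=True)

# cosine similarity of each sentence pair (diagonal of the full similarity matrix)
cosine_scores = util.pytorch_cos_sim(emb1, emb2).diagonal().cpu().numpy()

# Spearman rank correlation against the gold similarity labels
correlation, _ = spearmanr(cosine_scores, test['score'])
print(correlation)
```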