Cross Encoder
=============
.. toctree::

   cross_encoder
   evaluation
# SentenceTransformer
## SentenceTransformer
```eval_rst
.. autoclass:: sentence_transformers.SentenceTransformer
   :members:
   :inherited-members: fit, old_fit
   :exclude-members: save_to_hub, add_module, append, apply, buffers, children, extra_repr, forward, get_buffer, get_extra_state, get_parameter, get_submodule, ipu, load_state_dict, modules, named_buffers, named_children, named_modules, named_parameters, parameters, register_backward_hook, register_buffer, register_forward_hook, register_forward_pre_hook, register_full_backward_hook, register_full_backward_pre_hook, register_load_state_dict_post_hook, register_module, register_parameter, register_state_dict_pre_hook, requires_grad_, set_extra_state, share_memory, state_dict, to_empty, type, xpu, zero_grad
```
## SentenceTransformerModelCardData
```eval_rst
.. autoclass:: sentence_transformers.model_card.SentenceTransformerModelCardData
```
## SimilarityFunction
```eval_rst
.. autoclass:: sentence_transformers.SimilarityFunction
   :members:
```
# Datasets
`sentence_transformers.datasets` contains classes to organize your training input examples.
## ParallelSentencesDataset
`ParallelSentencesDataset` is used for multilingual training. For details, see [multilingual training](../../examples/training/multilingual/README.md).
```eval_rst
.. autoclass:: sentence_transformers.datasets.ParallelSentencesDataset
```
## SentenceLabelDataset
`SentenceLabelDataset` can be used if you have labeled sentences and want to train with triplet loss.
```eval_rst
.. autoclass:: sentence_transformers.datasets.SentenceLabelDataset
```
## DenoisingAutoEncoderDataset
`DenoisingAutoEncoderDataset` is used for unsupervised training with the TSDAE method.
```eval_rst
.. autoclass:: sentence_transformers.datasets.DenoisingAutoEncoderDataset
```
## NoDuplicatesDataLoader
`NoDuplicatesDataLoader` can be used together with `MultipleNegativesRankingLoss` to ensure that no duplicate entries occur within the same batch.
```eval_rst
.. autoclass:: sentence_transformers.datasets.NoDuplicatesDataLoader
```
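The de-duplication idea can be sketched in plain Python: fill each batch greedily, postponing any example whose texts collide with texts already in the batch. This is a simplified sketch of the concept only; the real `NoDuplicatesDataLoader` additionally shuffles and cycles through the data.

```python
from collections import deque


def no_duplicate_batches(examples, batch_size):
    """Yield batches in which no text occurs twice.

    `examples` is a list of tuples of texts (e.g. positive pairs). Examples
    that would introduce a duplicate are retried in a later batch.
    """
    pool = deque(examples)
    while pool:
        batch, seen, skipped = [], set(), []
        while pool and len(batch) < batch_size:
            texts = pool.popleft()
            if any(t in seen for t in texts):
                skipped.append(texts)  # would duplicate a text; retry later
            else:
                seen.update(texts)
                batch.append(texts)
        pool.extend(skipped)
        if not batch:  # every remaining example collides; stop
            break
        yield batch
```

With in-batch-negatives losses this matters because a duplicated text would silently act as a false negative for itself.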
# Evaluation
`sentence_transformers.evaluation` defines different classes that can be used to evaluate the model during training.
## BinaryClassificationEvaluator
```eval_rst
.. autoclass:: sentence_transformers.evaluation.BinaryClassificationEvaluator
```
## EmbeddingSimilarityEvaluator
```eval_rst
.. autoclass:: sentence_transformers.evaluation.EmbeddingSimilarityEvaluator
```
## InformationRetrievalEvaluator
```eval_rst
.. autoclass:: sentence_transformers.evaluation.InformationRetrievalEvaluator
```
## MSEEvaluator
```eval_rst
.. autoclass:: sentence_transformers.evaluation.MSEEvaluator
```
## ParaphraseMiningEvaluator
```eval_rst
.. autoclass:: sentence_transformers.evaluation.ParaphraseMiningEvaluator
```
## RerankingEvaluator
```eval_rst
.. autoclass:: sentence_transformers.evaluation.RerankingEvaluator
```
## SentenceEvaluator
```eval_rst
.. autoclass:: sentence_transformers.evaluation.SentenceEvaluator
```
## SequentialEvaluator
```eval_rst
.. autoclass:: sentence_transformers.evaluation.SequentialEvaluator
```
## TranslationEvaluator
```eval_rst
.. autoclass:: sentence_transformers.evaluation.TranslationEvaluator
```
## TripletEvaluator
```eval_rst
.. autoclass:: sentence_transformers.evaluation.TripletEvaluator
```
Sentence Transformer
====================
.. toctree::

   SentenceTransformer
   trainer
   training_args
   losses
   evaluation
   datasets
   models
   quantization
# Losses
`sentence_transformers.losses` defines different loss functions that can be used to fine-tune embedding models on training data. The choice of loss function plays a critical role when fine-tuning the model. It determines how well our embedding model will work for the specific downstream task.
Sadly, there is no "one size fits all" loss function. Which loss function is suitable depends on the available training data and on the target task. Consider checking out the [Loss Overview](../../sentence_transformer/loss_overview.html) to help narrow down your choice of loss function(s).
## BatchAllTripletLoss
```eval_rst
.. autoclass:: sentence_transformers.losses.BatchAllTripletLoss
```
## BatchHardSoftMarginTripletLoss
```eval_rst
.. autoclass:: sentence_transformers.losses.BatchHardSoftMarginTripletLoss
```
## BatchHardTripletLoss
```eval_rst
.. autoclass:: sentence_transformers.losses.BatchHardTripletLoss
```
## BatchSemiHardTripletLoss
```eval_rst
.. autoclass:: sentence_transformers.losses.BatchSemiHardTripletLoss
```
## ContrastiveLoss
```eval_rst
.. autoclass:: sentence_transformers.losses.ContrastiveLoss
```
## OnlineContrastiveLoss
```eval_rst
.. autoclass:: sentence_transformers.losses.OnlineContrastiveLoss
```
## ContrastiveTensionLoss
```eval_rst
.. autoclass:: sentence_transformers.losses.ContrastiveTensionLoss
```
## ContrastiveTensionLossInBatchNegatives
```eval_rst
.. autoclass:: sentence_transformers.losses.ContrastiveTensionLossInBatchNegatives
```
## CoSENTLoss
```eval_rst
.. autoclass:: sentence_transformers.losses.CoSENTLoss
```
## AnglELoss
```eval_rst
.. autoclass:: sentence_transformers.losses.AnglELoss
```
## CosineSimilarityLoss
<img src="https://raw.githubusercontent.com/UKPLab/sentence-transformers/master/docs/img/SBERT_Siamese_Network.png" alt="SBERT Siamese Network Architecture" width="250"/>
For each sentence pair, we pass sentence A and sentence B through our network, which yields the embeddings *u* and *v*. The similarity of these embeddings is computed using cosine similarity, and the result is compared to the gold similarity score.
This allows our network to be fine-tuned to recognize the similarity of sentences.
```eval_rst
.. autoclass:: sentence_transformers.losses.CosineSimilarityLoss
```
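As a toy illustration of the computation described above (not the library's torch implementation), the per-pair loss is simply the squared error between the cosine similarity of *u* and *v* and the gold score:

```python
import numpy as np


def cosine_similarity_loss(u, v, gold_score):
    """Squared error between cos(u, v) and the gold similarity score.

    Sketch of the idea behind CosineSimilarityLoss; the real loss is a torch
    module that averages over a batch and backpropagates into the encoder.
    """
    cos = float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    return (cos - gold_score) ** 2
```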
## DenoisingAutoEncoderLoss
```eval_rst
.. autoclass:: sentence_transformers.losses.DenoisingAutoEncoderLoss
```
## GISTEmbedLoss
```eval_rst
.. autoclass:: sentence_transformers.losses.GISTEmbedLoss
```
## CachedGISTEmbedLoss
```eval_rst
.. autoclass:: sentence_transformers.losses.CachedGISTEmbedLoss
```
## MSELoss
```eval_rst
.. autoclass:: sentence_transformers.losses.MSELoss
```
## MarginMSELoss
```eval_rst
.. autoclass:: sentence_transformers.losses.MarginMSELoss
```
## MatryoshkaLoss
```eval_rst
.. autoclass:: sentence_transformers.losses.MatryoshkaLoss
```
## Matryoshka2dLoss
```eval_rst
.. autoclass:: sentence_transformers.losses.Matryoshka2dLoss
```
## AdaptiveLayerLoss
```eval_rst
.. autoclass:: sentence_transformers.losses.AdaptiveLayerLoss
```
## MegaBatchMarginLoss
```eval_rst
.. autoclass:: sentence_transformers.losses.MegaBatchMarginLoss
```
## MultipleNegativesRankingLoss
*MultipleNegativesRankingLoss* is a great loss function if you only have positive pairs, for example, only pairs of similar texts like pairs of paraphrases, pairs of duplicate questions, pairs of (query, response), or pairs of (source_language, target_language).
```eval_rst
.. autoclass:: sentence_transformers.losses.MultipleNegativesRankingLoss
```
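The math behind in-batch negatives can be sketched with NumPy (an illustrative sketch, not the library's torch module): each anchor's paired positive must win a softmax over all positives in the batch, so every other positive acts as a free negative.

```python
import numpy as np


def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Cross-entropy over scaled similarity scores, true pairs on the diagonal.

    `anchors` and `positives` are (batch, dim) arrays; rows are assumed to be
    L2-normalized so the dot product equals cosine similarity.
    """
    scores = scale * anchors @ positives.T              # (batch, batch)
    scores -= scores.max(axis=1, keepdims=True)         # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -float(np.mean(np.diag(log_probs)))
```

When every anchor is closest to its own positive the loss approaches zero; mismatched pairs drive it up.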
## CachedMultipleNegativesRankingLoss
```eval_rst
.. autoclass:: sentence_transformers.losses.CachedMultipleNegativesRankingLoss
```
## MultipleNegativesSymmetricRankingLoss
```eval_rst
.. autoclass:: sentence_transformers.losses.MultipleNegativesSymmetricRankingLoss
```
## SoftmaxLoss
```eval_rst
.. autoclass:: sentence_transformers.losses.SoftmaxLoss
```
## TripletLoss
```eval_rst
.. autoclass:: sentence_transformers.losses.TripletLoss
```
# Models
`sentence_transformers.models` defines different building blocks that can be used to create SentenceTransformer networks from scratch. For more details, see [Training Overview](../training/overview.md).
## Main Classes
```eval_rst
.. autoclass:: sentence_transformers.models.Transformer
.. autoclass:: sentence_transformers.models.Pooling
.. autoclass:: sentence_transformers.models.Dense
```
## Further Classes
```eval_rst
.. autoclass:: sentence_transformers.models.Asym
.. autoclass:: sentence_transformers.models.BoW
.. autoclass:: sentence_transformers.models.CNN
.. autoclass:: sentence_transformers.models.LSTM
.. autoclass:: sentence_transformers.models.Normalize
.. autoclass:: sentence_transformers.models.WeightedLayerPooling
.. autoclass:: sentence_transformers.models.WordEmbeddings
.. autoclass:: sentence_transformers.models.WordWeights
```
# quantization
`sentence_transformers.quantization` defines different helpful functions to perform embedding quantization.
```eval_rst
.. note::

   `Embedding Quantization <../../../examples/applications/embedding-quantization/README.html>`_ differs from model quantization. The former shrinks the size of embeddings such that semantic search/retrieval is faster and requires less memory and disk space. The latter refers to lowering the precision of the model weights to speed up inference. This page only shows documentation for the former.
```
```eval_rst
.. automodule:: sentence_transformers.quantization
   :members: quantize_embeddings, semantic_search_faiss, semantic_search_usearch
```
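As a rough illustration of what int8 embedding quantization does (a NumPy sketch, not the library implementation), each embedding dimension is mapped from its observed calibration range onto the 256 int8 buckets:

```python
import numpy as np


def quantize_int8(embeddings):
    """Scalar int8 quantization sketch.

    Maps every dimension from its observed [min, max] range onto [-128, 127].
    Illustrates the idea behind quantize_embeddings(); the real function also
    supports binary precision and precomputed calibration ranges.
    """
    starts = embeddings.min(axis=0)
    steps = (embeddings.max(axis=0) - starts) / 255.0
    return ((embeddings - starts) / steps - 128).astype(np.int8)
```

Each float32 value becomes a single byte, a 4x reduction in memory and disk space, at the cost of a small amount of precision.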
# Trainer
## SentenceTransformerTrainer
```eval_rst
.. autoclass:: sentence_transformers.trainer.SentenceTransformerTrainer
   :members:
   :inherited-members:
   :exclude-members: autocast_smart_context_manager, collect_features, compute_loss_context_manager, evaluation_loop, floating_point_ops, get_decay_parameter_names, get_optimizer_cls_and_kwargs, init_hf_repo, log_metrics, metrics_format, num_examples, num_tokens, predict, prediction_loop, prediction_step, save_metrics, save_model, save_state, training_step
```
# Training Arguments
## SentenceTransformerTrainingArguments
```eval_rst
.. autoclass:: sentence_transformers.training_args.SentenceTransformerTrainingArguments
   :members:
   :inherited-members:
```
## BatchSamplers
```eval_rst
.. autoclass:: sentence_transformers.training_args.BatchSamplers
   :members:
```
## MultiDatasetBatchSamplers
```eval_rst
.. autoclass:: sentence_transformers.training_args.MultiDatasetBatchSamplers
:members:
```
# util
`sentence_transformers.util` defines different helpful functions to work with text embeddings.
## Helper Functions
```eval_rst
.. automodule:: sentence_transformers.util
   :members: paraphrase_mining, semantic_search, community_detection, http_get, truncate_embeddings, normalize_embeddings, is_training_available
```
## Similarity Metrics
```eval_rst
.. automodule:: sentence_transformers.util
   :members: cos_sim, pairwise_cos_sim, dot_score, pairwise_dot_score, manhattan_sim, pairwise_manhattan_sim, euclidean_sim, pairwise_euclidean_sim
```
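To illustrate the difference between the full-matrix and pairwise variants, here is a NumPy sketch of the pairwise computation (the library functions operate on torch tensors and handle type conversion for you):

```python
import numpy as np


def pairwise_cos_sim_np(a, b):
    """Row-wise cosine similarity: score i compares a[i] with b[i].

    Returns one score per row instead of the full (len(a), len(b)) matrix
    that the non-pairwise variant produces. NumPy sketch of the idea behind
    util.pairwise_cos_sim.
    """
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return (a * b).sum(axis=1)
```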
## Version History
As we work on the topic, we will publish updated (and improved) models.
### v1
Version 1 models were trained on the training set of the MS MARCO Passage Retrieval task. The models were trained using in-batch negative sampling via MultipleNegativesRankingLoss with a scaling factor of 20 and a batch size of 128.
As a baseline we show the results for lexical search with BM25 using Elasticsearch.
## Version History
As we work on the topic, we will publish updated (and improved) models.
- [Version 1](msmarco-v1.md)
If they received a low score by the cross-encoder, we saved them as hard negatives.
We then trained the v2 models with these new hard negatives.
## Version History
As we work on the topic, we will publish updated (and improved) models.
- [Version 2](msmarco-v2.md)
- [Version 1](msmarco-v1.md)
If they received a low score by the cross-encoder, we saved them as hard negatives.
We then trained the v2 models with these new hard negatives.
## Version History
As we work on the topic, we will publish updated (and improved) models.
- [Version 3](msmarco-v3.md)
- [Version 2](msmarco-v2.md)
Quickstart
==========
Sentence Transformer
--------------------
Characteristics of Sentence Transformer (a.k.a. bi-encoder) models:
1. Calculates a **fixed-size vector representation (embedding)** given **texts or images**.
2. Embedding calculation is often **efficient**, and embedding similarity calculation is **very fast**.
3. Applicable for a **wide range of tasks**, such as semantic textual similarity, semantic search, clustering, classification, paraphrase mining, and more.
4. Often used as a **first step in a two-step retrieval process**, where a Cross-Encoder (a.k.a. reranker) model is used to re-rank the top-k results from the bi-encoder.
Once you have `installed <installation.md>`_ Sentence Transformers, you can easily use Sentence Transformer models:
.. sidebar:: Documentation

   1. :class:`SentenceTransformer <sentence_transformers.SentenceTransformer>`
   2. :meth:`SentenceTransformer.encode <sentence_transformers.SentenceTransformer.encode>`
   3. :meth:`SentenceTransformer.similarity <sentence_transformers.SentenceTransformer.similarity>`

   **Other useful methods and links:**

   - :meth:`SentenceTransformer.similarity_pairwise <sentence_transformers.SentenceTransformer.similarity_pairwise>`
   - `SentenceTransformer > Usage <./sentence_transformer/usage/usage.html>`_
   - `SentenceTransformer > Pretrained Models <./sentence_transformer/pretrained_models.html>`_
   - `SentenceTransformer > Training Overview <./sentence_transformer/training_overview.html>`_
   - `SentenceTransformer > Dataset Overview <./sentence_transformer/dataset_overview.html>`_
   - `SentenceTransformer > Loss Overview <./sentence_transformer/loss_overview.html>`_
   - `SentenceTransformer > Training Examples <./sentence_transformer/training/examples.html>`_
::

   from sentence_transformers import SentenceTransformer

   # 1. Load a pretrained Sentence Transformer model
   model = SentenceTransformer("all-MiniLM-L6-v2")

   # The sentences to encode
   sentences = [
       "The weather is lovely today.",
       "It's so sunny outside!",
       "He drove to the stadium.",
   ]

   # 2. Calculate embeddings by calling model.encode()
   embeddings = model.encode(sentences)
   print(embeddings.shape)
   # [3, 384]

   # 3. Calculate the embedding similarities
   similarities = model.similarity(embeddings, embeddings)
   print(similarities)
   # tensor([[1.0000, 0.6660, 0.1046],
   #         [0.6660, 1.0000, 0.1411],
   #         [0.1046, 0.1411, 1.0000]])
With ``SentenceTransformer("all-MiniLM-L6-v2")`` we pick which `Sentence Transformer model <https://huggingface.co/models?library=sentence-transformers>`_ we load. In this example, we load `all-MiniLM-L6-v2 <https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2>`_, which is a MiniLM model finetuned on a large dataset of over 1 billion training pairs. Using `SentenceTransformer.similarity() <./package_reference/sentence_transformer/SentenceTransformer.html#sentence_transformers.SentenceTransformer.similarity>`_, we compute the similarity between all pairs of sentences. As expected, the similarity between the first two sentences (0.6660) is higher than the similarity between the first and the third sentence (0.1046) or the second and the third sentence (0.1411).
Finetuning Sentence Transformer models is easy and requires only a few lines of code. For more information, see the `Training Overview <./sentence_transformer/training_overview.html>`_ section.
Cross Encoder
-------------
Characteristics of Cross Encoder (a.k.a. reranker) models:
1. Calculates a **similarity score** given **pairs of texts**.
2. Generally provides **superior performance** compared to a Sentence Transformer (a.k.a. bi-encoder) model.
3. Often **slower** than a Sentence Transformer model, as it requires computation for each pair rather than each text.
4. Due to the previous 2 characteristics, Cross Encoders are often used to **re-rank the top-k results** from a Sentence Transformer model.
The usage for Cross Encoder (a.k.a. reranker) models is similar to Sentence Transformers:
.. sidebar:: Documentation

   1. :class:`CrossEncoder <sentence_transformers.CrossEncoder>`
   2. :meth:`CrossEncoder.rank <sentence_transformers.CrossEncoder.rank>`
   3. :meth:`CrossEncoder.predict <sentence_transformers.CrossEncoder.predict>`

   **Other useful methods and links:**

   - `CrossEncoder > Usage <./cross_encoder/usage/usage.html>`_
   - `CrossEncoder > Pretrained Models <./cross_encoder/pretrained_models.html>`_
   - `CrossEncoder > Training Overview <./cross_encoder/training_overview.html>`_
   - `CrossEncoder > Dataset Overview <./cross_encoder/dataset_overview.html>`_
   - `CrossEncoder > Loss Overview <./cross_encoder/loss_overview.html>`_
   - `CrossEncoder > Training Examples <./cross_encoder/training/examples.html>`_
::

   from sentence_transformers.cross_encoder import CrossEncoder

   # 1. Load a pretrained CrossEncoder model
   model = CrossEncoder("cross-encoder/stsb-distilroberta-base")

   # We want to compute the similarity between the query sentence...
   query = "A man is eating pasta."

   # ... and all sentences in the corpus
   corpus = [
       "A man is eating food.",
       "A man is eating a piece of bread.",
       "The girl is carrying a baby.",
       "A man is riding a horse.",
       "A woman is playing violin.",
       "Two men pushed carts through the woods.",
       "A man is riding a white horse on an enclosed ground.",
       "A monkey is playing drums.",
       "A cheetah is running behind its prey.",
   ]

   # 2. We rank all sentences in the corpus for the query
   ranks = model.rank(query, corpus)

   # Print the scores
   print("Query: ", query)
   for rank in ranks:
       print(f"{rank['score']:.2f}\t{corpus[rank['corpus_id']]}")
   """
   Query: A man is eating pasta.
   0.67 A man is eating food.
   0.34 A man is eating a piece of bread.
   0.08 A man is riding a horse.
   0.07 A man is riding a white horse on an enclosed ground.
   0.01 The girl is carrying a baby.
   0.01 Two men pushed carts through the woods.
   0.01 A monkey is playing drums.
   0.01 A woman is playing violin.
   0.01 A cheetah is running behind its prey.
   """

   # 3. Alternatively, you can also manually compute the score between two sentences
   import numpy as np

   sentence_combinations = [[query, sentence] for sentence in corpus]
   scores = model.predict(sentence_combinations)

   # Sort the scores in decreasing order to get the corpus indices
   ranked_indices = np.argsort(scores)[::-1]
   print("Scores:", scores)
   print("Indices:", ranked_indices)
   """
   Scores: [0.6732372, 0.34102544, 0.00542465, 0.07569341, 0.00525378, 0.00536814, 0.06676237, 0.00534825, 0.00516717]
   Indices: [0 1 3 6 2 5 7 4 8]
   """
With ``CrossEncoder("cross-encoder/stsb-distilroberta-base")`` we pick which `CrossEncoder model <./cross_encoder/pretrained_models.html>`_ we load. In this example, we load `cross-encoder/stsb-distilroberta-base <https://huggingface.co/cross-encoder/stsb-distilroberta-base>`_, which is a `DistilRoBERTa <https://huggingface.co/distilbert/distilroberta-base>`_ model finetuned on the `STS Benchmark <https://huggingface.co/datasets/sentence-transformers/stsb>`_ dataset.
Next Steps
----------
Consider reading one of the following sections next:
* `Sentence Transformers > Usage <./sentence_transformer/usage/usage.html>`_
* `Sentence Transformers > Pretrained Models <./sentence_transformer/pretrained_models.html>`_
* `Sentence Transformers > Training Overview <./sentence_transformer/training_overview.html>`_
* `Sentence Transformers > Training Examples > Multilingual Models <../examples/training/multilingual/README.html>`_
* `Cross Encoder > Usage <./cross_encoder/usage/usage.html>`_
* `Cross Encoder > Pretrained Models <./cross_encoder/pretrained_models.html>`_
# Must use Python 3.8!
sphinx==3.5.4
Jinja2==3.0.3
sphinx_markdown_tables==0.0.17
recommonmark==0.7.1
sphinx-copybutton==0.5.2
sphinx_inline_tabs==2023.4.21
-e ..
# Dataset Overview
```eval_rst
.. hint::

   **Quickstart:** Find `curated datasets <https://huggingface.co/collections/sentence-transformers/embedding-model-datasets-6644d7a3673a511914aa7552>`_ or `community datasets <https://huggingface.co/datasets?other=sentence-transformers>`_, choose a loss function via this `loss overview <loss_overview.html>`_, and `verify <training_overview.html#dataset-format>`_ that it works with your dataset.
```
It is important that your dataset format matches your loss function (or that you choose a loss function that matches your dataset format). See [Training Overview > Dataset Format](./training_overview.html#dataset-format) to learn how to verify whether a dataset format works with a loss function.
In practice, most dataset configurations will take one of four forms:
- **Positive Pair**: A pair of related sentences. This can be used for both symmetric tasks (semantic textual similarity) and asymmetric tasks (semantic search), with examples including pairs of paraphrases, pairs of full texts and their summaries, pairs of duplicate questions, pairs of (`query`, `response`), or pairs of (`source_language`, `target_language`). Natural Language Inference datasets can also be formatted this way by pairing entailing sentences.
- **Examples:** [sentence-transformers/sentence-compression](https://huggingface.co/datasets/sentence-transformers/sentence-compression), [sentence-transformers/coco-captions](https://huggingface.co/datasets/sentence-transformers/coco-captions), [sentence-transformers/codesearchnet](https://huggingface.co/datasets/sentence-transformers/codesearchnet), [sentence-transformers/natural-questions](https://huggingface.co/datasets/sentence-transformers/natural-questions), [sentence-transformers/gooaq](https://huggingface.co/datasets/sentence-transformers/gooaq), [sentence-transformers/squad](https://huggingface.co/datasets/sentence-transformers/squad), [sentence-transformers/wikihow](https://huggingface.co/datasets/sentence-transformers/wikihow), [sentence-transformers/eli5](https://huggingface.co/datasets/sentence-transformers/eli5)
- **Triplets**: (anchor, positive, negative) text triplets. These datasets don't need labels.
- **Examples:** [sentence-transformers/quora-duplicates](https://huggingface.co/datasets/sentence-transformers/quora-duplicates), [nirantk/triplets](https://huggingface.co/datasets/nirantk/triplets), [sentence-transformers/all-nli](https://huggingface.co/datasets/sentence-transformers/all-nli)
- **Pair with Similarity Score**: A pair of sentences with a score indicating their similarity. Common examples are "Semantic Textual Similarity" datasets.
- **Examples:** [sentence-transformers/stsb](https://huggingface.co/datasets/sentence-transformers/stsb), [PhilipMay/stsb_multi_mt](https://huggingface.co/datasets/PhilipMay/stsb_multi_mt).
- **Texts with Classes**: A text with its corresponding class. This data format is easily converted by loss functions into three sentences (triplets) where the first is an "anchor", the second a "positive" of the same class as the anchor, and the third a "negative" of a different class.
- **Examples:** [trec](https://huggingface.co/datasets/trec), [yahoo_answers_topics](https://huggingface.co/datasets/yahoo_answers_topics).
Note that it is often simple to transform a dataset from one format to another, such that it works with your loss function of choice.
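For example, a "texts with classes" dataset can be reshaped into triplets with a few lines of plain Python. This is an illustrative sketch (`classes_to_triplets` is a hypothetical helper, not a library function); in practice you might use `Dataset.map` or a batch sampler instead.

```python
import random


def classes_to_triplets(examples, seed=0):
    """Convert (text, label) rows into (anchor, positive, negative) triplets.

    For each text, sample a positive with the same label and a negative with
    a different label; rows without a valid positive or negative are skipped.
    """
    rng = random.Random(seed)
    by_label = {}
    for text, label in examples:
        by_label.setdefault(label, []).append(text)
    triplets = []
    for text, label in examples:
        positives = [t for t in by_label[label] if t != text]
        negatives = [t for other, ts in by_label.items() if other != label for t in ts]
        if positives and negatives:
            triplets.append((text, rng.choice(positives), rng.choice(negatives)))
    return triplets
```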
## Datasets on the Hugging Face Hub
```eval_rst
The `Datasets library <https://huggingface.co/docs/datasets/index>`_ (``pip install datasets``) allows you to load datasets from the Hugging Face Hub with the :func:`~datasets.load_dataset` function::

    from datasets import load_dataset

    # Indicate the dataset id from the Hub
    dataset_id = "sentence-transformers/natural-questions"
    dataset = load_dataset(dataset_id, split="train")
    """
    Dataset({
        features: ['query', 'answer'],
        num_rows: 100231
    })
    """
    print(dataset[0])
    """
    {
        'query': 'when did richmond last play in a preliminary final',
        'answer': "Richmond Football Club Richmond began 2017 with 5 straight wins, a feat it had not achieved since 1995. A series of close losses hampered the Tigers throughout the middle of the season, including a 5-point loss to the Western Bulldogs, 2-point loss to Fremantle, and a 3-point loss to the Giants. Richmond ended the season strongly with convincing victories over Fremantle and St Kilda in the final two rounds, elevating the club to 3rd on the ladder. Richmond's first final of the season against the Cats at the MCG attracted a record qualifying final crowd of 95,028; the Tigers won by 51 points. Having advanced to the first preliminary finals for the first time since 2001, Richmond defeated Greater Western Sydney by 36 points in front of a crowd of 94,258 to progress to the Grand Final against Adelaide, their first Grand Final appearance since 1982. The attendance was 100,021, the largest crowd to a grand final since 1986. The Crows led at quarter time and led by as many as 13, but the Tigers took over the game as it progressed and scored seven straight goals at one point. They eventually would win by 48 points – 16.12 (108) to Adelaide's 8.12 (60) – to end their 37-year flag drought.[22] Dustin Martin also became the first player to win a Premiership medal, the Brownlow Medal and the Norm Smith Medal in the same season, while Damien Hardwick was named AFL Coaches Association Coach of the Year. Richmond's jump from 13th to premiers also marked the biggest jump from one AFL season to the next."
    }
    """
```
For more information on how to manipulate your dataset, see the [Datasets Documentation](https://huggingface.co/docs/datasets/access).
```eval_rst
.. tip::

   It's common for Hugging Face Datasets to contain extraneous columns, e.g. sample_id, metadata, source, type, etc. You can use :meth:`Dataset.remove_columns <datasets.Dataset.remove_columns>` to remove these columns, as they will be used as inputs otherwise. You can also use :meth:`Dataset.select_columns <datasets.Dataset.select_columns>` to keep only the desired columns.
```
## Pre-existing Datasets
The [Hugging Face Hub](https://huggingface.co/datasets) hosts 150k+ datasets, many of which can be converted for training embedding models.
We are aiming to tag all Hugging Face datasets that work out of the box with Sentence Transformers with `sentence-transformers`, allowing you to easily find them by browsing to [https://huggingface.co/datasets?other=sentence-transformers](https://huggingface.co/datasets?other=sentence-transformers). We strongly recommend that you browse these datasets to find training datasets that might be useful for your tasks.
These are some of the popular pre-existing datasets tagged as ``sentence-transformers`` that can be used to train and fine-tune SentenceTransformer models:
| Dataset | Description |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------|
| [GooAQ](https://huggingface.co/datasets/sentence-transformers/gooaq) | (Question, Answer) pairs from Google auto suggest |
| [Yahoo Answers](https://huggingface.co/datasets/sentence-transformers/yahoo-answers) | (Title+Question, Answer), (Title, Answer), (Title, Question), (Question, Answer) pairs from Yahoo Answers |
| [MS MARCO Triplets (msmarco-distilbert-base-tas-b)](https://huggingface.co/datasets/sentence-transformers/msmarco-msmarco-distilbert-base-tas-b) | (Question, Answer, Negative) triplets from MS MARCO Passages dataset with mined negatives |
| [MS MARCO Triplets (msmarco-distilbert-base-v3)](https://huggingface.co/datasets/sentence-transformers/msmarco-msmarco-distilbert-base-v3) | (Question, Answer, Negative) triplets from MS MARCO Passages dataset with mined negatives |
| [MS MARCO Triplets (msmarco-MiniLM-L-6-v3)](https://huggingface.co/datasets/sentence-transformers/msmarco-msmarco-MiniLM-L-6-v3) | (Question, Answer, Negative) triplets from MS MARCO Passages dataset with mined negatives |
| [MS MARCO Triplets (distilbert-margin-mse-cls-dot-v2)](https://huggingface.co/datasets/sentence-transformers/msmarco-distilbert-margin-mse-cls-dot-v2) | (Question, Answer, Negative) triplets from MS MARCO Passages dataset with mined negatives |
| [MS MARCO Triplets (distilbert-margin-mse-cls-dot-v1)](https://huggingface.co/datasets/sentence-transformers/msmarco-distilbert-margin-mse-cls-dot-v1) | (Question, Answer, Negative) triplets from MS MARCO Passages dataset with mined negatives |
| [MS MARCO Triplets (distilbert-margin-mse-mean-dot-v1)](https://huggingface.co/datasets/sentence-transformers/msmarco-distilbert-margin-mse-mean-dot-v1) | (Question, Answer, Negative) triplets from MS MARCO Passages dataset with mined negatives |
| [MS MARCO Triplets (mpnet-margin-mse-mean-v1)](https://huggingface.co/datasets/sentence-transformers/msmarco-mpnet-margin-mse-mean-v1) | (Question, Answer, Negative) triplets from MS MARCO Passages dataset with mined negatives |
| [MS MARCO Triplets (co-condenser-margin-mse-cls-v1)](https://huggingface.co/datasets/sentence-transformers/msmarco-co-condenser-margin-mse-cls-v1) | (Question, Answer, Negative) triplets from MS MARCO Passages dataset with mined negatives |
| [MS MARCO Triplets (distilbert-margin-mse-mnrl-mean-v1)](https://huggingface.co/datasets/sentence-transformers/msmarco-distilbert-margin-mse-mnrl-mean-v1) | (Question, Answer, Negative) triplets from MS MARCO Passages dataset with mined negatives |
| [MS MARCO Triplets (distilbert-margin-mse-sym-mnrl-mean-v1)](https://huggingface.co/datasets/sentence-transformers/msmarco-distilbert-margin-mse-sym-mnrl-mean-v1) | (Question, Answer, Negative) triplets from MS MARCO Passages dataset with mined negatives |
| [MS MARCO Triplets (distilbert-margin-mse-sym-mnrl-mean-v2)](https://huggingface.co/datasets/sentence-transformers/msmarco-distilbert-margin-mse-sym-mnrl-mean-v2) | (Question, Answer, Negative) triplets from MS MARCO Passages dataset with mined negatives |
| [MS MARCO Triplets (co-condenser-margin-mse-sym-mnrl-mean-v1)](https://huggingface.co/datasets/sentence-transformers/msmarco-co-condenser-margin-mse-sym-mnrl-mean-v1) | (Question, Answer, Negative) triplets from MS MARCO Passages dataset with mined negatives |
| [MS MARCO Triplets (BM25)](https://huggingface.co/datasets/sentence-transformers/msmarco-bm25) | (Question, Answer, Negative) triplets from MS MARCO Passages dataset with mined negatives |
| [Stack Exchange Duplicates](https://huggingface.co/datasets/sentence-transformers/stackexchange-duplicates) | (Title, Title), (Title+Body, Title+Body), (Body, Body) pairs of duplicate questions from StackExchange |
| [ELI5](https://huggingface.co/datasets/sentence-transformers/eli5) | (Question, Answer) pairs from ELI5 dataset |
| [SQuAD](https://huggingface.co/datasets/sentence-transformers/squad) | (Question, Answer) pairs from SQuAD dataset |
| [WikiHow](https://huggingface.co/datasets/sentence-transformers/wikihow) | (Summary, Text) pairs from WikiHow |
| [Amazon Reviews 2018](https://huggingface.co/datasets/sentence-transformers/amazon-reviews) | (Title, review) pairs from Amazon Reviews |
| [Natural Questions](https://huggingface.co/datasets/sentence-transformers/natural-questions) | (Query, Answer) pairs from the Natural Questions dataset |
| [Amazon QA](https://huggingface.co/datasets/sentence-transformers/amazon-qa) | (Question, Answer) pairs from Amazon |
| [S2ORC](https://huggingface.co/datasets/sentence-transformers/s2orc) | (Title, Abstract), (Abstract, Citation), (Title, Citation) pairs of scientific papers |
| [Quora Duplicates](https://huggingface.co/datasets/sentence-transformers/quora-duplicates) | Duplicate question pairs from Quora |
| [WikiAnswers](https://huggingface.co/datasets/sentence-transformers/wikianswers-duplicates) | Duplicate question pairs from WikiAnswers |
| [AGNews](https://huggingface.co/datasets/sentence-transformers/agnews) | (Title, Description) pairs of news articles from the AG News dataset |
| [AllNLI](https://huggingface.co/datasets/sentence-transformers/all-nli) | (Anchor, Entailment, Contradiction) triplets from SNLI + MultiNLI |
| [NPR](https://huggingface.co/datasets/sentence-transformers/npr) | (Title, Body) pairs from the npr.org website |
| [SPECTER](https://huggingface.co/datasets/sentence-transformers/specter) | (Title, Positive Title, Negative Title) triplets of Scientific Publications from Specter |
| [Simple Wiki](https://huggingface.co/datasets/sentence-transformers/simple-wiki) | (English, Simple English) pairs from Wikipedia |
| [PAQ](https://huggingface.co/datasets/sentence-transformers/paq) | (Query, Answer) from the Probably-Asked Questions dataset |
| [altlex](https://huggingface.co/datasets/sentence-transformers/altlex) | (English, Simple English) pairs from Wikipedia |
| [CC News](https://huggingface.co/datasets/sentence-transformers/ccnews) | (Title, article) pairs from the CC News dataset |
| [CodeSearchNet](https://huggingface.co/datasets/sentence-transformers/codesearchnet) | (Comment, Code) pairs from open source libraries on GitHub |
| [Sentence Compression](https://huggingface.co/datasets/sentence-transformers/sentence-compression) | (Long text, Short text) pairs from the Sentence Compression dataset |
| [Trivia QA](https://huggingface.co/datasets/sentence-transformers/trivia-qa) | (Query, Answer) pairs from the TriviaQA dataset |
| [Flickr30k Captions](https://huggingface.co/datasets/sentence-transformers/flickr30k-captions) | Duplicate captions from the Flickr30k dataset |
| [xsum](https://huggingface.co/datasets/sentence-transformers/xsum) | (News Article, Summary) pairs from XSUM dataset |
| [Coco Captions](https://huggingface.co/datasets/sentence-transformers/coco-captions) | Duplicate captions from the Coco Captions dataset |
| [Parallel Sentences: Europarl](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-europarl) | (English, Non-English) pairs across numerous languages |
| [Parallel Sentences: Global Voices](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-global-voices) | (English, Non-English) pairs across numerous languages |
| [Parallel Sentences: MUSE](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-muse) | (English, Non-English) pairs across numerous languages |
| [Parallel Sentences: JW300](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-jw300) | (English, Non-English) pairs across numerous languages |
| [Parallel Sentences: News Commentary](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-news-commentary) | (English, Non-English) pairs across numerous languages |
| [Parallel Sentences: OpenSubtitles](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-opensubtitles) | (English, Non-English) pairs across numerous languages |
| [Parallel Sentences: Talks](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks) | (English, Non-English) pairs across numerous languages |
| [Parallel Sentences: Tatoeba](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-tatoeba) | (English, Non-English) pairs across numerous languages |
| [Parallel Sentences: WikiMatrix](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-wikimatrix) | (English, Non-English) pairs across numerous languages |
| [Parallel Sentences: WikiTitles](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-wikititles) | (English, Non-English) pairs across numerous languages |
```eval_rst
.. note::
We advise users to tag datasets that can be used for training embedding models with ``sentence-transformers`` by adding ``tags: sentence-transformers``. We would also gladly accept high quality datasets to be added to the list above for all to see and use.
```
# Loss Overview
Loss functions play a critical role in the performance of your fine-tuned model. Sadly, there is no "one size fits all" loss function. Ideally, this table should help narrow down your choice of loss function(s) by matching them to your data formats.
```eval_rst
.. note::
You can often convert one training data format into another, allowing more loss functions to be viable for your scenario. For example, ``(sentence_A, sentence_B) pairs`` with ``class`` labels can be converted into ``(anchor, positive, negative) triplets`` by sampling sentences with the same or different classes.
```
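As a sketch of the conversion described in the note above (plain Python, with hypothetical NLI-style example data where `0` means entailment and `1` means contradiction):

```python
from collections import defaultdict

# Hypothetical labeled pairs in the (sentence_A, sentence_B, class) format.
labeled_pairs = [
    ("A man is eating food.", "A man is eating.", 0),
    ("A man is eating food.", "The man is sleeping.", 1),
    ("Two dogs run through a field.", "Dogs are running outside.", 0),
    ("Two dogs run through a field.", "The dogs are sitting still.", 1),
]

def pairs_to_triplets(pairs):
    """Convert (sentence_A, sentence_B, class) pairs into (anchor, positive,
    negative) triplets by matching, per anchor, a sentence_B labeled as
    entailment against one labeled as contradiction."""
    by_anchor = defaultdict(lambda: {"pos": [], "neg": []})
    for sent_a, sent_b, label in pairs:
        by_anchor[sent_a]["pos" if label == 0 else "neg"].append(sent_b)
    triplets = []
    for anchor, group in by_anchor.items():
        for positive in group["pos"]:
            for negative in group["neg"]:
                triplets.append((anchor, positive, negative))
    return triplets

triplets = pairs_to_triplets(labeled_pairs)
print(triplets[0])
# => ('A man is eating food.', 'A man is eating.', 'The man is sleeping.')
```

The resulting triplets can then be used with the triplet-based losses from the table below.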
| Inputs | Labels | Appropriate Loss Functions |
|-----------------------------------------------|--------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `single sentences` | `class` | <a href="../package_reference/sentence_transformer/losses.html#batchalltripletloss">`BatchAllTripletLoss`</a><br><a href="../package_reference/sentence_transformer/losses.html#batchhardsoftmargintripletloss">`BatchHardSoftMarginTripletLoss`</a><br><a href="../package_reference/sentence_transformer/losses.html#batchhardtripletloss">`BatchHardTripletLoss`</a><br><a href="../package_reference/sentence_transformer/losses.html#batchsemihardtripletloss">`BatchSemiHardTripletLoss`</a> |
| `single sentences` | `none` | <a href="../package_reference/sentence_transformer/losses.html#contrastivetensionloss">`ContrastiveTensionLoss`</a><br><a href="../package_reference/sentence_transformer/losses.html#denoisingautoencoderloss">`DenoisingAutoEncoderLoss`</a> |
| `(anchor, anchor) pairs` | `none` | <a href="../package_reference/sentence_transformer/losses.html#contrastivetensionlossinbatchnegatives">`ContrastiveTensionLossInBatchNegatives`</a> |
| `(damaged_sentence, original_sentence) pairs` | `none` | <a href="../package_reference/sentence_transformer/losses.html#denoisingautoencoderloss">`DenoisingAutoEncoderLoss`</a> |
| `(sentence_A, sentence_B) pairs` | `class` | <a href="../package_reference/sentence_transformer/losses.html#softmaxloss">`SoftmaxLoss`</a> |
| `(anchor, positive) pairs` | `none` | <a href="../package_reference/sentence_transformer/losses.html#cachedmultiplenegativesrankingloss">`CachedMultipleNegativesRankingLoss`</a><br><a href="../package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss">`MultipleNegativesRankingLoss`</a><br><a href="../package_reference/sentence_transformer/losses.html#multiplenegativessymmetricrankingloss">`MultipleNegativesSymmetricRankingLoss`</a><br><a href="../package_reference/sentence_transformer/losses.html#megabatchmarginloss">`MegaBatchMarginLoss`</a><br><a href="../package_reference/sentence_transformer/losses.html#cachedgistembedloss">`CachedGISTEmbedLoss`</a><br><a href="../package_reference/sentence_transformer/losses.html#gistembedloss">`GISTEmbedLoss`</a> |
| `(anchor, positive/negative) pairs` | `1 if positive, 0 if negative` | <a href="../package_reference/sentence_transformer/losses.html#contrastiveloss">`ContrastiveLoss`</a><br><a href="../package_reference/sentence_transformer/losses.html#onlinecontrastiveloss">`OnlineContrastiveLoss`</a> |
| `(sentence_A, sentence_B) pairs` | `float similarity score` | <a href="../package_reference/sentence_transformer/losses.html#cosentloss">`CoSENTLoss`</a><br><a href="../package_reference/sentence_transformer/losses.html#angleloss">`AnglELoss`</a><br><a href="../package_reference/sentence_transformer/losses.html#cosinesimilarityloss">`CosineSimilarityLoss`</a> |
| `(anchor, positive, negative) triplets` | `none` | <a href="../package_reference/sentence_transformer/losses.html#cachedmultiplenegativesrankingloss">`CachedMultipleNegativesRankingLoss`</a><br><a href="../package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss">`MultipleNegativesRankingLoss`</a><br><a href="../package_reference/sentence_transformer/losses.html#tripletloss">`TripletLoss`</a><br><a href="../package_reference/sentence_transformer/losses.html#cachedgistembedloss">`CachedGISTEmbedLoss`</a><br><a href="../package_reference/sentence_transformer/losses.html#gistembedloss">`GISTEmbedLoss`</a> |
## Loss modifiers
These loss functions can be seen as *loss modifiers*: they work on top of standard loss functions, but apply them in ways that instil useful properties into the trained embedding model.
For example, models trained with <a href="../package_reference/sentence_transformer/losses.html#matryoshkaloss">`MatryoshkaLoss`</a> produce embeddings whose size can be truncated without notable losses in performance, and models trained with <a href="../package_reference/sentence_transformer/losses.html#adaptivelayerloss">`AdaptiveLayerLoss`</a> still perform well when you remove model layers for faster inference.
| Texts | Labels | Appropriate Loss Functions |
|-------|--------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `any` | `any` | <a href="../package_reference/sentence_transformer/losses.html#matryoshkaloss">`MatryoshkaLoss`</a><br><a href="../package_reference/sentence_transformer/losses.html#adaptivelayerloss">`AdaptiveLayerLoss`</a><br><a href="../package_reference/sentence_transformer/losses.html#matryoshka2dloss">`Matryoshka2dLoss`</a> |
## Distillation
These loss functions are specifically designed to be used when distilling the knowledge from one model into another.
For example, when finetuning a small model to behave more like a larger & stronger one, or when finetuning a model to become multi-lingual.
| Texts | Labels | Appropriate Loss Functions |
|----------------------------------------------|---------------------------------------------------------------|---------------------------------------------------------------------------------------------------|
| `sentence` | `model sentence embeddings` | <a href="../package_reference/sentence_transformer/losses.html#mseloss">`MSELoss`</a> |
| `sentence_1, sentence_2, ..., sentence_N` | `model sentence embeddings` | <a href="../package_reference/sentence_transformer/losses.html#mseloss">`MSELoss`</a> |
| `(query, passage_one, passage_two) triplets` | `gold_sim(query, passage_one) - gold_sim(query, passage_two)` | <a href="../package_reference/sentence_transformer/losses.html#marginmseloss">`MarginMSELoss`</a> |
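At its core, MSELoss simply regresses the student's embeddings onto the teacher's. A minimal sketch of the quantity it minimizes, using plain stand-in tensors rather than real model outputs:

```python
import torch

# Stand-ins for real embeddings: rows are sentences, columns are dimensions.
teacher_embeddings = torch.tensor([[0.5, 0.5], [1.0, 0.0]])
student_embeddings = torch.tensor([[0.4, 0.6], [0.9, 0.1]])

# The mean squared difference between student and teacher embeddings;
# minimizing it distills the teacher's embedding space into the student.
loss = torch.nn.functional.mse_loss(student_embeddings, teacher_embeddings)
print(loss)
# => tensor(0.0100)
```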
## Commonly used Loss Functions
In practice, not all loss functions get used equally often. The most common scenarios are:
* `(anchor, positive) pairs` without any labels: <a href="../package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss"><code>MultipleNegativesRankingLoss</code></a> is commonly used to train the top performing embedding models. This data is often relatively cheap to obtain, and the models are generally very performant. <a href="../package_reference/sentence_transformer/losses.html#cachedmultiplenegativesrankingloss"><code>CachedMultipleNegativesRankingLoss</code></a> is often used to increase the batch size, resulting in superior performance.
* `(sentence_A, sentence_B) pairs` with a `float similarity score`: <a href="../package_reference/sentence_transformer/losses.html#cosinesimilarityloss"><code>CosineSimilarityLoss</code></a> has traditionally been widely used, though more recently <a href="../package_reference/sentence_transformer/losses.html#cosentloss"><code>CoSENTLoss</code></a> and <a href="../package_reference/sentence_transformer/losses.html#angleloss"><code>AnglELoss</code></a> have served as drop-in replacements with superior performance.
## Custom Loss Functions
```eval_rst
Advanced users can create and train with their own loss functions. Custom loss functions only have a few requirements:
- They must be a subclass of :class:`torch.nn.Module`.
- They must have ``model`` as the first argument in the constructor.
- They must implement a ``forward`` method that accepts ``sentence_features`` and ``labels``. The former is a list of tokenized batches, one element for each column. These tokenized batches can be fed directly to the ``model`` being trained to produce embeddings. The latter is an optional tensor of labels. The method must return a single loss value.
To get full support with the automatic model card generation, you may also wish to implement:
- a ``get_config_dict`` method that returns a dictionary of loss parameters.
- a ``citation`` property so your work gets cited in all models that train with the loss.
```
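Putting the requirements above together, a skeleton might look like the following. The objective computed in `forward` (MSE between paired embeddings) is only a placeholder to make the sketch runnable, not a recommended loss:

```python
from typing import Dict, Iterable

import torch
from torch import Tensor, nn


class CustomPairLoss(nn.Module):
    """Sketch of a custom loss following the requirements above."""

    def __init__(self, model) -> None:
        # ``model`` must be the first argument in the constructor.
        super().__init__()
        self.model = model

    def forward(self, sentence_features: Iterable[Dict[str, Tensor]], labels: Tensor) -> Tensor:
        # ``sentence_features`` holds one tokenized batch per column; each can
        # be fed directly to the model being trained to produce embeddings.
        embeddings = [
            self.model(features)["sentence_embedding"] for features in sentence_features
        ]
        # Placeholder objective: pull paired embeddings together.
        return torch.nn.functional.mse_loss(embeddings[0], embeddings[1])

    def get_config_dict(self) -> dict:
        # Loss parameters, used by the automatic model card generation.
        return {}
```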
# Pretrained Models
We provide various pre-trained Sentence Transformers models via our Sentence Transformers Hugging Face organization. Additionally, over 6,000 community Sentence Transformers models have been publicly released on the Hugging Face Hub. All models can be found here:
* **Original models**: [Sentence Transformers Hugging Face organization](https://huggingface.co/models?library=sentence-transformers&author=sentence-transformers).
* **Community models**: [All Sentence Transformer models on Hugging Face](https://huggingface.co/models?library=sentence-transformers).
Each of these models can be easily downloaded and used like so:
```eval_rst
.. sidebar:: Original Models
For the original models from the `Sentence Transformers Hugging Face organization <https://huggingface.co/models?library=sentence-transformers&author=sentence-transformers>`_, it is not necessary to include the model author or organization prefix. For example, this snippet loads `sentence-transformers/all-mpnet-base-v2 <https://huggingface.co/sentence-transformers/all-mpnet-base-v2>`_.
```
```python
from sentence_transformers import SentenceTransformer
# Load https://huggingface.co/sentence-transformers/all-mpnet-base-v2
model = SentenceTransformer("all-mpnet-base-v2")
embeddings = model.encode([
"The weather is lovely today.",
"It's so sunny outside!",
"He drove to the stadium.",
])
similarities = model.similarity(embeddings, embeddings)
```
```eval_rst
.. note::
Consider using the `Massive Textual Embedding Benchmark leaderboard <https://huggingface.co/spaces/mteb/leaderboard>`_ for inspiration on strong Sentence Transformer models. Be wary:
- **Model sizes**: it is recommended to filter out the large models that may be infeasible to run without excessive hardware.
- **Experimentation is key**: models that perform well on the leaderboard do not necessarily do well on your tasks; it is **crucial** to experiment with various promising models.
```
## Original Models
The following table provides an overview of a selection of our models. They have been extensively evaluated for the quality of their sentence embeddings (Performance Sentence Embeddings) and their embeddings of search queries & paragraphs (Performance Semantic Search).
The **all-*** models were trained on all available training data (more than 1 billion training pairs) and are designed as **general purpose** models. The [**all-mpnet-base-v2**](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) model provides the best quality, while [**all-MiniLM-L6-v2**](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) is 5 times faster and still offers good quality. Toggle *All models* to see all evaluated original models.
<iframe src="../../../_static/html/models_en_sentence_embeddings.html" height="600" style="width:100%; border:none;" title="Iframe Example"></iframe>
---
## Semantic Search Models
The following models have been specifically trained for **Semantic Search**: Given a question / search query, these models are able to find relevant text passages. For more details, see [Usage > Semantic Search](../../examples/applications/semantic-search/README.md).
```eval_rst
.. sidebar:: Documentation
#. `multi-qa-mpnet-base-cos-v1 <https://huggingface.co/sentence-transformers/multi-qa-mpnet-base-cos-v1>`_
#. :class:`SentenceTransformer <sentence_transformers.SentenceTransformer>`
#. :meth:`SentenceTransformer.encode <sentence_transformers.SentenceTransformer.encode>`
#. :meth:`SentenceTransformer.similarity <sentence_transformers.SentenceTransformer.similarity>`
```
```python
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("multi-qa-mpnet-base-cos-v1")
query_embedding = model.encode("How big is London")
passage_embeddings = model.encode([
    "London is known for its financial district",
"London has 9,787,426 inhabitants at the 2011 census",
"The United Kingdom is the fourth largest exporter of goods in the world",
])
similarity = model.similarity(query_embedding, passage_embeddings)
# => tensor([[0.4659, 0.6142, 0.2697]])
```
### Multi-QA Models
The following models have been trained on [215M question-answer pairs](https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-dot-v1#training) from various sources and domains, including StackExchange, Yahoo Answers, Google & Bing search queries, and many more. These models perform well across many search tasks and domains.
These models were tuned to be used with the dot-product similarity score:
| Model | Performance Semantic Search (6 Datasets) | Queries (GPU / CPU) per sec. |
| --- | :---: | :---: |
| [multi-qa-mpnet-base-dot-v1](https://huggingface.co/sentence-transformers/multi-qa-mpnet-base-dot-v1) | 57.60 | 4,000 / 170 |
| [multi-qa-distilbert-dot-v1](https://huggingface.co/sentence-transformers/multi-qa-distilbert-dot-v1) | 52.51 | 7,000 / 350 |
| [multi-qa-MiniLM-L6-dot-v1](https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-dot-v1) | 49.19 | 18,000 / 750 |
These models produce normalized vectors of length 1, which can be used with dot-product, cosine-similarity and Euclidean distance as the similarity functions:
| Model | Performance Semantic Search (6 Datasets) | Queries (GPU / CPU) per sec. |
| --- | :---: | :---: |
| [multi-qa-mpnet-base-cos-v1](https://huggingface.co/sentence-transformers/multi-qa-mpnet-base-cos-v1) | 57.46 | 4,000 / 170 |
| [multi-qa-distilbert-cos-v1](https://huggingface.co/sentence-transformers/multi-qa-distilbert-cos-v1) | 52.83 | 7,000 / 350 |
| [multi-qa-MiniLM-L6-cos-v1](https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1) | 51.83 | 18,000 / 750 |
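Since these models output unit-length vectors, dot-product and cosine similarity coincide, and squared Euclidean distance is a monotone transform of them, so all three similarity functions rank results identically. A quick sanity check with plain normalized stand-in tensors (no model download needed):

```python
import torch
import torch.nn.functional as F

# Two stand-in embeddings, normalized to unit length as these models produce.
a = F.normalize(torch.tensor([[3.0, 4.0]]), p=2, dim=1)
b = F.normalize(torch.tensor([[6.0, 8.0]]), p=2, dim=1)

# For unit vectors, the dot product equals cosine similarity ...
dot = (a * b).sum()
cos = F.cosine_similarity(a, b)

# ... and squared Euclidean distance is 2 - 2 * dot, a monotone transform.
squared_euclidean = (a - b).pow(2).sum()
print(dot, cos, squared_euclidean)
```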
### MSMARCO Passage Models
The following models have been trained on the [MSMARCO Passage Ranking Dataset](https://github.com/microsoft/MSMARCO-Passage-Ranking), which contains 500k real queries from Bing search together with the relevant passages from various web sources. Given the diversity of the MSMARCO dataset, models also perform well on other domains.
These models were tuned to be used with the dot-product similarity score:
| Model | MSMARCO MRR@10 dev set | Performance Semantic Search (6 Datasets) | Queries (GPU / CPU) per sec. |
| --- | :---: | :---: | :---: |
| [msmarco-bert-base-dot-v5](https://huggingface.co/sentence-transformers/msmarco-bert-base-dot-v5) | 38.08 | 52.11 | 4,000 / 170 |
| [msmarco-distilbert-dot-v5](https://huggingface.co/sentence-transformers/msmarco-distilbert-dot-v5) | 37.25 | 49.47 | 7,000 / 350 |
| [msmarco-distilbert-base-tas-b](https://huggingface.co/sentence-transformers/msmarco-distilbert-base-tas-b) | 34.43 | 49.25 | 7,000 / 350 |
These models produce normalized vectors of length 1, which can be used with dot-product, cosine-similarity and Euclidean distance as the similarity functions:
| Model | MSMARCO MRR@10 dev set | Performance Semantic Search (6 Datasets) | Queries (GPU / CPU) per sec. |
| --- | :---: | :---: | :---: |
| [msmarco-distilbert-cos-v5](https://huggingface.co/sentence-transformers/msmarco-distilbert-cos-v5) | 33.79 | 44.98 | 7,000 / 350 |
| [msmarco-MiniLM-L12-cos-v5](https://huggingface.co/sentence-transformers/msmarco-MiniLM-L12-cos-v5) | 32.75 | 43.89 | 11,000 / 400 |
| [msmarco-MiniLM-L6-cos-v5](https://huggingface.co/sentence-transformers/msmarco-MiniLM-L6-cos-v5) | 32.27 | 42.16 | 18,000 / 750 |
[MSMARCO Models - More details](../pretrained-models/msmarco-v5.md)
---
## Multilingual Models
The following models generate similar embeddings for the same texts in different languages. You do not need to specify the input language. Details are in our publication [Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation](https://arxiv.org/abs/2004.09813). We used the following 50+ languages: ar, bg, ca, cs, da, de, el, en, es, et, fa, fi, fr, fr-ca, gl, gu, he, hi, hr, hu, hy, id, it, ja, ka, ko, ku, lt, lv, mk, mn, mr, ms, my, nb, nl, pl, pt, pt-br, ro, ru, sk, sl, sq, sr, sv, th, tr, uk, ur, vi, zh-cn, zh-tw.
### Semantic Similarity Models
These models find semantically similar sentences within one language or across languages:
- **[distiluse-base-multilingual-cased-v1](https://huggingface.co/sentence-transformers/distiluse-base-multilingual-cased-v1)**: Multilingual knowledge distilled version of [multilingual Universal Sentence Encoder](https://arxiv.org/abs/1907.04307). Supports 15 languages: Arabic, Chinese, Dutch, English, French, German, Italian, Korean, Polish, Portuguese, Russian, Spanish, Turkish.
- **[distiluse-base-multilingual-cased-v2](https://huggingface.co/sentence-transformers/distiluse-base-multilingual-cased-v2)**: Multilingual knowledge distilled version of [multilingual Universal Sentence Encoder](https://arxiv.org/abs/1907.04307). This version supports 50+ languages, but performs a bit weaker than the v1 model.
- **[paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2)** - Multilingual version of [paraphrase-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-MiniLM-L12-v2), trained on parallel data for 50+ languages.
- **[paraphrase-multilingual-mpnet-base-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-mpnet-base-v2)** - Multilingual version of [paraphrase-mpnet-base-v2](https://huggingface.co/sentence-transformers/paraphrase-mpnet-base-v2), trained on parallel data for 50+ languages.
### Bitext Mining
Bitext mining describes the process of finding translated sentence pairs in two languages. If this is your use-case, the following model gives the best performance:
- **[LaBSE](https://huggingface.co/sentence-transformers/LaBSE)** - [LaBSE](https://arxiv.org/abs/2007.01852) Model. Supports 109 languages. Works well for finding translation pairs in multiple languages. As detailed [here](https://arxiv.org/abs/2004.09813), LaBSE works less well for assessing the similarity of sentence pairs that are not translations of each other.
Extending a model to new languages is easy by following [Training Examples > Multilingual Models](../../examples/training/multilingual/README.md).
## Image & Text Models
The following models can embed images and text into a joint vector space. See [Usage > Image Search](../../examples/applications/image-search/README.md) for more details on how to use them for text-to-image search, image-to-image search, image clustering, and zero-shot image classification.
The following models are listed with their respective Top 1 accuracy on the zero-shot ImageNet validation set.
| Model | Top 1 Performance |
| --- | :---: |
| [clip-ViT-L-14](https://huggingface.co/sentence-transformers/clip-ViT-L-14) | 75.4 |
| [clip-ViT-B-16](https://huggingface.co/sentence-transformers/clip-ViT-B-16) | 68.1 |
| [clip-ViT-B-32](https://huggingface.co/sentence-transformers/clip-ViT-B-32) | 63.3 |
We further provide this multilingual text-image model:
- **[clip-ViT-B-32-multilingual-v1](https://huggingface.co/sentence-transformers/clip-ViT-B-32-multilingual-v1)** - Multilingual text encoder for the [clip-ViT-B-32](https://huggingface.co/sentence-transformers/clip-ViT-B-32) model using [Multilingual Knowledge Distillation](https://arxiv.org/abs/2004.09813). This model can encode text in 50+ languages to match the image vectors from the [clip-ViT-B-32](https://huggingface.co/sentence-transformers/clip-ViT-B-32) model.
## INSTRUCTOR models
Some INSTRUCTOR models, such as [hkunlp/instructor-large](https://huggingface.co/hkunlp/instructor-large), are natively supported in Sentence Transformers. These models are special, as they are trained with instructions in mind. Notably, the primary difference between normal Sentence Transformer models and Instructor models is that the latter do not include the instructions themselves in the pooling step.
The following models work out of the box:
* [hkunlp/instructor-base](https://huggingface.co/hkunlp/instructor-base)
* [hkunlp/instructor-large](https://huggingface.co/hkunlp/instructor-large)
* [hkunlp/instructor-xl](https://huggingface.co/hkunlp/instructor-xl)
You can use these models like so:
```python
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("hkunlp/instructor-large")
embeddings = model.encode(
[
"Dynamical Scalar Degree of Freedom in Horava-Lifshitz Gravity",
"Comparison of Atmospheric Neutrino Flux Calculations at Low Energies",
"Fermion Bags in the Massive Gross-Neveu Model",
"QCD corrections to Associated t-tbar-H production at the Tevatron",
],
prompt="Represent the Medicine sentence for clustering: ",
)
print(embeddings.shape)
# => (4, 768)
```
For example, for information retrieval:
```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim
model = SentenceTransformer("hkunlp/instructor-large")
query = "where is the food stored in a yam plant"
query_instruction = (
"Represent the Wikipedia question for retrieving supporting documents: "
)
corpus = [
'Yams are perennial herbaceous vines native to Africa, Asia, and the Americas and cultivated for the consumption of their starchy tubers in many temperate and tropical regions. The tubers themselves, also called "yams", come in a variety of forms owing to numerous cultivars and related species.',
"The disparate impact theory is especially controversial under the Fair Housing Act because the Act regulates many activities relating to housing, insurance, and mortgage loans—and some scholars have argued that the theory's use under the Fair Housing Act, combined with extensions of the Community Reinvestment Act, contributed to rise of sub-prime lending and the crash of the U.S. housing market and ensuing global economic recession",
"Disparate impact in United States labor law refers to practices in employment, housing, and other areas that adversely affect one group of people of a protected characteristic more than another, even though rules applied by employers or landlords are formally neutral. Although the protected classes vary by statute, most federal civil rights laws protect based on race, color, religion, national origin, and sex as protected traits, and some laws include disability status and other traits as well.",
]
corpus_instruction = "Represent the Wikipedia document for retrieval: "
query_embedding = model.encode(query, prompt=query_instruction)
corpus_embeddings = model.encode(corpus, prompt=corpus_instruction)
similarities = cos_sim(query_embedding, corpus_embeddings)
print(similarities)
# => tensor([[0.8835, 0.7037, 0.6970]])
```
All other Instructor models either 1) will not load as they refer to `InstructorEmbedding` in their `modules.json` or 2) require calling `model.set_pooling_include_prompt(include_prompt=False)` after loading.
## Scientific Similarity Models
[SPECTER](https://arxiv.org/abs/2004.07180) is a model trained on scientific citations that can be used to estimate the similarity of two publications, e.g. to find similar papers.
- **[allenai-specter](https://huggingface.co/sentence-transformers/allenai-specter)** - [Semantic Search Python Example](../../examples/applications/semantic-search/semantic_search_publications.py) / [Semantic Search Colab Example](https://colab.research.google.com/drive/12hfBveGHRsxhPIUMmJYrll2lFU4fOX06)