Sentence Transformers implements two forms of distributed training: Data Parallel (DP) and Distributed Data Parallel (DDP). Read the `Data Parallelism documentation <https://huggingface.co/docs/transformers/en/perf_train_gpu_many#data-parallelism>`_ on Hugging Face for more details on these strategies. Some of the key differences include:
1. DDP is generally faster than DP because it has to communicate less data.
2. With DP, GPU 0 does the bulk of the work, while with DDP, the work is distributed more evenly across all GPUs.
3. DDP allows for training across multiple machines, while DP is limited to a single machine.
In short, **DDP is generally recommended**. You can use DDP by running your normal training scripts with ``torchrun`` or ``accelerate``. For example, if you have a script called ``train_script.py``, you can run it with DDP using one of the following commands::
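
    torchrun --nproc_per_node=4 train_script.py
    # or, equivalently, with Hugging Face Accelerate:
    accelerate launch --num_processes 4 train_script.py

Here, ``4`` is the number of GPUs (i.e. processes) to launch; adjust it to match your hardware.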
When performing distributed training, you have to wrap your code in a ``main`` function and call it with ``if __name__ == "__main__":``. This is because each process will run the entire script, so you don't want to run the same code multiple times. Here is an example of how to do this::
    from sentence_transformers import SentenceTransformer, SentenceTransformerTrainingArguments, SentenceTransformerTrainer
    # Other imports here


    def main():
        # Your training code here
        ...


    if __name__ == "__main__":
        main()
.. note::

    When using an `Evaluator <../training_overview.html#evaluator>`_, the evaluator only runs on the first device, unlike the training and evaluation datasets, which are shared across all devices.
Comparison
----------
The following table shows the speedup of DDP over DP and over no parallelism, measured with the following setup:
- Hardware: a ``p3.8xlarge`` AWS instance, i.e. 4x V100 GPUs
- Model being trained: `microsoft/mpnet-base <https://huggingface.co/microsoft/mpnet-base>`_ (133M parameters)
- Maximum sequence length: 384 (following `all-mpnet-base-v2 <https://huggingface.co/sentence-transformers/all-mpnet-base-v2>`_)
- Training datasets: MultiNLI, SNLI and STSB (note: these have short texts)
- Losses: :class:`~sentence_transformers.losses.SoftmaxLoss` for MultiNLI and SNLI, :class:`~sentence_transformers.losses.CosineSimilarityLoss` for STSB
.. list-table::
   :header-rows: 1

   * - Strategy
     - Launch command
     - Samples per second
   * - Data Parallel (DP)
     - ``python train_script.py`` (DP is used by default when launching a script with ``python``)
     - 3675 (1.349x speedup)
   * - **Distributed Data Parallel (DDP)**
     - ``torchrun --nproc_per_node=4 train_script.py`` or ``accelerate launch --num_processes 4 train_script.py``
     - **6980 (2.562x speedup)**
FSDP
----
Fully Sharded Data Parallelism (FSDP) is another distributed training strategy that is not fully supported by Sentence Transformers. It is a more advanced version of DDP that shards the model across GPUs, which only pays off for very large models. In the comparison above, FSDP reaches 5782 samples per second (2.122x speedup), i.e. **worse than DDP**. If you want to use FSDP with Sentence Transformers, you have to be aware of the following limitations:
- You can't use the ``evaluator`` functionality with FSDP.
- You have to save the trained model with ``trainer.accelerator.state.fsdp_plugin.set_state_dict_type("FULL_STATE_DICT")`` followed by ``trainer.save_model("output")``.
- You have to set ``fsdp=["full_shard", "auto_wrap"]`` and ``fsdp_config={"transformer_layer_cls_to_wrap": "BertLayer"}`` in your ``SentenceTransformerTrainingArguments``, where the class to wrap is the repeated encoder layer that houses the multi-head attention and feed-forward sublayers, e.g. ``BertLayer`` or ``MPNetLayer``.
Read the `FSDP documentation <https://huggingface.co/docs/accelerate/en/usage_guides/fsdp>`_ by Accelerate for more details.
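For illustration, here is a minimal sketch of a training script that applies these settings. The model, dataset, and loss are placeholders rather than a recommended setup, and the script still has to be launched with ``torchrun`` or ``accelerate launch``::

    from datasets import Dataset

    from sentence_transformers import (
        SentenceTransformer,
        SentenceTransformerTrainer,
        SentenceTransformerTrainingArguments,
    )
    from sentence_transformers.losses import MultipleNegativesRankingLoss


    def main():
        model = SentenceTransformer("microsoft/mpnet-base")

        # Tiny placeholder dataset; use your own (anchor, positive) pairs instead
        train_dataset = Dataset.from_dict({
            "anchor": ["It's nice weather outside today.", "He drove to work."],
            "positive": ["It's so sunny.", "He took the car to the office."],
        })
        loss = MultipleNegativesRankingLoss(model)

        args = SentenceTransformerTrainingArguments(
            output_dir="output",
            fsdp=["full_shard", "auto_wrap"],
            # MPNetLayer is the repeated encoder layer of microsoft/mpnet-base
            fsdp_config={"transformer_layer_cls_to_wrap": "MPNetLayer"},
        )

        trainer = SentenceTransformerTrainer(
            model=model,
            args=args,
            train_dataset=train_dataset,
            loss=loss,
        )
        trainer.train()

        # Gather the full (unsharded) state dict before saving
        trainer.accelerator.state.fsdp_plugin.set_state_dict_type("FULL_STATE_DICT")
        trainer.save_model("output")


    if __name__ == "__main__":
        main()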
The [applications](applications/) folder contains examples of how to use SentenceTransformers.
The [evaluation](evaluation/) folder contains some examples of how to evaluate SentenceTransformer models for common tasks.
## Training
The [training](training/) folder contains examples of how to fine-tune transformer models like BERT, RoBERTa, or XLM-RoBERTa for generating sentence embeddings. For documentation on how to train your own models, see the [Training Overview](http://www.sbert.net/docs/sentence_transformer/training_overview.html).
In [fast_clustering.py](fast_clustering.py) we present a clustering algorithm that is tuned for large datasets.
You can configure the cosine-similarity threshold above which two sentences are considered similar, as well as the minimum size of a local community. This allows you to get either large coarse-grained clusters or small fine-grained clusters.

We apply it on the [Quora Duplicate Questions](https://huggingface.co/datasets/sentence-transformers/quora-duplicates) dataset and the output looks something like this:
```
Cluster 1, #83 Elements
...
...
```
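For reference, here is a rough sketch of the same idea using `util.community_detection` directly. The model name and corpus below are illustrative; `fast_clustering.py` itself runs on the full Quora dataset:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
corpus = [
    "How do I learn Python?",
    "What is the best way to learn Python?",
    "How do I bake bread at home?",
    "What is a simple bread recipe?",
]
embeddings = model.encode(corpus, convert_to_tensor=True)

# threshold: minimum cosine similarity for two sentences to be considered similar
# min_community_size: minimum number of sentences required to form a cluster
clusters = util.community_detection(embeddings, threshold=0.75, min_community_size=2)

for i, cluster in enumerate(clusters):
    print(f"Cluster {i + 1}, #{len(cluster)} Elements")
    for sentence_id in cluster:
        print("\t", corpus[sentence_id])
```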
## Topic Modeling

Topic modeling is the process of discovering topics in a collection of documents. For each topic, you want to extract the words that describe this topic.
Sentence-Transformers can be used to identify these topics in a collection of sentences, paragraphs or short documents. For an excellent tutorial, see [Topic Modeling with BERT](https://towardsdatascience.com/topic-modeling-with-bert-779f7db187e6) as well as the [BERTopic](https://github.com/MaartenGr/BERTopic) and [Top2Vec](https://github.com/ddangelov/Top2Vec) repositories.
Image source: [Top2Vec: Distributed Representations of Topics](https://arxiv.org/abs/2008.09470)
Prompt Templates
----------------

Some models require using specific text *prompts* to achieve optimal performance. For example, with `intfloat/multilingual-e5-large <https://huggingface.co/intfloat/multilingual-e5-large>`_ you should prefix all queries with ``"query: "`` and all passages with ``"passage: "``. Another example is `BAAI/bge-large-en-v1.5 <https://huggingface.co/BAAI/bge-large-en-v1.5>`_, which performs best for retrieval when the input texts are prefixed with ``"Represent this sentence for searching relevant passages: "``.
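For instance, a small sketch of what this looks like in practice (the query text here is illustrative). Prefixing the text yourself and passing the prefix via the ``prompt`` argument of ``SentenceTransformer.encode`` result in the same text being embedded::

    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("intfloat/multilingual-e5-large")

    # Both calls embed the text "query: How many people live in London?"
    manual = model.encode("query: How many people live in London?")
    with_prompt = model.encode("How many people live in London?", prompt="query: ")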
"retrieval":"Retrieve semantically similar text: ",
"clustering":"Identify the topic or theme based on the text: ",
},
default_prompt_name="retrieval",
)
#or
model.default_prompt_name="retrieval"
Both of these parameters can also be specified in the ``config_sentence_transformers.json`` file of a saved model. That way, you won't have to specify these options manually when loading. When you save a Sentence Transformer model, these options will be automatically saved as well.
During inference, prompts can be applied in a few different ways. All of these scenarios result in identical texts being embedded:
1. Explicitly using the ``prompt`` option in ``SentenceTransformer.encode``::

       embeddings = model.encode("How to bake a strawberry cake", prompt="Retrieve semantically similar text: ")

2. Explicitly using the ``prompt_name`` option in ``SentenceTransformer.encode`` by relying on the prompts loaded from a) initialization or b) the model config::

       embeddings = model.encode("How to bake a strawberry cake", prompt_name="retrieval")

3. If neither ``prompt`` nor ``prompt_name`` is specified in ``SentenceTransformer.encode``, then the prompt specified by ``default_prompt_name`` will be applied. If it is ``None``, then no prompt will be applied::

       embeddings = model.encode("How to bake a strawberry cake")
Input Sequence Length
---------------------
For transformer models like BERT, RoBERTa, DistilBERT etc., the runtime and memory requirements grow quadratically with the input length. This limits transformers to inputs of a certain length. A common value for BERT-based models is 512 tokens, which corresponds to about 300-400 words (for English).
Each model has a maximum sequence length under ``model.max_seq_length``, which is the maximal number of tokens that can be processed. Longer texts will be truncated to the first ``model.max_seq_length`` tokens::

    from sentence_transformers import SentenceTransformer

    # Example model; every Sentence Transformer model exposes max_seq_length
    model = SentenceTransformer("all-MiniLM-L6-v2")
    print("Max Sequence Length:", model.max_seq_length)
You cannot increase the length beyond what is maximally supported by the respective transformer model. Also note that if a model was trained on short texts, the representations for long texts might not be as good.
Multi-Process / Multi-GPU Encoding
----------------------------------
You can encode input texts with more than one GPU (or with multiple processes on a CPU machine). For an example, see: `computing_embeddings_multi_gpu.py <https://github.com/UKPLab/sentence-transformers/blob/master/examples/applications/computing-embeddings/computing_embeddings_multi_gpu.py>`_.
The relevant method is :meth:`~sentence_transformers.SentenceTransformer.start_multi_process_pool`, which starts multiple processes that are used for encoding.
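For illustration, a minimal sketch of the workflow (the model name and sentences are placeholders)::

    from sentence_transformers import SentenceTransformer


    def main():
        model = SentenceTransformer("all-MiniLM-L6-v2")
        sentences = ["This is sentence number {}".format(i) for i in range(10000)]

        # Start one process per available GPU (or several CPU processes)
        pool = model.start_multi_process_pool()

        # The sentences are chunked and distributed across the processes in the pool
        embeddings = model.encode_multi_process(sentences, pool)
        print("Embeddings computed. Shape:", embeddings.shape)

        model.stop_multi_process_pool(pool)


    if __name__ == "__main__":
        main()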
Quantizing an embedding with a dimensionality of 1024 to binary would result in 1024 bits, which can be packed into 128 bytes (8 bits per byte).
As a result, in practice, quantizing a `float32` embedding with a dimensionality of 1024 yields an `int8` or `uint8` embedding with a dimensionality of 128. See below for two approaches to producing quantized embeddings with Sentence Transformers:
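For example, a rough sketch of what these two approaches can look like (the model name and sentences are illustrative):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model
sentences = ["It's nice weather outside today.", "He drove to work."]

# Approach 1: quantize directly while encoding
binary_embeddings = model.encode(sentences, precision="binary")

# Approach 2: encode to float32 first, then quantize the embeddings afterwards
float_embeddings = model.encode(sentences)
binary_embeddings = quantize_embeddings(float_embeddings, precision="binary")
```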