Commit b0f4f53a authored by Rayyyyy's avatar Rayyyyy

Update version according to github

parent 392df446
Distributed Training
====================
Sentence Transformers implements two forms of distributed training: Data Parallel (DP) and Distributed Data Parallel (DDP). Read the `Data Parallelism documentation <https://huggingface.co/docs/transformers/en/perf_train_gpu_many#data-parallelism>`_ on Hugging Face for more details on these strategies. Some of the key differences include:

1. DDP is generally faster than DP because it has to communicate less data.
2. With DP, GPU 0 does the bulk of the work, while with DDP, the work is distributed more evenly across all GPUs.
3. DDP allows for training across multiple machines, while DP is limited to a single machine.

In short, **DDP is generally recommended**. You can use DDP by running your normal training scripts with ``torchrun`` or ``accelerate``. For example, if you have a script called ``train_script.py``, you can run it with DDP using the following command:
.. |br| raw:: html

   <div style="line-height: 0; padding: 0; margin: 0"></div>

.. tab:: Via ``torchrun``

   |br|

   - `torchrun documentation <https://pytorch.org/docs/stable/elastic/run.html>`_

   ::

       torchrun --nproc_per_node=4 train_script.py

.. tab:: Via ``accelerate``

   |br|

   - `accelerate documentation <https://huggingface.co/docs/accelerate/en/index>`_

   ::

       accelerate launch --num_processes 4 train_script.py
.. note::

   When performing distributed training, you have to wrap your code in a ``main`` function and call it with ``if __name__ == "__main__":``. This is because each process will run the entire script, so you don't want to run the same code multiple times. Here is an example of how to do this::

       from sentence_transformers import SentenceTransformer, SentenceTransformerTrainingArguments, SentenceTransformerTrainer
       # Other imports here

       def main():
           # Your training code here
           ...

       if __name__ == "__main__":
           main()
.. note::

   When using an `Evaluator <../training_overview.html#evaluator>`_, the evaluator only runs on the first device, unlike the training and evaluation datasets, which are shared across all devices.
Comparison
----------
The following table shows the speedup of DDP over DP and no parallelism given a certain hardware setup:

- Hardware: a ``p3.8xlarge`` AWS instance, i.e. 4x V100 GPUs
- Model being trained: `microsoft/mpnet-base <https://huggingface.co/microsoft/mpnet-base>`_ (133M parameters)
- Maximum sequence length: 384 (following `all-mpnet-base-v2 <https://huggingface.co/sentence-transformers/all-mpnet-base-v2>`_)
- Training datasets: MultiNLI, SNLI and STSB (note: these have short texts)
- Losses: :class:`~sentence_transformers.losses.SoftmaxLoss` for MultiNLI and SNLI, :class:`~sentence_transformers.losses.CosineSimilarityLoss` for STSB
- Batch size per device: 32
.. list-table::
   :header-rows: 1

   * - Strategy
     - Launcher
     - Samples per Second
   * - No Parallelism
     - ``CUDA_VISIBLE_DEVICES=0 python train_script.py``
     - 2724
   * - Data Parallel (DP)
     - ``python train_script.py`` (DP is used by default when launching a script with ``python``)
     - 3675 (1.349x speedup)
   * - **Distributed Data Parallel (DDP)**
     - ``torchrun --nproc_per_node=4 train_script.py`` or ``accelerate launch --num_processes 4 train_script.py``
     - **6980 (2.562x speedup)**
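As a sanity check, the speedup figures follow directly from the throughput numbers in the table (simple arithmetic, no library code involved):

```python
# Throughput from the comparison table above (samples per second)
no_parallelism = 2724
dp = 3675
ddp = 6980

# Speedups relative to the single-GPU baseline
print(f"DP speedup:  {dp / no_parallelism:.3f}x")   # DP speedup:  1.349x
print(f"DDP speedup: {ddp / no_parallelism:.3f}x")  # DDP speedup: 2.562x
```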
FSDP
----

Fully Sharded Data Parallelism (FSDP) is another distributed training strategy that is not fully supported by Sentence Transformers. It is a more advanced version of DDP that is particularly useful for very large models. In the comparison above, FSDP reaches 5782 samples per second (2.122x speedup), i.e. **worse than DDP**: FSDP only makes sense for very large models. If you want to use FSDP with Sentence Transformers, you have to be aware of the following limitations:

- You can't use the ``evaluator`` functionality with FSDP.
- You have to save the trained model with ``trainer.accelerator.state.fsdp_plugin.set_state_dict_type("FULL_STATE_DICT")`` followed by ``trainer.save_model("output")``.
- You have to use ``fsdp=["full_shard", "auto_wrap"]`` and ``fsdp_config={"transformer_layer_cls_to_wrap": "BertLayer"}`` in your ``SentenceTransformerTrainingArguments``, where the layer class to wrap is the repeated encoder layer that houses the multi-head attention and feed-forward sublayers, e.g. ``BertLayer`` or ``MPNetLayer``.

Read the `FSDP documentation <https://huggingface.co/docs/accelerate/en/usage_guides/fsdp>`_ by Accelerate for more details.
Training Examples
=================
.. toctree::
   :maxdepth: 1
   :caption: Supervised Learning

   ../../../examples/training/sts/README
   ../../../examples/training/nli/README
   ../../../examples/training/paraphrases/README
   ../../../examples/training/quora_duplicate_questions/README
   ../../../examples/training/ms_marco/README
   ../../../examples/training/matryoshka/README
   ../../../examples/training/adaptive_layer/README
   ../../../examples/training/multilingual/README
   ../../../examples/training/distillation/README
   ../../../examples/training/data_augmentation/README

.. toctree::
   :maxdepth: 1
   :caption: Unsupervised Learning

   ../../../examples/unsupervised_learning/README
   ../../../examples/domain_adaptation/README

.. toctree::
   :maxdepth: 1
   :caption: Advanced Usage

   ../../../examples/training/hpo/README
   distributed
Semantic Textual Similarity
===========================
For Semantic Textual Similarity (STS), we want to produce embeddings for all texts involved and calculate the similarities between them. The text pairs with the highest similarity score are most semantically similar. See also the `Computing Embeddings <../../../examples/applications/computing-embeddings/README.html>`_ documentation for more advanced details on getting embedding scores.
.. sidebar:: Documentation

   1. :class:`SentenceTransformer <sentence_transformers.SentenceTransformer>`
   2. :meth:`SentenceTransformer.encode <sentence_transformers.SentenceTransformer.encode>`
   3. :meth:`SentenceTransformer.similarity <sentence_transformers.SentenceTransformer.similarity>`
::

    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")

    # Two lists of sentences
    sentences1 = [
        "The new movie is awesome",
        "The cat sits outside",
        "A man is playing guitar",
    ]
    sentences2 = [
        "The dog plays in the garden",
        "The new movie is so great",
        "A woman watches TV",
    ]

    # Compute embeddings for both lists
    embeddings1 = model.encode(sentences1)
    embeddings2 = model.encode(sentences2)

    # Compute cosine similarities
    similarities = model.similarity(embeddings1, embeddings2)

    # Output the pairs with their score
    for idx_i, sentence1 in enumerate(sentences1):
        print(sentence1)
        for idx_j, sentence2 in enumerate(sentences2):
            print(f" - {sentence2: <30}: {similarities[idx_i][idx_j]:.4f}")
.. code-block:: txt
   :emphasize-lines: 3

   The new movie is awesome
    - The dog plays in the garden   : 0.0543
    - The new movie is so great     : 0.8939
    - A woman watches TV            : -0.0502
   The cat sits outside
    - The dog plays in the garden   : 0.2838
    - The new movie is so great     : -0.0029
    - A woman watches TV            : 0.1310
   A man is playing guitar
    - The dog plays in the garden   : 0.2277
    - The new movie is so great     : -0.0136
    - A woman watches TV            : -0.0327
In this example, the :meth:`SentenceTransformer.similarity <sentence_transformers.SentenceTransformer.similarity>` method returns a 3x3 matrix with the respective cosine similarity scores for all possible pairs between *embeddings1* and *embeddings2*.
Similarity Calculation
----------------------
The similarity metric that is used is stored in the SentenceTransformer instance under :attr:`SentenceTransformer.similarity_fn_name <sentence_transformers.SentenceTransformer.similarity_fn_name>`. Valid options are:

- ``SimilarityFunction.COSINE`` (a.k.a. `"cosine"`): Cosine Similarity (**default**)
- ``SimilarityFunction.DOT_PRODUCT`` (a.k.a. `"dot"`): Dot Product
- ``SimilarityFunction.EUCLIDEAN`` (a.k.a. `"euclidean"`): Negative Euclidean Distance
- ``SimilarityFunction.MANHATTAN`` (a.k.a. `"manhattan"`): Negative Manhattan Distance
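For intuition, all four metrics can be sketched in plain Python. This is a simplified illustration on plain lists; the library's actual implementations operate on batched tensors:

```python
import math

def cosine(a, b):
    # Dot product of the two vectors after L2-normalization
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def neg_euclidean(a, b):
    # Negated so that "more similar" still means "higher score"
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def neg_manhattan(a, b):
    return -sum(abs(x - y) for x, y in zip(a, b))

print(round(cosine([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]), 4))  # 1.0 (parallel vectors)
print(neg_euclidean([0.0, 0.0], [3.0, 4.0]))               # -5.0
```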
This value can be changed in a handful of ways:
1. By initializing the SentenceTransformer instance with the desired similarity function::

       from sentence_transformers import SentenceTransformer, SimilarityFunction

       model = SentenceTransformer("all-MiniLM-L6-v2", similarity_fn_name=SimilarityFunction.DOT_PRODUCT)

2. By setting the value directly on the SentenceTransformer instance::

       from sentence_transformers import SentenceTransformer, SimilarityFunction

       model = SentenceTransformer("all-MiniLM-L6-v2")
       model.similarity_fn_name = SimilarityFunction.DOT_PRODUCT

3. By setting the value under the ``"similarity_fn_name"`` key in the ``config_sentence_transformers.json`` file of a saved model. When you save a Sentence Transformer model, this value will be automatically saved as well.
Sentence Transformers implements two methods to calculate the similarity between embeddings:

- :meth:`SentenceTransformer.similarity <sentence_transformers.SentenceTransformer.similarity>`: Calculates the similarity between all pairs of embeddings.
- :meth:`SentenceTransformer.pairwise_similarity <sentence_transformers.SentenceTransformer.pairwise_similarity>`: Calculates the similarity between embeddings in a pairwise fashion.
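The difference between the two methods can be illustrated with plain Python lists (a simplified sketch using a dot-product similarity; the real methods operate on tensors and honor ``similarity_fn_name``):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def similarity(A, B):
    # All pairs: returns a len(A) x len(B) matrix
    return [[dot(a, b) for b in B] for a in A]

def pairwise_similarity(A, B):
    # Element-wise: A[i] is only compared with B[i]
    return [dot(a, b) for a, b in zip(A, B)]

A = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0, 0.0], [1.0, 1.0]]
print(similarity(A, B))           # [[1.0, 1.0], [0.0, 1.0]]
print(pairwise_similarity(A, B))  # [1.0, 1.0]
```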
::

    from sentence_transformers import SentenceTransformer, SimilarityFunction

    # Load a pretrained Sentence Transformer model
    model = SentenceTransformer("all-MiniLM-L6-v2")

    # Embed some sentences
    sentences = [
        "The weather is lovely today.",
        "It's so sunny outside!",
        "He drove to the stadium.",
    ]
    embeddings = model.encode(sentences)

    similarities = model.similarity(embeddings, embeddings)
    print(similarities)
    # tensor([[1.0000, 0.6660, 0.1046],
    #         [0.6660, 1.0000, 0.1411],
    #         [0.1046, 0.1411, 1.0000]])

    # Change the similarity function to Manhattan distance
    model.similarity_fn_name = SimilarityFunction.MANHATTAN
    print(model.similarity_fn_name)
    # => "manhattan"

    similarities = model.similarity(embeddings, embeddings)
    print(similarities)
    # tensor([[ -0.0000, -12.6269, -20.2167],
    #         [-12.6269,  -0.0000, -20.1288],
    #         [-20.2167, -20.1288,  -0.0000]])
.. note::

   If a Sentence Transformer instance ends with a :class:`~sentence_transformers.models.Normalize` module, then it is sensible to choose the "dot" metric instead of "cosine".
   Dot product on normalized embeddings is equivalent to cosine similarity, but "cosine" will re-normalize the embeddings. As a result, the "dot" metric will be faster than "cosine".
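The equivalence behind this recommendation is easy to verify in plain Python (a standalone sketch, independent of the library):

```python
import math

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Cosine similarity re-normalizes its inputs...
    return dot(normalize(a), normalize(b))

a, b = normalize([3.0, 4.0]), normalize([1.0, 2.0])

# ...so on already-normalized embeddings the extra normalization is redundant:
assert abs(dot(a, b) - cosine(a, b)) < 1e-12
```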
If you want to find the highest scoring pairs in a long list of sentences, have a look at `Paraphrase Mining <../../examples/applications/paraphrase-mining/README.html>`_.
Usage
=====
Characteristics of Sentence Transformer (a.k.a. bi-encoder) models:

1. Calculates a **fixed-size vector representation (embedding)** given **texts or images**.
2. Embedding calculation is often **efficient**, and embedding similarity calculation is **very fast**.
3. Applicable for a **wide range of tasks**, such as semantic textual similarity, semantic search, clustering, classification, paraphrase mining, and more.
4. Often used as a **first step in a two-step retrieval process**, where a Cross-Encoder (a.k.a. reranker) model is used to re-rank the top-k results from the bi-encoder.
Once you have `installed <installation.md>`_ Sentence Transformers, you can easily use Sentence Transformer models:
.. sidebar:: Documentation

   1. :class:`SentenceTransformer <sentence_transformers.SentenceTransformer>`
   2. :meth:`SentenceTransformer.encode <sentence_transformers.SentenceTransformer.encode>`
   3. :meth:`SentenceTransformer.similarity <sentence_transformers.SentenceTransformer.similarity>`
::

    from sentence_transformers import SentenceTransformer

    # 1. Load a pretrained Sentence Transformer model
    model = SentenceTransformer("all-MiniLM-L6-v2")

    # The sentences to encode
    sentences = [
        "The weather is lovely today.",
        "It's so sunny outside!",
        "He drove to the stadium.",
    ]

    # 2. Calculate embeddings by calling model.encode()
    embeddings = model.encode(sentences)
    print(embeddings.shape)
    # [3, 384]

    # 3. Calculate the embedding similarities
    similarities = model.similarity(embeddings, embeddings)
    print(similarities)
    # tensor([[1.0000, 0.6660, 0.1046],
    #         [0.6660, 1.0000, 0.1411],
    #         [0.1046, 0.1411, 1.0000]])
.. toctree::
   :maxdepth: 1
   :caption: Tasks and Advanced Usage

   ../../../examples/applications/computing-embeddings/README
   semantic_textual_similarity
   ../../../examples/applications/semantic-search/README
   ../../../examples/applications/retrieve_rerank/README
   ../../../examples/applications/clustering/README
   ../../../examples/applications/paraphrase-mining/README
   ../../../examples/applications/parallel-sentence-mining/README
   ../../../examples/applications/image-search/README
   ../../../examples/applications/embedding-quantization/README
The [applications](applications/) folder contains examples of how to use SentenceTransformers for common tasks.
The [evaluation](evaluation/) folder contains some examples of how to evaluate SentenceTransformer models for common tasks.
## Training
The [training](training/) folder contains examples of how to fine-tune transformer models like BERT, RoBERTa, or XLM-RoBERTa for generating sentence embeddings. For documentation on how to train your own models, see [Training Overview](http://www.sbert.net/docs/sentence_transformer/training_overview.html).
## Unsupervised Learning
You can configure the threshold of cosine-similarity for which we consider two sentences as similar. Also, you can specify the minimal size for a local community. This allows you to get either large coarse-grained clusters or small fine-grained clusters.
We apply it on the [Quora Duplicate Questions](https://huggingface.co/datasets/sentence-transformers/quora-duplicates) dataset and the output looks something like this:
```
Cluster 1, #83 Elements
...
```

For each topic, you want to extract the words that describe this topic:
![20news](https://raw.githubusercontent.com/UKPLab/sentence-transformers/master/docs/img/20news_top2vec.png)
Sentence-Transformers can be used to identify these topics in a collection of sentences, paragraphs or short documents. For an excellent tutorial, see [Topic Modeling with BERT](https://towardsdatascience.com/topic-modeling-with-bert-779f7db187e6) as well as the [BERTopic](https://github.com/MaartenGr/BERTopic) and [Top2Vec](https://github.com/ddangelov/Top2Vec) repositories.
Image source: [Top2Vec: Distributed Representations of Topics](https://arxiv.org/abs/2008.09470)
"""
This is a simple application for sentence embeddings: clustering.

Sentences are mapped to sentence embeddings and then agglomerative clustering with a threshold is applied.
"""
from sklearn.cluster import AgglomerativeClustering

from sentence_transformers import SentenceTransformer
embedder = SentenceTransformer("all-MiniLM-L6-v2")
# Corpus with example sentences
corpus_embeddings = embedder.encode(corpus)
# Some models don't automatically normalize the embeddings, in which case you should normalize the embeddings:
# corpus_embeddings = corpus_embeddings / np.linalg.norm(corpus_embeddings, axis=1, keepdims=True)
# Perform agglomerative clustering
clustering_model = AgglomerativeClustering(
n_clusters=None, distance_threshold=1.5
) # , affinity='cosine', linkage='average', distance_threshold=0.4)
"""
In this example, we download a large set of questions from Quora and then find similar questions in this set.
"""
import csv
import os
import time

from sentence_transformers import SentenceTransformer, util
# Model for computing sentence embeddings. We use one trained for similar questions detection
model = SentenceTransformer("all-MiniLM-L6-v2")
"""
This is a simple application for sentence embeddings: clustering.

Sentences are mapped to sentence embeddings and then k-means clustering is applied.
"""
from sklearn.cluster import KMeans

from sentence_transformers import SentenceTransformer
embedder = SentenceTransformer("all-MiniLM-L6-v2")
# Corpus with example sentences
Computing Embeddings
====================
Once you have `installed <installation.md>`_ Sentence Transformers, you can easily use Sentence Transformer models:
.. sidebar:: Documentation

   1. :class:`SentenceTransformer <sentence_transformers.SentenceTransformer>`
   2. :meth:`SentenceTransformer.encode <sentence_transformers.SentenceTransformer.encode>`
   3. :meth:`SentenceTransformer.similarity <sentence_transformers.SentenceTransformer.similarity>`
::

    from sentence_transformers import SentenceTransformer

    # 1. Load a pretrained Sentence Transformer model
    model = SentenceTransformer("all-MiniLM-L6-v2")

    # The sentences to encode
    sentences = [
        "The weather is lovely today.",
        "It's so sunny outside!",
        "He drove to the stadium.",
    ]

    # 2. Calculate embeddings by calling model.encode()
    embeddings = model.encode(sentences)
    print(embeddings.shape)
    # [3, 384]

    # 3. Calculate the embedding similarities
    similarities = model.similarity(embeddings, embeddings)
    print(similarities)
    # tensor([[1.0000, 0.6660, 0.1046],
    #         [0.6660, 1.0000, 0.1411],
    #         [0.1046, 0.1411, 1.0000]])
.. note::

   Even though we talk about sentence embeddings, you can use Sentence Transformers for shorter phrases as well as for longer texts with multiple sentences. See `Input Sequence Length <#input-sequence-length>`_ for notes on embeddings for longer texts.
Initializing a Sentence Transformer Model
-----------------------------------------
The first step is to load a pretrained Sentence Transformer model. You can use any of the models from the `Pretrained Models <../docs/sentence_transformer/pretrained_models.html>`_ or a local model. See also :class:`~sentence_transformers.SentenceTransformer` for information on parameters.
::

    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-mpnet-base-v2")
    # Alternatively, you can pass a path to a local model directory:
    model = SentenceTransformer("output/models/mpnet-base-finetuned-all-nli")
The model will automatically be placed on the most performant available device, e.g. ``cuda`` or ``mps`` if available. You can also specify the device explicitly:
::

    model = SentenceTransformer("all-mpnet-base-v2", device="cuda")
Calculating Embeddings
----------------------
The method to calculate embeddings is :meth:`SentenceTransformer.encode <sentence_transformers.SentenceTransformer.encode>`.
Prompt Templates
----------------
Some models require using specific text *prompts* to achieve optimal performance. For example, with `intfloat/multilingual-e5-large <https://huggingface.co/intfloat/multilingual-e5-large>`_ you should prefix all queries with ``"query: "`` and all passages with ``"passage: "``. Another example is `BAAI/bge-large-en-v1.5 <https://huggingface.co/BAAI/bge-large-en-v1.5>`_, which performs best for retrieval when the input texts are prefixed with ``"Represent this sentence for searching relevant passages: "``.
Sentence Transformer models can be initialized with ``prompts`` and ``default_prompt_name`` parameters:
- ``prompts`` is an optional argument that accepts a dictionary mapping prompt names to prompt texts. The prompt will be prepended to the input text during inference. For example::

      model = SentenceTransformer(
          "intfloat/multilingual-e5-large",
          prompts={
              "classification": "Classify the following text: ",
              "retrieval": "Retrieve semantically similar text: ",
              "clustering": "Identify the topic or theme based on the text: ",
          },
      )
      # or
      model.prompts = {
          "classification": "Classify the following text: ",
          "retrieval": "Retrieve semantically similar text: ",
          "clustering": "Identify the topic or theme based on the text: ",
      }
- ``default_prompt_name`` is an optional argument that determines the default prompt to be used. It has to correspond to a prompt name from ``prompts``. If ``None``, then no prompt is used by default. For example::

      model = SentenceTransformer(
          "intfloat/multilingual-e5-large",
          prompts={
              "classification": "Classify the following text: ",
              "retrieval": "Retrieve semantically similar text: ",
              "clustering": "Identify the topic or theme based on the text: ",
          },
          default_prompt_name="retrieval",
      )
      # or
      model.default_prompt_name = "retrieval"
Both of these parameters can also be specified in the ``config_sentence_transformers.json`` file of a saved model. That way, you won't have to specify these options manually when loading. When you save a Sentence Transformer model, these options will be automatically saved as well.
During inference, prompts can be applied in a few different ways. All of these scenarios result in identical texts being embedded:
1. Explicitly using the ``prompt`` option in ``SentenceTransformer.encode``::

       embeddings = model.encode("How to bake a strawberry cake", prompt="Retrieve semantically similar text: ")

2. Explicitly using the ``prompt_name`` option in ``SentenceTransformer.encode`` by relying on the prompts loaded from a) initialization or b) the model config::

       embeddings = model.encode("How to bake a strawberry cake", prompt_name="retrieval")

3. If neither ``prompt`` nor ``prompt_name`` is specified in ``SentenceTransformer.encode``, then the prompt specified by ``default_prompt_name`` will be applied. If it is ``None``, then no prompt will be applied::

       embeddings = model.encode("How to bake a strawberry cake")
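All three scenarios reduce to prepending a prompt string before tokenization. A simplified stand-alone sketch of the resolution order (not the library's actual code):

```python
from typing import Optional

def resolve_prompt(
    prompts: dict,
    default_prompt_name: Optional[str],
    prompt: Optional[str] = None,
    prompt_name: Optional[str] = None,
) -> str:
    if prompt is not None:               # 1. an explicit prompt wins
        return prompt
    if prompt_name is not None:          # 2. then an explicit prompt name
        return prompts[prompt_name]
    if default_prompt_name is not None:  # 3. then the default prompt name
        return prompts[default_prompt_name]
    return ""                            # otherwise, no prompt

prompts = {"retrieval": "Retrieve semantically similar text: "}
text = "How to bake a strawberry cake"

# All three routes embed the identical text:
a = resolve_prompt(prompts, "retrieval", prompt="Retrieve semantically similar text: ") + text
b = resolve_prompt(prompts, "retrieval", prompt_name="retrieval") + text
c = resolve_prompt(prompts, "retrieval") + text
assert a == b == c
```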
Input Sequence Length
---------------------
For transformer models like BERT, RoBERTa, DistilBERT etc., the runtime and memory requirements grow quadratically with the input length. This limits transformers to inputs of certain lengths. A common value for BERT-based models is 512 tokens, which corresponds to about 300-400 words (for English).
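Quadratic growth means that doubling the input length roughly quadruples the attention cost; a back-of-the-envelope illustration (ignoring constant factors and the non-attention parts of the model):

```python
def attention_cost(n_tokens):
    # Self-attention compares every token with every other token
    return n_tokens * n_tokens

# Doubling the sequence length quadruples the cost
print(attention_cost(512) // attention_cost(256))  # 4
```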
Each model has a maximum sequence length under ``model.max_seq_length``, which is the maximal number of tokens that can be processed. Longer texts will be truncated to the first ``model.max_seq_length`` tokens::

    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    print("Max Sequence Length:", model.max_seq_length)
    # => Max Sequence Length: 256

    # Change the length to 200
    model.max_seq_length = 200

    print("Max Sequence Length:", model.max_seq_length)
    # => Max Sequence Length: 200
.. note::

   You cannot increase the length higher than what is maximally supported by the respective transformer model. Also note that if a model was trained on short texts, the representations for long texts might not be that good.
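Truncation simply keeps the first ``max_seq_length`` tokens. Conceptually (a whitespace-token sketch; real models use subword tokenizers, and special tokens also count toward the limit):

```python
def truncate(text, max_seq_length):
    tokens = text.split()  # stand-in for a real subword tokenizer
    return tokens[:max_seq_length]

print(truncate("the quick brown fox jumps over the lazy dog", 4))
# ['the', 'quick', 'brown', 'fox']
```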
Multi-Process / Multi-GPU Encoding
----------------------------------
You can encode input texts with more than one GPU (or with multiple processes on a CPU machine). For an example, see: `computing_embeddings_multi_gpu.py <https://github.com/UKPLab/sentence-transformers/blob/master/examples/applications/computing-embeddings/computing_embeddings_multi_gpu.py>`_.
The relevant method is :meth:`~sentence_transformers.SentenceTransformer.start_multi_process_pool`, which starts multiple processes that are used for encoding.
"""
This basic example loads a pre-trained model from the web and uses it to
generate sentence embeddings for a given list of sentences.
"""
import logging

import numpy as np

from sentence_transformers import LoggingHandler, SentenceTransformer
#### Just some code to print debug information to stdout
np.set_printoptions(threshold=100)
"""
This example encodes sentences in parallel, which gives a near linear speed-up
when encoding large text collections.
"""
import logging

from sentence_transformers import LoggingHandler, SentenceTransformer
logging.basicConfig(
format="%(asctime)s - %(message)s", datefmt="%Y-%m-%d %H:%M:%S", level=logging.INFO, handlers=[LoggingHandler()]
)
"""
This example streams the dataset to limit the amount of memory used. More info about dataset streaming:
https://huggingface.co/docs/datasets/stream
"""
import logging

from datasets import load_dataset
from torch.utils.data import DataLoader
from tqdm import tqdm

from sentence_transformers import LoggingHandler, SentenceTransformer
logging.basicConfig(
format="%(asctime)s - %(message)s", datefmt="%Y-%m-%d %H:%M:%S", level=logging.INFO, handlers=[LoggingHandler()]
)
"""
https://www.quora.com/q/quoradata/First-Quora-Dataset-Release-Question-Pairs

Then, we re-rank the hits from the Bi-Encoder using a Cross-Encoder.
"""
import csv
import os
import pickle
import time

from sentence_transformers import CrossEncoder, SentenceTransformer, util
# We use a BiEncoder (SentenceTransformer) that produces embeddings for questions.
# We then search for similar questions using cosine similarity and identify the top 100 most similar questions
model_name = "all-MiniLM-L6-v2"
"""
This example finds similar sentences in a corpus using a Cross-Encoder for semantic textual similarity (STS).
It then outputs the most similar sentences for the given query.
"""
import numpy as np

from sentence_transformers.cross_encoder import CrossEncoder
# Pre-trained cross encoder
model = CrossEncoder("cross-encoder/stsb-distilroberta-base")
Quantizing an embedding with a dimensionality of 1024 to binary would result in 1024 / 8 = 128 bytes.
As a result, in practice quantizing a `float32` embedding with a dimensionality of 1024 yields an `int8` or `uint8` embedding with a dimensionality of 128. See two approaches of how you can produce quantized embeddings using Sentence Transformers below:
```eval_rst
.. sidebar:: References

   #. `mixedbread-ai/mxbai-embed-large-v1 <https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1>`_
   #. :class:`~sentence_transformers.SentenceTransformer`
   #. :meth:`SentenceTransformer.encode <sentence_transformers.SentenceTransformer.encode>`
   #. :func:`~sentence_transformers.quantization.quantize_embeddings`
```
```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings

# Load the embedding model (the model name is assumed from the references above)
model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")

# Encode some text, then quantize the embeddings to binary
embeddings = model.encode(["I am driving to the lake.", "It is a beautiful day."])
binary_embeddings = quantize_embeddings(embeddings, precision="binary")
```
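Under the hood, binary quantization amounts to keeping the sign of each dimension and packing 8 signs into one byte; a simplified stand-alone illustration (not the library's implementation, which is vectorized):

```python
def binary_quantize(embedding):
    # 1. Threshold each dimension at zero: positive -> 1, otherwise -> 0
    bits = [1 if value > 0 else 0 for value in embedding]
    # 2. Pack 8 bits per byte, so 1024 dimensions become 128 bytes
    packed = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        packed.append(byte)
    return bytes(packed)

embedding = [0.5, -0.1] * 512  # a fake 1024-dimensional embedding
print(len(binary_quantize(embedding)))  # 128
```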
See how you can produce scalar quantized embeddings using Sentence Transformers below:
```eval_rst
.. sidebar:: References

   #. `mixedbread-ai/mxbai-embed-large-v1 <https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1>`_
   #. :class:`~sentence_transformers.SentenceTransformer`
   #. :meth:`SentenceTransformer.encode <sentence_transformers.SentenceTransformer.encode>`
   #. :func:`~sentence_transformers.quantization.quantize_embeddings`
```
```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings

# Load the embedding model (the model name is assumed from the references above)
model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")

# Encode some text, then quantize the embeddings to int8
embeddings = model.encode(["I am driving to the lake.", "It is a beautiful day."])
int8_embeddings = quantize_embeddings(
    embeddings,
    precision="int8",
)
```
## Try it yourself
import time

from datasets import load_dataset

from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings, semantic_search_faiss
# 1. Load the quora corpus with questions
dataset = load_dataset("quora", split="train").map(
from datasets import load_dataset

from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings, semantic_search_faiss
# 1. Load the quora corpus with questions
dataset = load_dataset("quora", split="train").map(
import json
import os
import time

import faiss
import numpy as np
from datasets import load_dataset
from usearch.index import Index

from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings
# We use usearch as it can efficiently load int8 vectors from disk.
# Load the model