"...git@developer.sourcefind.cn:chenpangpang/transformers.git" did not exist on "fb1c62e9731b0acd2ff4edc09af519ebc62dfd39"
Unverified commit ec47baeb authored by Lysandre Debut, committed by GitHub

2022 is the year of multi-modality (#14610)



* 2022 is the year of multi-modality

* Small fix

* Apply suggestions from code review
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>

* Apply suggestions from code review

* Apply to documentation index

* Apply suggestions from code review
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Update README.md
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

* Apply suggestions from code review

* Apply suggestions from code review
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
parent e62091d5
@@ -48,14 +48,22 @@ limitations under the License.
 </h4>
 <h3 align="center">
-<p>State-of-the-art Natural Language Processing for Jax, PyTorch and TensorFlow</p>
+<p>State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow</p>
 </h3>
 <h3 align="center">
 <a href="https://hf.co/course"><img src="https://raw.githubusercontent.com/huggingface/transformers/master/docs/source/imgs/course_banner.png"></a>
 </h3>
-🤗 Transformers provides thousands of pretrained models to perform tasks on texts such as classification, information extraction, question answering, summarization, translation, text generation and more in over 100 languages. Its aim is to make cutting-edge NLP easier to use for everyone.
+🤗 Transformers provides thousands of pretrained models to perform tasks on different modalities such as text, vision, and audio.
+
+These models can be applied on:
+
+* 📝 Text, for tasks like text classification, information extraction, question answering, summarization, translation, text generation, in over 100 languages.
+* 🖼️ Images, for tasks like image classification, object detection, and segmentation.
+* 🗣️ Audio, for tasks like speech recognition and audio classification.
+
+Transformer models can also perform tasks on **several modalities combined**, such as table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering.
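To make the multi-modal claim added above concrete, here is a minimal hedged sketch of how the same `pipeline` API introduced later in this README covers each listed modality. It assumes a release where the `image-classification` and `automatic-speech-recognition` pipeline tasks are available, and the input file paths are hypothetical placeholders, not files from this repository.

```python
from transformers import pipeline

# Text: positive vs. negative sentiment (downloads a default checkpoint).
classifier = pipeline("sentiment-analysis")
print(classifier("Multi-modality support is a welcome addition."))

# Vision: label the main object in an image (path is a placeholder).
image_classifier = pipeline("image-classification")
print(image_classifier("cat_photo.jpg"))

# Audio: transcribe speech to text (path is a placeholder).
speech_recognizer = pipeline("automatic-speech-recognition")
print(speech_recognizer("sample_utterance.wav"))
```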
 🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets and then share them with the community on our [model hub](https://huggingface.co/models). At the same time, each python module defining an architecture is fully standalone and can be modified to enable quick research experiments.
@@ -66,6 +74,8 @@ limitations under the License.
 You can test most of our models directly on their pages from the [model hub](https://huggingface.co/models). We also offer [private model hosting, versioning, & an inference API](https://huggingface.co/pricing) for public and private models.
 Here are a few examples:
+In Natural Language Processing:
 - [Masked word completion with BERT](https://huggingface.co/bert-base-uncased?text=Paris+is+the+%5BMASK%5D+of+France)
 - [Named Entity Recognition with Electra](https://huggingface.co/dbmdz/electra-large-discriminator-finetuned-conll03-english?text=My+name+is+Sarah+and+I+live+in+London+city)
 - [Text generation with GPT-2](https://huggingface.co/gpt2?text=A+long+time+ago%2C+)
@@ -74,6 +84,15 @@ Here are a few examples:
 - [Question answering with DistilBERT](https://huggingface.co/distilbert-base-uncased-distilled-squad?text=Which+name+is+also+used+to+describe+the+Amazon+rainforest+in+English%3F&context=The+Amazon+rainforest+%28Portuguese%3A+Floresta+Amaz%C3%B4nica+or+Amaz%C3%B4nia%3B+Spanish%3A+Selva+Amaz%C3%B3nica%2C+Amazon%C3%ADa+or+usually+Amazonia%3B+French%3A+For%C3%AAt+amazonienne%3B+Dutch%3A+Amazoneregenwoud%29%2C+also+known+in+English+as+Amazonia+or+the+Amazon+Jungle%2C+is+a+moist+broadleaf+forest+that+covers+most+of+the+Amazon+basin+of+South+America.+This+basin+encompasses+7%2C000%2C000+square+kilometres+%282%2C700%2C000+sq+mi%29%2C+of+which+5%2C500%2C000+square+kilometres+%282%2C100%2C000+sq+mi%29+are+covered+by+the+rainforest.+This+region+includes+territory+belonging+to+nine+nations.+The+majority+of+the+forest+is+contained+within+Brazil%2C+with+60%25+of+the+rainforest%2C+followed+by+Peru+with+13%25%2C+Colombia+with+10%25%2C+and+with+minor+amounts+in+Venezuela%2C+Ecuador%2C+Bolivia%2C+Guyana%2C+Suriname+and+French+Guiana.+States+or+departments+in+four+nations+contain+%22Amazonas%22+in+their+names.+The+Amazon+represents+over+half+of+the+planet%27s+remaining+rainforests%2C+and+comprises+the+largest+and+most+biodiverse+tract+of+tropical+rainforest+in+the+world%2C+with+an+estimated+390+billion+individual+trees+divided+into+16%2C000+species)
 - [Translation with T5](https://huggingface.co/t5-base?text=My+name+is+Wolfgang+and+I+live+in+Berlin)
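The hosted widgets linked above can also be reproduced locally. A minimal hedged sketch with the same DistilBERT SQuAD checkpoint as the question-answering link, using a shortened context for brevity; the printed output is illustrative:

```python
from transformers import pipeline

# Same checkpoint as the hosted widget linked above.
question_answerer = pipeline(
    "question-answering", model="distilbert-base-uncased-distilled-squad"
)
answer = question_answerer(
    question="Which name is also used to describe the Amazon rainforest in English?",
    context=(
        "The Amazon rainforest, also known in English as Amazonia or the "
        "Amazon Jungle, is a moist broadleaf forest that covers most of the "
        "Amazon basin of South America."
    ),
)
print(answer)  # illustrative: {'answer': 'Amazonia', 'score': ..., 'start': ..., 'end': ...}
```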
+In Computer Vision:
+- [Image classification with ViT](https://huggingface.co/google/vit-base-patch16-224)
+- [Object Detection with DETR](https://huggingface.co/facebook/detr-resnet-50)
+- [Image Segmentation with DETR](https://huggingface.co/facebook/detr-resnet-50-panoptic)
+
+In Audio:
+- [Automatic Speech Recognition with Wav2Vec2](https://huggingface.co/facebook/wav2vec2-base-960h)
+- [Keyword Spotting with Wav2Vec2](https://huggingface.co/superb/wav2vec2-base-superb-ks)
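Likewise, the vision checkpoints listed above work outside the hosted widgets. A minimal hedged sketch classifying an image with the ViT checkpoint from the list; the COCO image URL is just a convenient public test picture, not part of the diff:

```python
import requests
from PIL import Image
from transformers import ViTFeatureExtractor, ViTForImageClassification

# A public test image (two cats on a couch, from the COCO dataset).
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

feature_extractor = ViTFeatureExtractor.from_pretrained("google/vit-base-patch16-224")
model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")

# Preprocess to pixel tensors, run the model, and map the top logit to a label.
inputs = feature_extractor(images=image, return_tensors="pt")
logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(-1).item()])
```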
 **[Write With Transformer](https://transformer.huggingface.co)**, built by the Hugging Face team, is the official demo of this repo’s text generation capabilities.

 ## If you are looking for custom support from the Hugging Face team

@@ -84,7 +103,7 @@ Here are a few examples:

 ## Quick tour
-To immediately use a model on a given text, we provide the `pipeline` API. Pipelines group together a pretrained model with the preprocessing that was used during that model's training. Here is how to quickly use a pipeline to classify positive versus negative texts:
+To immediately use a model on a given input (text, image, audio, ...), we provide the `pipeline` API. Pipelines group together a pretrained model with the preprocessing that was used during that model's training. Here is how to quickly use a pipeline to classify positive versus negative texts:
 ```python
 >>> from transformers import pipeline
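 # The diff hunk collapses just below, cutting this example short; what follows
 # is a hedged sketch of how the README's sentiment-analysis example continues.
 # The printed score is illustrative, not an exact value.
 >>> classifier = pipeline("sentiment-analysis")
 >>> classifier("We are very happy to introduce pipeline to the transformers repository.")
 [{'label': 'POSITIVE', 'score': 0.9997}]
 ```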
@@ -142,7 +161,7 @@ The model itself is a regular [Pytorch `nn.Module`](https://pytorch.org/docs/sta

 ## Why should I use transformers?

 1. Easy-to-use state-of-the-art models:
-    - High performance on NLU and NLG tasks.
+    - High performance on natural language understanding & generation, computer vision, and audio tasks.
     - Low barrier to entry for educators and practitioners.
     - Few user-facing abstractions with just three classes to learn.
     - A unified API for using all our pretrained models.
@@ -150,11 +169,11 @@ The model itself is a regular [Pytorch `nn.Module`](https://pytorch.org/docs/sta

 1. Lower compute costs, smaller carbon footprint:
     - Researchers can share trained models instead of always retraining.
     - Practitioners can reduce compute time and production costs.
-    - Dozens of architectures with over 2,000 pretrained models, some in more than 100 languages.
+    - Dozens of architectures with over 20,000 pretrained models, some in more than 100 languages.
 1. Choose the right framework for every part of a model's lifetime:
     - Train state-of-the-art models in 3 lines of code.
-    - Move a single model between TF2.0/PyTorch frameworks at will.
+    - Move a single model between TF2.0/PyTorch/JAX frameworks at will.
     - Seamlessly pick the right framework for training, evaluation and production.

 1. Easily customize a model or an example to your needs:
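As a concrete reading of the "train state-of-the-art models in 3 lines of code" bullet above, here is a minimal hedged sketch using the library's `Trainer`. The checkpoint name, the toy two-sentence dataset, and the output directory are illustrative assumptions, not content from the diff:

```python
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")

# A tiny illustrative dataset: two labelled sentences, tokenized to id lists.
texts = ["I love this!", "This is terrible."]
encodings = tokenizer(texts, truncation=True, padding=True)

class ToyDataset(torch.utils.data.Dataset):
    """Wraps tokenizer output and labels as tensors for the Trainer."""
    def __init__(self, encodings, labels):
        self.encodings, self.labels = encodings, labels
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item
    def __len__(self):
        return len(self.labels)

# The "3 lines" are essentially: load a model, configure, call train().
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="toy-output", num_train_epochs=1),
    train_dataset=ToyDataset(encodings, [1, 0]),
)
trainer.train()
```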
...
@@ -12,12 +12,21 @@ specific language governing permissions and limitations under the License.

 # 🤗 Transformers

-State-of-the-art Natural Language Processing for Jax, Pytorch and TensorFlow
+State-of-the-art Machine Learning for Jax, Pytorch and TensorFlow
-🤗 Transformers (formerly known as _pytorch-transformers_ and _pytorch-pretrained-bert_) provides general-purpose
-architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet...) for Natural Language Understanding (NLU) and Natural
-Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between Jax,
-PyTorch and TensorFlow.
+🤗 Transformers (formerly known as _pytorch-transformers_ and _pytorch-pretrained-bert_) provides thousands of pretrained models to perform tasks on different modalities such as text, vision, and audio.
+
+These models can be applied on:
+
+* 📝 Text, for tasks like text classification, information extraction, question answering, summarization, translation, text generation, in over 100 languages.
+* 🖼️ Images, for tasks like image classification, object detection, and segmentation.
+* 🗣️ Audio, for tasks like speech recognition and audio classification.
+
+Transformer models can also perform tasks on **several modalities combined**, such as table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering.
+
+🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets and then share them with the community on our [model hub](https://huggingface.co/models). At the same time, each python module defining an architecture is fully standalone and can be modified to enable quick research experiments.
+
+🤗 Transformers is backed by the three most popular deep learning libraries — [Jax](https://jax.readthedocs.io/en/latest/), [PyTorch](https://pytorch.org/) and [TensorFlow](https://www.tensorflow.org/) — with a seamless integration between them. It's straightforward to train your models with one before loading them for inference with the other.
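The last added paragraph claims you can train with one framework and load for inference with another. A minimal hedged sketch of that hand-off between PyTorch and TensorFlow using the `from_pt` flag; the checkpoint and directory names are illustrative:

```python
from transformers import (AutoModelForSequenceClassification,
                          TFAutoModelForSequenceClassification)

# Fine-tune (or simply load) in PyTorch, then save to a local directory.
pt_model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
pt_model.save_pretrained("./my-finetuned-model")

# Reload the same weights in TensorFlow for inference.
tf_model = TFAutoModelForSequenceClassification.from_pretrained(
    "./my-finetuned-model", from_pt=True
)
```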
 This is the documentation of our repository [transformers](https://github.com/huggingface/transformers). You can
 also follow our [online course](https://huggingface.co/course) that teaches how to use this library, as well as the

@@ -31,29 +40,26 @@ other libraries developed by Hugging Face and the Hub.

 ## Features
-- High performance on NLU and NLG tasks
-- Low barrier to entry for educators and practitioners
-
-State-of-the-art NLP for everyone:
-
-- Deep learning researchers
-- Hands-on practitioners
-- AI/ML/NLP teachers and educators
-
-Lower compute costs, smaller carbon footprint:
-
-- Researchers can share trained models instead of always retraining
-- Practitioners can reduce compute time and production costs
-- 8 architectures with over 30 pretrained models, some in more than 100 languages
-
-Choose the right framework for every part of a model's lifetime:
-
-- Train state-of-the-art models in 3 lines of code
-- Deep interoperability between Jax, Pytorch and TensorFlow models
-- Move a single model between Jax/PyTorch/TensorFlow frameworks at will
-- Seamlessly pick the right framework for training, evaluation, production
-
-The support for Jax is still experimental (with a few models right now), expect to see it grow in the coming months!
+1. Easy-to-use state-of-the-art models:
+    - High performance on natural language understanding & generation, computer vision, and audio tasks.
+    - Low barrier to entry for educators and practitioners.
+    - Few user-facing abstractions with just three classes to learn.
+    - A unified API for using all our pretrained models.
+
+1. Lower compute costs, smaller carbon footprint:
+    - Researchers can share trained models instead of always retraining.
+    - Practitioners can reduce compute time and production costs.
+    - Dozens of architectures with over 20,000 pretrained models, some in more than 100 languages.
+
+1. Choose the right framework for every part of a model's lifetime:
+    - Train state-of-the-art models in 3 lines of code.
+    - Move a single model between TF2.0/PyTorch/JAX frameworks at will.
+    - Seamlessly pick the right framework for training, evaluation and production.
+
+1. Easily customize a model or an example to your needs:
+    - We provide examples for each architecture to reproduce the results published by its original authors.
+    - Model internals are exposed as consistently as possible.
+    - Model files can be used independently of the library for quick experiments.
 [All the model checkpoints](https://huggingface.co/models) are seamlessly integrated from the huggingface.co [model
 hub](https://huggingface.co) where they are uploaded directly by [users](https://huggingface.co/users) and

...