Unverified commit 4b2b50aa authored by Tomy Hsieh, committed by GitHub

Rename NLP library to Datasets library (#10920)

* Rename NLP library to Datasets library

* Update github template

* Fix styling
parent 86c6f8a8
@@ -54,7 +54,7 @@ Model hub:
 HF projects:
-- nlp datasets: [different repo](https://github.com/huggingface/nlp)
+- datasets: [different repo](https://github.com/huggingface/datasets)
 - rust tokenizers: [different repo](https://github.com/huggingface/tokenizers)
 Examples:
@@ -62,7 +62,7 @@ Documentation: @sgugger
 HF projects:
-- nlp datasets: [different repo](https://github.com/huggingface/nlp)
+- datasets: [different repo](https://github.com/huggingface/datasets)
 - rust tokenizers: [different repo](https://github.com/huggingface/tokenizers)
 Examples:
@@ -15,10 +15,10 @@ Fine-tuning with custom datasets
 .. note::
-    The datasets used in this tutorial are available and can be more easily accessed using the `🤗 NLP library
-    <https://github.com/huggingface/nlp>`_. We do not use this library to access the datasets here since this tutorial
-    meant to illustrate how to work with your own data. A brief of introduction can be found at the end of the tutorial
-    in the section ":ref:`nlplib`".
+    The datasets used in this tutorial are available and can be more easily accessed using the `🤗 Datasets library
+    <https://github.com/huggingface/datasets>`_. We do not use this library to access the datasets here since this
+    tutorial is meant to illustrate how to work with your own data. A brief introduction can be found at the end of
+    the tutorial in the section ":ref:`datasetslib`".
 
 This tutorial will take you through several examples of using 🤗 Transformers models with your own datasets. The guide
 shows one of many valid workflows for using these models and is meant to be illustrative rather than definitive. We
@@ -41,7 +41,7 @@ Sequence Classification with IMDb Reviews
 .. note::
     This dataset can be explored in the Hugging Face model hub (`IMDb <https://huggingface.co/datasets/imdb>`_), and
-    can be alternatively downloaded with the 🤗 NLP library with ``load_dataset("imdb")``.
+    can be alternatively downloaded with the 🤗 Datasets library with ``load_dataset("imdb")``.
 
 In this example, we'll show how to download, tokenize, and train a model on the IMDb reviews dataset. This task takes
 the text of a review and requires the model to predict whether the sentiment of the review is positive or negative.
@@ -260,7 +260,7 @@ Token Classification with W-NUT Emerging Entities
 .. note::
     This dataset can be explored in the Hugging Face model hub (`WNUT-17 <https://huggingface.co/datasets/wnut_17>`_),
-    and can be alternatively downloaded with the 🤗 NLP library with ``load_dataset("wnut_17")``.
+    and can be alternatively downloaded with the 🤗 Datasets library with ``load_dataset("wnut_17")``.
 
 Next we will look at token classification. Rather than classifying an entire sequence, this task classifies token by
 token. We'll demonstrate how to do this with `Named Entity Recognition
@@ -459,7 +459,7 @@ Question Answering with SQuAD 2.0
 .. note::
     This dataset can be explored in the Hugging Face model hub (`SQuAD V2
-    <https://huggingface.co/datasets/squad_v2>`_), and can be alternatively downloaded with the 🤗 NLP library with
+    <https://huggingface.co/datasets/squad_v2>`_), and can be alternatively downloaded with the 🤗 Datasets library with
     ``load_dataset("squad_v2")``.
 
 Question answering comes in many forms. In this example, we'll look at the particular type of extractive QA that
@@ -677,22 +677,23 @@ Additional Resources
 - :doc:`Preprocessing <preprocessing>`. Docs page on data preprocessing.
 - :doc:`Training <training>`. Docs page on training and fine-tuning.
 
-.. _nlplib:
+.. _datasetslib:
 
-Using the 🤗 NLP Datasets & Metrics library
+Using the 🤗 Datasets & Metrics library
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 This tutorial demonstrates how to read in datasets from various raw text formats and prepare them for training with 🤗
 Transformers so that you can do the same thing with your own custom datasets. However, we recommend users use the `🤗
-NLP library <https://github.com/huggingface/nlp>`_ for working with the 150+ datasets included in the `hub
+Datasets library <https://github.com/huggingface/datasets>`_ for working with the 150+ datasets included in the `hub
 <https://huggingface.co/datasets>`_, including the three datasets used in this tutorial. As a very brief overview, we
-will show how to use the NLP library to download and prepare the IMDb dataset from the first example, :ref:`seq_imdb`.
+will show how to use the Datasets library to download and prepare the IMDb dataset from the first example,
+:ref:`seq_imdb`.
 
 Start by downloading the dataset:
 
 .. code-block:: python
 
-    from nlp import load_dataset
+    from datasets import load_dataset
 
     train = load_dataset("imdb", split="train")
 
 Each dataset has multiple columns corresponding to different features. Let's see what our columns are.
@@ -724,5 +725,5 @@ dataset elements.
     >>> {key: val.shape for key, val in train[0].items()})
     {'labels': TensorShape([]), 'input_ids': TensorShape([512]), 'attention_mask': TensorShape([512])}
 
-We now have a fully-prepared dataset. Check out `the 🤗 NLP docs <https://huggingface.co/nlp/processing.html>`_ for a
-more thorough introduction.
+We now have a fully-prepared dataset. Check out `the 🤗 Datasets docs
+<https://huggingface.co/docs/datasets/processing.html>`_ for a more thorough introduction.