Unverified commit ffbcfc01, authored by V.Prasanna kumar, committed by GitHub

Broken links fixed related to datasets docs (#27569)

Fixed the broken links to the 🤗 Datasets library docs in transformers
parent 638d4998
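
Every hunk below makes the same kind of change: a `.html` suffix is dropped from a `https://huggingface.co/docs/datasets/...` URL, while any query string or `#fragment` is kept. The following is a minimal sketch of that rewrite, not the method the commit author used; the helper name `strip_html_suffix` and the regular expression are assumptions made for illustration.

```python
import re

# Hypothetical pattern (an assumption, not taken from the commit): a 🤗 Datasets
# docs URL whose page name ends in ".html", followed by a query string, a fragment,
# a closing parenthesis, whitespace, a sentence-ending period, or end of text.
_DATASETS_DOCS_HTML = re.compile(
    r"(https://huggingface\.co/docs/datasets/[\w/\-]+)\.html(?=[?#)\s.]|$)"
)


def strip_html_suffix(text: str) -> str:
    """Drop the '.html' suffix from 🤗 Datasets docs URLs found in `text`."""
    return _DATASETS_DOCS_HTML.sub(r"\1", text)


if __name__ == "__main__":
    old = "# https://huggingface.co/docs/datasets/loading_datasets.html."
    print(strip_html_suffix(old))
```

Anchoring the pattern on the `docs/datasets` prefix leaves unrelated `.html` links untouched, which mirrors the narrow scope of this commit.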
...@@ -227,7 +227,7 @@ array([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0], ...@@ -227,7 +227,7 @@ array([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
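For audio tasks, you need a [feature extractor](main_classes/feature_extractor) to prepare your dataset for the model. The `feature extractor` is designed to extract features from raw audio data and convert them into tensors. For audio tasks, you need a [feature extractor](main_classes/feature_extractor) to prepare your dataset for the model. The `feature extractor` is designed to extract features from raw audio data and convert them into tensors.
Load the [MInDS-14](https://huggingface.co/datasets/PolyAI/minds14) dataset (see the 🤗 [Datasets tutorial](https://huggingface.co/docs/datasets/load_hub.html) for more details on how to load a dataset) to see how you can use a `feature extractor` with audio datasets: Load the [MInDS-14](https://huggingface.co/datasets/PolyAI/minds14) dataset (see the 🤗 [Datasets tutorial](https://huggingface.co/docs/datasets/load_hub) for more details on how to load a dataset) to see how you can use a `feature extractor` with audio datasets: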
```py ```py
...@@ -352,7 +352,7 @@ array([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0], ...@@ -352,7 +352,7 @@ array([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
</Tip> </Tip>
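Load the [food101](https://huggingface.co/datasets/food101) dataset (see the 🤗 [Datasets tutorial](https://huggingface.co/docs/datasets/load_hub.html) for more details on how to load a dataset) to see how you can use an image processor with computer vision datasets: Load the [food101](https://huggingface.co/datasets/food101) dataset (see the 🤗 [Datasets tutorial](https://huggingface.co/docs/datasets/load_hub) for more details on how to load a dataset) to see how you can use an image processor with computer vision datasets: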
<Tip> <Tip>
...@@ -367,7 +367,7 @@ array([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0], ...@@ -367,7 +367,7 @@ array([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
>>> dataset = load_dataset("food101", split="train[:100]") >>> dataset = load_dataset("food101", split="train[:100]")
``` ```
Next, take a look at the image with the 🤗 Datasets [`Image`](https://huggingface.co/docs/datasets/package_reference/main_classes.html?highlight=image#datasets.Image) feature: Next, take a look at the image with the 🤗 Datasets [`Image`](https://huggingface.co/docs/datasets/package_reference/main_classes?highlight=image#datasets.Image) feature:
```py ```py
...@@ -421,7 +421,7 @@ array([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0], ...@@ -421,7 +421,7 @@ array([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
</Tip> </Tip>
3. Then use 🤗 Datasets [`set_transform`](https://huggingface.co/docs/datasets/process.html#format-transform) to apply the transforms on the fly: 3. Then use 🤗 Datasets [`set_transform`](https://huggingface.co/docs/datasets/process#format-transform) to apply the transforms on the fly:
```py ```py
...@@ -476,7 +476,7 @@ array([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0], ...@@ -476,7 +476,7 @@ array([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
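For tasks involving multimodal inputs, you need a [processor](main_classes/processors) to prepare your dataset for the model. A `processor` couples together two processing objects, such as a `tokenizer` and a `feature extractor`. For tasks involving multimodal inputs, you need a [processor](main_classes/processors) to prepare your dataset for the model. A `processor` couples together two processing objects, such as a `tokenizer` and a `feature extractor`.
Load the [LJ Speech](https://huggingface.co/datasets/lj_speech) dataset (see the 🤗 [Datasets tutorial](https://huggingface.co/docs/datasets/load_hub.html) for more details on how to load a dataset) to see how you can use a `processor` for automatic speech recognition (ASR): Load the [LJ Speech](https://huggingface.co/datasets/lj_speech) dataset (see the 🤗 [Datasets tutorial](https://huggingface.co/docs/datasets/load_hub) for more details on how to load a dataset) to see how you can use a `processor` for automatic speech recognition (ASR):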
```py ```py
......
...@@ -43,7 +43,7 @@ rendered properly in your Markdown viewer. ...@@ -43,7 +43,7 @@ rendered properly in your Markdown viewer.
'text': 'My expectations for McDonalds are t rarely high. But for one to still fail so spectacularly...that takes something special!\\nThe cashier took my friends\'s order, then promptly ignored me. I had to force myself in front of a cashier who opened his register to wait on the person BEHIND me. I waited over five minutes for a gigantic order that included precisely one kid\'s meal. After watching two people who ordered after me be handed their food, I asked where mine was. The manager started yelling at the cashiers for \\"serving off their orders\\" when they didn\'t have their food. But neither cashier was anywhere near those controls, and the manager was the one serving food to customers and clearing the boards.\\nThe manager was rude when giving me my order. She didn\'t make sure that I had everything ON MY RECEIPT, and never even had the decency to apologize that I felt I was getting poor service.\\nI\'ve eaten at various McDonalds restaurants for over 30 years. I\'ve worked at more than one location. I expect bad days, bad moods, and the occasional mistake. But I have yet to have a decent experience at this store. It will remain a place I avoid unless someone in my party needs to avoid illness from low blood sugar. Perhaps I should go back to the racially biased service of Steak n Shake instead!'} 'text': 'My expectations for McDonalds are t rarely high. But for one to still fail so spectacularly...that takes something special!\\nThe cashier took my friends\'s order, then promptly ignored me. I had to force myself in front of a cashier who opened his register to wait on the person BEHIND me. I waited over five minutes for a gigantic order that included precisely one kid\'s meal. After watching two people who ordered after me be handed their food, I asked where mine was. The manager started yelling at the cashiers for \\"serving off their orders\\" when they didn\'t have their food. 
But neither cashier was anywhere near those controls, and the manager was the one serving food to customers and clearing the boards.\\nThe manager was rude when giving me my order. She didn\'t make sure that I had everything ON MY RECEIPT, and never even had the decency to apologize that I felt I was getting poor service.\\nI\'ve eaten at various McDonalds restaurants for over 30 years. I\'ve worked at more than one location. I expect bad days, bad moods, and the occasional mistake. But I have yet to have a decent experience at this store. It will remain a place I avoid unless someone in my party needs to avoid illness from low blood sugar. Perhaps I should go back to the racially biased service of Steak n Shake instead!'}
``` ```
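As you now know, you need a `tokenizer` to process the text, including padding and truncation to handle variable sequence lengths. To process your dataset in one step, use the 🤗 Datasets [`map`](https://huggingface.co/docs/datasets/process.html#map) method to apply a preprocessing function over the entire dataset: As you now know, you need a `tokenizer` to process the text, including padding and truncation to handle variable sequence lengths. To process your dataset in one step, use the 🤗 Datasets [`map`](https://huggingface.co/docs/datasets/process#map) method to apply a preprocessing function over the entire dataset: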
```py ```py
>>> from transformers import AutoTokenizer >>> from transformers import AutoTokenizer
......
...@@ -10,7 +10,7 @@ way which enables simple and efficient model parallelism. ...@@ -10,7 +10,7 @@ way which enables simple and efficient model parallelism.
`run_image_captioning_flax.py` is a lightweight example of how to download and preprocess a dataset from the 🤗 Datasets `run_image_captioning_flax.py` is a lightweight example of how to download and preprocess a dataset from the 🤗 Datasets
library or use your own files (jsonlines or csv), then fine-tune one of the architectures above on it. library or use your own files (jsonlines or csv), then fine-tune one of the architectures above on it.
For custom datasets in `jsonlines` format please see: https://huggingface.co/docs/datasets/loading_datasets.html#json-files and you also will find examples of these below. For custom datasets in `jsonlines` format please see: https://huggingface.co/docs/datasets/loading_datasets#json-files and you also will find examples of these below.
### Download COCO dataset (2017) ### Download COCO dataset (2017)
This example uses COCO dataset (2017) through a custom dataset script, which requires users to manually download the This example uses COCO dataset (2017) through a custom dataset script, which requires users to manually download the
......
...@@ -494,7 +494,7 @@ def main(): ...@@ -494,7 +494,7 @@ def main():
token=model_args.token, token=model_args.token,
) )
# See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at
# https://huggingface.co/docs/datasets/loading_datasets.html. # https://huggingface.co/docs/datasets/loading_datasets.
# Load pretrained model and tokenizer # Load pretrained model and tokenizer
model = FlaxVisionEncoderDecoderModel.from_pretrained( model = FlaxVisionEncoderDecoderModel.from_pretrained(
......
...@@ -589,7 +589,7 @@ def main(): ...@@ -589,7 +589,7 @@ def main():
num_proc=data_args.preprocessing_num_workers, num_proc=data_args.preprocessing_num_workers,
) )
# See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at
# https://huggingface.co/docs/datasets/loading_datasets.html. # https://huggingface.co/docs/datasets/loading_datasets.
# Load pretrained model and tokenizer # Load pretrained model and tokenizer
......
...@@ -484,7 +484,7 @@ def main(): ...@@ -484,7 +484,7 @@ def main():
num_proc=data_args.preprocessing_num_workers, num_proc=data_args.preprocessing_num_workers,
) )
# See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at
# https://huggingface.co/docs/datasets/loading_datasets.html. # https://huggingface.co/docs/datasets/loading_datasets.
# Load pretrained model and tokenizer # Load pretrained model and tokenizer
......
...@@ -516,7 +516,7 @@ def main(): ...@@ -516,7 +516,7 @@ def main():
num_proc=data_args.preprocessing_num_workers, num_proc=data_args.preprocessing_num_workers,
) )
# See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at
# https://huggingface.co/docs/datasets/loading_datasets.html. # https://huggingface.co/docs/datasets/loading_datasets.
# Load pretrained model and tokenizer # Load pretrained model and tokenizer
......
...@@ -630,7 +630,7 @@ def main(): ...@@ -630,7 +630,7 @@ def main():
num_proc=data_args.preprocessing_num_workers, num_proc=data_args.preprocessing_num_workers,
) )
# See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at
# https://huggingface.co/docs/datasets/loading_datasets.html. # https://huggingface.co/docs/datasets/loading_datasets.
# Load pretrained model and tokenizer # Load pretrained model and tokenizer
......
...@@ -536,7 +536,7 @@ def main(): ...@@ -536,7 +536,7 @@ def main():
token=model_args.token, token=model_args.token,
) )
# See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at
# https://huggingface.co/docs/datasets/loading_datasets.html. # https://huggingface.co/docs/datasets/loading_datasets.
# endregion # endregion
# region Load pretrained model and tokenizer # region Load pretrained model and tokenizer
......
...@@ -9,7 +9,7 @@ way which enables simple and efficient model parallelism. ...@@ -9,7 +9,7 @@ way which enables simple and efficient model parallelism.
`run_summarization_flax.py` is a lightweight example of how to download and preprocess a dataset from the 🤗 Datasets library or use your own files (jsonlines or csv), then fine-tune one of the architectures above on it. `run_summarization_flax.py` is a lightweight example of how to download and preprocess a dataset from the 🤗 Datasets library or use your own files (jsonlines or csv), then fine-tune one of the architectures above on it.
For custom datasets in `jsonlines` format please see: https://huggingface.co/docs/datasets/loading_datasets.html#json-files and you also will find examples of these below. For custom datasets in `jsonlines` format please see: https://huggingface.co/docs/datasets/loading_datasets#json-files and you also will find examples of these below.
### Train the model ### Train the model
Next we can run the example script to train the model: Next we can run the example script to train the model:
......
...@@ -521,7 +521,7 @@ def main(): ...@@ -521,7 +521,7 @@ def main():
token=model_args.token, token=model_args.token,
) )
# See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at
# https://huggingface.co/docs/datasets/loading_datasets.html. # https://huggingface.co/docs/datasets/loading_datasets.
# Load pretrained model and tokenizer # Load pretrained model and tokenizer
......
...@@ -410,7 +410,7 @@ def main(): ...@@ -410,7 +410,7 @@ def main():
token=model_args.token, token=model_args.token,
) )
# See more about loading any type of standard or custom dataset at # See more about loading any type of standard or custom dataset at
# https://huggingface.co/docs/datasets/loading_datasets.html. # https://huggingface.co/docs/datasets/loading_datasets.
# Labels # Labels
if data_args.task_name is not None: if data_args.task_name is not None:
...@@ -427,7 +427,7 @@ def main(): ...@@ -427,7 +427,7 @@ def main():
num_labels = 1 num_labels = 1
else: else:
# A useful fast method: # A useful fast method:
# https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Dataset.unique # https://huggingface.co/docs/datasets/package_reference/main_classes#datasets.Dataset.unique
label_list = raw_datasets["train"].unique("label") label_list = raw_datasets["train"].unique("label")
label_list.sort() # Let's sort it for determinism label_list.sort() # Let's sort it for determinism
num_labels = len(label_list) num_labels = len(label_list)
......
...@@ -465,7 +465,7 @@ def main(): ...@@ -465,7 +465,7 @@ def main():
token=model_args.token, token=model_args.token,
) )
# See more about loading any type of standard or custom dataset at # See more about loading any type of standard or custom dataset at
# https://huggingface.co/docs/datasets/loading_datasets.html. # https://huggingface.co/docs/datasets/loading_datasets.
if raw_datasets["train"] is not None: if raw_datasets["train"] is not None:
column_names = raw_datasets["train"].column_names column_names = raw_datasets["train"].column_names
......
...@@ -340,7 +340,7 @@ def main(): ...@@ -340,7 +340,7 @@ def main():
token=model_args.token, token=model_args.token,
) )
# See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at
# https://huggingface.co/docs/datasets/loading_datasets.html. # https://huggingface.co/docs/datasets/loading_datasets.
# 5. Load pretrained model, tokenizer, and image processor # 5. Load pretrained model, tokenizer, and image processor
if model_args.tokenizer_name: if model_args.tokenizer_name:
......
...@@ -388,7 +388,7 @@ def main(): ...@@ -388,7 +388,7 @@ def main():
) )
# See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at
# https://huggingface.co/docs/datasets/loading_datasets.html. # https://huggingface.co/docs/datasets/loading_datasets.
# Load pretrained model and tokenizer # Load pretrained model and tokenizer
# #
......
...@@ -368,7 +368,7 @@ def main(): ...@@ -368,7 +368,7 @@ def main():
) )
# See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at
# https://huggingface.co/docs/datasets/loading_datasets.html. # https://huggingface.co/docs/datasets/loading_datasets.
# Load pretrained model and tokenizer # Load pretrained model and tokenizer
# #
......
...@@ -382,7 +382,7 @@ def main(): ...@@ -382,7 +382,7 @@ def main():
) )
# See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at
# https://huggingface.co/docs/datasets/loading_datasets.html. # https://huggingface.co/docs/datasets/loading_datasets.
# Load pretrained model and tokenizer # Load pretrained model and tokenizer
# #
......
...@@ -371,7 +371,7 @@ def main(): ...@@ -371,7 +371,7 @@ def main():
) )
# See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at
# https://huggingface.co/docs/datasets/loading_datasets.html. # https://huggingface.co/docs/datasets/loading_datasets.
# Load pretrained model and tokenizer # Load pretrained model and tokenizer
# #
......
...@@ -352,7 +352,7 @@ def main(): ...@@ -352,7 +352,7 @@ def main():
) )
# See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at
# https://huggingface.co/docs/datasets/loading_datasets.html. # https://huggingface.co/docs/datasets/loading_datasets.
# Load pretrained model and tokenizer # Load pretrained model and tokenizer
# #
......
...@@ -329,7 +329,7 @@ def main(): ...@@ -329,7 +329,7 @@ def main():
token=model_args.token, token=model_args.token,
) )
# See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at
# https://huggingface.co/docs/datasets/loading_datasets.html. # https://huggingface.co/docs/datasets/loading_datasets.
# Load pretrained model and tokenizer # Load pretrained model and tokenizer
......