Unverified Commit ffbcfc01 authored by V.Prasanna kumar, committed by GitHub

Broken links fixed related to datasets docs (#27569)

Fixed the broken links belonging to the Datasets library documentation in Transformers.
parent 638d4998
@@ -366,7 +366,7 @@ def main():
         for split in raw_datasets.keys():
             raw_datasets[split] = raw_datasets[split].select(range(100))
     # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at
-    # https://huggingface.co/docs/datasets/loading_datasets.html.
+    # https://huggingface.co/docs/datasets/loading_datasets.
     if raw_datasets["train"] is not None:
         column_names = raw_datasets["train"].column_names
...
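The comment these scripts share points to the Datasets loading guide. As a quick, editor-added sketch of what that guide covers (file names and columns here are made up for illustration, not taken from the diff), loading from local files, a Python dict, or a pandas DataFrame looks like:

```python
import pandas as pd
from datasets import Dataset, load_dataset

# From local files, using the file extension as the builder name
# (the same pattern the scripts above use with `extension`)
raw_datasets = load_dataset("csv", data_files={"train": "train.csv", "validation": "dev.csv"})

# From a plain Python dict
ds = Dataset.from_dict({"text": ["hello", "world"], "label": [0, 1]})

# From a pandas DataFrame
ds = Dataset.from_pandas(pd.DataFrame({"text": ["hello", "world"], "label": [0, 1]}))
```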
@@ -337,7 +337,7 @@ def main():
         token=model_args.token,
     )
     # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at
-    # https://huggingface.co/docs/datasets/loading_datasets.html.
+    # https://huggingface.co/docs/datasets/loading_datasets.
     # Load pretrained model and tokenizer
     #
...
@@ -325,7 +325,7 @@ def main():
         token=model_args.token,
     )
     # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at
-    # https://huggingface.co/docs/datasets/loading_datasets.html.
+    # https://huggingface.co/docs/datasets/loading_datasets.
     # Load pretrained model and tokenizer
     #
...
@@ -369,7 +369,7 @@ def main():
         extension = args.train_file.split(".")[-1]
         raw_datasets = load_dataset(extension, data_files=data_files, field="data")
     # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at
-    # https://huggingface.co/docs/datasets/loading_datasets.html.
+    # https://huggingface.co/docs/datasets/loading_datasets.
     # Load pretrained model and tokenizer
     #
...
@@ -417,7 +417,7 @@ def main():
         extension = args.train_file.split(".")[-1]
         raw_datasets = load_dataset(extension, data_files=data_files, field="data")
     # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at
-    # https://huggingface.co/docs/datasets/loading_datasets.html.
+    # https://huggingface.co/docs/datasets/loading_datasets.
     # Load pretrained model and tokenizer
     #
...
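For context on the `field="data"` argument in the two hunks above: it tells the JSON builder to read the examples from a top-level key rather than from the root, which matches SQuAD-style files. A minimal, editor-added sketch (the file name is hypothetical):

```python
from datasets import load_dataset

# squad_like.json has the shape {"version": ..., "data": [...]};
# field="data" makes the json builder load the records under the "data" key.
raw_datasets = load_dataset("json", data_files={"train": "squad_like.json"}, field="data")
```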
@@ -382,7 +382,7 @@ def main():
         token=model_args.token,
     )
     # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at
-    # https://huggingface.co/docs/datasets/loading_datasets.html.
+    # https://huggingface.co/docs/datasets/loading_datasets.
     # Load pretrained model and tokenizer
     #
...
@@ -134,7 +134,7 @@ of **0.36**.
 ### Multi GPU CTC with Dataset Streaming
-The following command shows how to use [Dataset Streaming mode](https://huggingface.co/docs/datasets/dataset_streaming.html)
+The following command shows how to use [Dataset Streaming mode](https://huggingface.co/docs/datasets/dataset_streaming)
 to fine-tune [XLS-R](https://huggingface.co/transformers/main/model_doc/xls_r.html)
 on [Common Voice](https://huggingface.co/datasets/common_voice) using 4 GPUs in half-precision.
...
@@ -33,7 +33,7 @@ For the old `finetune_trainer.py` and related utils, see [`examples/legacy/seq2seq`]
 `run_summarization.py` is a lightweight example of how to download and preprocess a dataset from the [🤗 Datasets](https://github.com/huggingface/datasets) library or use your own files (jsonlines or csv), then fine-tune one of the architectures above on it.
-For custom datasets in `jsonlines` format please see: https://huggingface.co/docs/datasets/loading_datasets.html#json-files
+For custom datasets in `jsonlines` format please see: https://huggingface.co/docs/datasets/loading_datasets#json-files
 and you will also find examples of these below.
 ## With Trainer
...
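As an editor-added illustration of the jsonlines format the README points to (the file name and the `text`/`summary` column names are assumptions for this sketch, chosen to match the summarization use case):

```python
from datasets import load_dataset

# train.json contains one JSON object per line, e.g.
# {"text": "full article ...", "summary": "short summary ..."}
raw_datasets = load_dataset("json", data_files={"train": "train.json"})
```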
@@ -432,7 +432,7 @@ def main():
         token=model_args.token,
     )
     # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at
-    # https://huggingface.co/docs/datasets/loading_datasets.html.
+    # https://huggingface.co/docs/datasets/loading_datasets.
     # Load pretrained model and tokenizer
     #
...
@@ -409,7 +409,7 @@ def main():
         extension = args.train_file.split(".")[-1]
         raw_datasets = load_dataset(extension, data_files=data_files)
     # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at
-    # https://huggingface.co/docs/datasets/loading_datasets.html.
+    # https://huggingface.co/docs/datasets/loading_datasets.
     # Load pretrained model and tokenizer
     #
...
@@ -396,7 +396,7 @@ def main():
     )
     # See more about loading any type of standard or custom dataset at
-    # https://huggingface.co/docs/datasets/loading_datasets.html.
+    # https://huggingface.co/docs/datasets/loading_datasets.
     if data_args.remove_splits is not None:
         for split in data_args.remove_splits.split(","):
...
@@ -355,7 +355,7 @@ def main():
         token=model_args.token,
     )
     # See more about loading any type of standard or custom dataset at
-    # https://huggingface.co/docs/datasets/loading_datasets.html.
+    # https://huggingface.co/docs/datasets/loading_datasets.
     # Labels
     if data_args.task_name is not None:
@@ -372,7 +372,7 @@ def main():
             num_labels = 1
         else:
             # A useful fast method:
-            # https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Dataset.unique
+            # https://huggingface.co/docs/datasets/package_reference/main_classes#datasets.Dataset.unique
             label_list = raw_datasets["train"].unique("label")
             label_list.sort()  # Let's sort it for determinism
             num_labels = len(label_list)
...
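The `unique()` call referenced in the hunk above computes the distinct values of a column directly on the underlying data, which is why the script's comment calls it a useful fast method. A small, editor-added sketch:

```python
from datasets import Dataset

ds = Dataset.from_dict({"text": ["a", "b", "c"], "label": ["pos", "neg", "pos"]})
label_list = ds.unique("label")  # unique values, in order of first appearance
label_list.sort()                # sort for determinism, as the script above does
num_labels = len(label_list)     # 2
```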
@@ -293,7 +293,7 @@ def main():
         extension = (args.train_file if args.train_file is not None else args.validation_file).split(".")[-1]
         raw_datasets = load_dataset(extension, data_files=data_files)
     # See more about loading any type of standard or custom dataset at
-    # https://huggingface.co/docs/datasets/loading_datasets.html.
+    # https://huggingface.co/docs/datasets/loading_datasets.
     # Labels
     if args.task_name is not None:
...
@@ -318,7 +318,7 @@ def main():
         extension = data_args.train_file.split(".")[-1]
         raw_datasets = load_dataset(extension, data_files=data_files, cache_dir=model_args.cache_dir)
     # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at
-    # https://huggingface.co/docs/datasets/loading_datasets.html.
+    # https://huggingface.co/docs/datasets/loading_datasets.
     if training_args.do_train:
         column_names = raw_datasets["train"].column_names
...
@@ -348,7 +348,7 @@ def main():
         for split in raw_datasets.keys():
             raw_datasets[split] = raw_datasets[split].select(range(100))
     # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at
-    # https://huggingface.co/docs/datasets/loading_datasets.html.
+    # https://huggingface.co/docs/datasets/loading_datasets.
     if raw_datasets["train"] is not None:
         column_names = raw_datasets["train"].column_names
...
@@ -33,7 +33,7 @@ For the old `finetune_trainer.py` and related utils, see [`examples/legacy/seq2seq`]
 `run_translation.py` is a lightweight example of how to download and preprocess a dataset from the [🤗 Datasets](https://github.com/huggingface/datasets) library or use your own files (jsonlines or csv), then fine-tune one of the architectures above on it.
-For custom datasets in `jsonlines` format please see: https://huggingface.co/docs/datasets/loading_datasets.html#json-files
+For custom datasets in `jsonlines` format please see: https://huggingface.co/docs/datasets/loading_datasets#json-files
 and you will also find examples of these below.
...
@@ -389,7 +389,7 @@ def main():
         extension = args.train_file.split(".")[-1]
         raw_datasets = load_dataset(extension, data_files=data_files)
     # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at
-    # https://huggingface.co/docs/datasets/loading_datasets.html.
+    # https://huggingface.co/docs/datasets/loading_datasets.
     # Load pretrained model and tokenizer
     #
...
@@ -227,7 +227,7 @@ the forum and making use of the [🤗 hub](http://huggingface.co/) to have a version
 control for your models and training logs.
 - When debugging, it is important that the debugging cycle is kept as short as possible to
   be able to effectively debug. *E.g.* if there is a problem with your training script,
-  you should run it with just a couple of hundred examples and not the whole dataset. This can be done by either making use of [datasets streaming](https://huggingface.co/docs/datasets/master/dataset_streaming.html?highlight=streaming) or by selecting just the first
+  you should run it with just a couple of hundred examples and not the whole dataset. This can be done by either making use of [datasets streaming](https://huggingface.co/docs/datasets/master/dataset_streaming?highlight=streaming) or by selecting just the first
   X number of data samples after loading:
 ```python
...
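The Python block in this hunk is collapsed by the diff view. As an editor-added sketch of what "selecting just the first X data samples" usually amounts to, a single `select` after loading (the dataset name and the choice of X = 100 are illustrative, not the collapsed original):

```python
from datasets import load_dataset

raw_datasets = load_dataset("oscar", "unshuffled_deduplicated_en")  # illustrative dataset
# Keep only the first 100 training examples so each debugging run stays short.
raw_datasets["train"] = raw_datasets["train"].select(range(100))
```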
@@ -23,7 +23,7 @@ JAX/Flax allows you to trace pure functions and compile them into efficient, fused
 Models written in JAX/Flax are **immutable** and updated in a purely functional
 way which enables simple and efficient model parallelism.
-All of the following examples make use of [dataset streaming](https://huggingface.co/docs/datasets/master/dataset_streaming.html), making it possible to train models on massive datasets\
+All of the following examples make use of [dataset streaming](https://huggingface.co/docs/datasets/master/dataset_streaming), making it possible to train models on massive datasets\
 without ever having to download the full dataset.
 ## Masked language modeling
...
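As a minimal, editor-added sketch of the streaming mode these examples rely on (the dataset name is illustrative): passing `streaming=True` to `load_dataset` returns an `IterableDataset` that yields examples on the fly, so nothing is downloaded up front:

```python
from datasets import load_dataset

streamed = load_dataset("oscar", "unshuffled_deduplicated_en", split="train", streaming=True)
for i, example in enumerate(streamed):
    print(example["text"][:80])  # peek at a few examples without fetching the full dataset
    if i == 2:
        break
```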
@@ -304,7 +304,7 @@ def main():
         extension = "text"
         dataset = load_dataset(extension, data_files=data_files, cache_dir=model_args.cache_dir)
     # See more about loading any type of standard or custom dataset (from files, python dict, pandas DataFrame, etc) at
-    # https://huggingface.co/docs/datasets/loading_datasets.html.
+    # https://huggingface.co/docs/datasets/loading_datasets.
     # Load pretrained config and tokenizer
     if model_args.config_name:
...