Unverified Commit 6ab7d1a4 authored by Hemil Desai, committed by GitHub

Add Readme for language modeling scripts with accelerate (#11073)

parent 2199608c
@@ -22,8 +22,7 @@ ALBERT, BERT, DistilBERT, RoBERTa, XLNet... GPT and GPT-2 are trained or fine-tu
loss. XLNet uses permutation language modeling (PLM); you can find more information about the differences between those
objectives in our [model summary](https://huggingface.co/transformers/model_summary.html).
-These scripts leverage the 🤗 Datasets library and the Trainer API. You can easily customize them to your needs if you
-need extra processing on your datasets.
+There are two sets of scripts provided. The first set leverages the Trainer API. The second set, with `no_trainer` in the suffix, uses a custom training loop and leverages the 🤗 Accelerate library. Both sets use the 🤗 Datasets library. You can easily customize them to your needs if you need extra processing on your datasets.
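Under the hood, the `no_trainer` scripts replace the `Trainer` with a plain PyTorch training loop that 🤗 Accelerate makes device- and distributed-setup agnostic. The snippet below is only a minimal sketch of that pattern (the `model`, `optimizer`, and `train_dataloader` arguments stand in for whatever objects the scripts actually build), not the scripts' exact code:

```python
from accelerate import Accelerator

def train(model, optimizer, train_dataloader, num_epochs=3):
    # Accelerator takes care of device placement (CPU/GPU/TPU) and, when
    # configured, multi-process distributed training.
    accelerator = Accelerator()
    model, optimizer, train_dataloader = accelerator.prepare(model, optimizer, train_dataloader)

    model.train()
    for _ in range(num_epochs):
        for batch in train_dataloader:
            outputs = model(**batch)      # batches already contain the labels
            loss = outputs.loss
            accelerator.backward(loss)    # used instead of loss.backward()
            optimizer.step()
            optimizer.zero_grad()
```

The `no_trainer` scripts can also be run through Accelerate's launcher (`accelerate config` once, then `accelerate launch` with the script and its arguments) to use multiple GPUs or a TPU without changing the code.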
**Note:** The old script `run_language_modeling.py` is still available [here](https://github.com/huggingface/transformers/blob/master/examples/legacy/run_language_modeling.py).
@@ -60,6 +59,15 @@ python run_clm.py \
--output_dir /tmp/test-clm
```
This uses the built-in HuggingFace `Trainer` for training. If you want to use a custom training loop, you can use or adapt the `run_clm_no_trainer.py` script. Take a look at the script for a list of supported arguments. An example is shown below:
```bash
python run_clm_no_trainer.py \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--model_name_or_path gpt2 \
--output_dir /tmp/test-clm
```
### RoBERTa/BERT/DistilBERT and masked language modeling
@@ -95,23 +103,33 @@ python run_mlm.py \
If your dataset is organized with one sample per line, you can use the `--line_by_line` flag (otherwise the script
concatenates all texts and then splits them in blocks of the same length).
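For illustration, the default concatenate-then-chunk preprocessing looks roughly like the function below; `block_size` and the single `input_ids` key are simplifying assumptions (the actual scripts apply this to every tokenized column):

```python
def group_texts(examples, block_size=512):
    """Concatenate a batch of tokenized texts and split the result into
    contiguous chunks of exactly `block_size` tokens (the remainder is dropped)."""
    concatenated = sum(examples["input_ids"], [])
    total_length = (len(concatenated) // block_size) * block_size
    return {
        "input_ids": [
            concatenated[i : i + block_size]
            for i in range(0, total_length, block_size)
        ]
    }
```

With `--line_by_line`, this grouping is skipped and each line of the dataset is treated as its own sample instead.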
This uses the built-in HuggingFace `Trainer` for training. If you want to use a custom training loop, you can use or adapt the `run_mlm_no_trainer.py` script. Take a look at the script for a list of supported arguments. An example is shown below:
```bash
python run_mlm_no_trainer.py \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--model_name_or_path roberta-base \
--output_dir /tmp/test-mlm
```
**Note:** On TPU, you should use the flag `--pad_to_max_length` in conjunction with the `--line_by_line` flag to make
sure all your batches have the same length.
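This matters because TPUs compile kernels for fixed tensor shapes, so variable-length batches trigger costly recompilation. A small illustration of the difference (the checkpoint name and `max_length` value are just placeholders):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
lines = ["A short line.", "A noticeably longer line of text for the second sample."]

# Dynamic padding: each batch is only padded to its own longest sample (fine on GPU).
dynamic = tokenizer(lines, padding="longest", truncation=True, return_tensors="pt")

# Fixed-length padding: every batch has the same shape, which is what TPUs prefer.
fixed = tokenizer(lines, padding="max_length", max_length=128, truncation=True, return_tensors="pt")

print(dynamic["input_ids"].shape)  # padded to the longest of the two samples
print(fixed["input_ids"].shape)    # torch.Size([2, 128])
```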
### Whole word masking
This part was moved to `examples/research_projects/mlm_wwm`.
### XLNet and permutation language modeling
XLNet uses a different training objective, which is permutation language modeling. It is an autoregressive method
to learn bidirectional contexts by maximizing the expected likelihood over all permutations of the input
sequence factorization order.
We use the `--plm_probability` flag to define the ratio of length of a span of masked tokens to surrounding
context length for permutation language modeling.
The `--max_span_length` flag may also be used to limit the length of a span of masked tokens used
for permutation language modeling.
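Roughly speaking, these flags configure the permutation language modeling data collator from 🤗 Transformers. A minimal sketch of building one directly (the specific values are illustrative, not a recommendation):

```python
from transformers import AutoTokenizer, DataCollatorForPermutationLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")

# plm_probability: ratio of a masked span's length to its surrounding context length.
# max_span_length: upper bound on the length of a masked span.
data_collator = DataCollatorForPermutationLanguageModeling(
    tokenizer=tokenizer,
    plm_probability=1 / 6,
    max_span_length=5,
)
```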
Here is how to fine-tune XLNet on wikitext-2:
...
@@ -56,7 +56,7 @@ MODEL_TYPES = tuple(conf.model_type for conf in MODEL_CONFIG_CLASSES)
def parse_args():
-    parser = argparse.ArgumentParser(description="Finetune a transformers model on a text classification task")
+    parser = argparse.ArgumentParser(description="Finetune a transformers model on a Masked Language Modeling task")
    parser.add_argument(
        "--dataset_name",
        type=str,
...