Unverified Commit b5e2b183 authored by Sylvain Gugger, committed by GitHub

Doc styler examples (#14953)

* Fix bad examples

* Add black formatting to style_doc

* Use first nonempty line

* Put it at the right place

* Don't add spaces to empty lines

* Better templates

* Deal with triple quotes in docstrings

* Result of style_doc

* Enable mdx treatment and fix code examples in MDXs

* Result of doc styler on doc source files

* Last fixes

* Break copy from
parent e13f72fb
@@ -267,7 +267,7 @@ single forward pass using a dummy integer vector of input IDs as an input. Such
pseudocode):

```python
model = BrandNewBertModel.load_pretrained_checkpoint("/path/to/checkpoint/")
input_ids = [0, 4, 5, 2, 3, 7, 9]  # vector of input ids
original_output = model.predict(input_ids)
```

@@ -476,6 +476,7 @@ following command should work:

```python
from transformers import BrandNewBertModel, BrandNewBertConfig

model = BrandNewBertModel(BrandNewBertConfig())
```
@@ -502,12 +503,13 @@ PyTorch, called `SimpleModel` as follows:

```python
from torch import nn


class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.dense = nn.Linear(10, 10)
        self.intermediate = nn.Linear(10, 10)
        self.layer_norm = nn.LayerNorm(10)
```

Now we can create an instance of this model definition which will fill all weights: `dense`, `intermediate`,
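For illustration, a minimal hedged sketch (reusing the `SimpleModel` definition above) of creating such an instance and inspecting one of its randomly initialized weights:

```python
# Assumes the SimpleModel class defined in the snippet above.
model = SimpleModel()
# Every parameter is randomly initialized at this point.
print(model.dense.weight)
```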
@@ -565,7 +567,7 @@ In the conversion script, you should fill those randomly initialized weights with
corresponding layer in the checkpoint. *E.g.*

```python
# retrieve matching layer weights, e.g. by
# recursive algorithm
layer_name = "dense"
pretrained_weight = array_of_dense_layer
```

@@ -622,7 +624,7 @@ pass of the model using the original repository. Now you should write an analogous
implementation instead of the original one. It should look as follows:

```python
model = BrandNewBertModel.from_pretrained("/path/to/converted/checkpoint/folder")
input_ids = [0, 4, 4, 3, 2, 4, 1, 7, 19]
output = model(input_ids).last_hidden_states
```
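The point of this step is to check that the new script reproduces the original repository's output. A hedged sketch of such a comparison (assuming `original_output` from the earlier pseudocode and that both outputs can be converted to tensors of the same shape; the tolerance value is an assumption):

```python
import torch

# Hypothetical check: compare the 🤗 Transformers output to the original implementation's output.
assert torch.allclose(torch.tensor(original_output), output, atol=1e-3), "Outputs diverge!"
```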
@@ -668,7 +670,7 @@ fully comply with the required design. To make sure, the implementation is fully
common tests should pass. The Cookiecutter should have automatically added a test file for your model, probably under
the same `tests/test_modeling_brand_new_bert.py`. Run this test file to verify that all common tests pass:

```bash
pytest tests/test_modeling_brand_new_bert.py
```

@@ -714,7 +716,7 @@ that inputs a string and returns the `input_ids`. It could look similar to this:

```python
input_str = "This is a long example input string containing special characters .$?-, numbers 2872 234 12 and words."
model = BrandNewBertModel.load_pretrained_checkpoint("/path/to/checkpoint/")
input_ids = model.tokenize(input_str)
```

@@ -725,9 +727,10 @@ created. It should look similar to this:

```python
from transformers import BrandNewBertTokenizer

input_str = "This is a long example input string containing special characters .$?-, numbers 2872 234 12 and words."
tokenizer = BrandNewBertTokenizer.from_pretrained("/path/to/tokenizer/folder/")
input_ids = tokenizer(input_str).input_ids
```
......
@@ -26,6 +26,7 @@ Start by inheriting the base class `Pipeline` with the 4 methods needed to implement
```python
from transformers import Pipeline


class MyPipeline(Pipeline):
    def _sanitize_parameters(self, **kwargs):
        preprocess_kwargs = {}

@@ -34,7 +35,7 @@ class MyPipeline(Pipeline):
        return preprocess_kwargs, {}, {}

    def preprocess(self, inputs, maybe_arg=2):
        model_input = Tensor(inputs["input_ids"])
        return {"model_input": model_input}

    def _forward(self, model_inputs):

@@ -90,6 +91,7 @@ def postprocess(self, model_outputs, top_k=5):
        # Add logic to handle top_k
        return best_class

    def _sanitize_parameters(self, **kwargs):
        preprocess_kwargs = {}
        if "maybe_arg" in kwargs:
......
@@ -37,11 +37,12 @@ The benchmark classes [`PyTorchBenchmark`] and [`TensorFlowBenchmark`] expect an
>>> args = PyTorchBenchmarkArguments(models=["bert-base-uncased"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512])
>>> benchmark = PyTorchBenchmark(args)
===PT-TF-SPLIT===
>>> from transformers import TensorFlowBenchmark, TensorFlowBenchmarkArguments
>>> args = TensorFlowBenchmarkArguments(
...     models=["bert-base-uncased"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512]
... )
>>> benchmark = TensorFlowBenchmark(args)
```
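For context, a short usage sketch of how such a benchmark is typically executed once the arguments are configured (hedged; it reuses the `benchmark` object from the snippet above):

```python
# Runs the configured speed and memory measurements and prints a summary table.
results = benchmark.run()
print(results)
```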
@@ -174,7 +175,9 @@ configurations must be inserted with the benchmark args as follows.

```py
>>> from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments, BertConfig
>>> args = PyTorchBenchmarkArguments(
...     models=["bert-base", "bert-384-hid", "bert-6-lay"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512]
... )
>>> config_base = BertConfig()
>>> config_384_hid = BertConfig(hidden_size=384)
>>> config_6_lay = BertConfig(num_hidden_layers=6)

@@ -244,7 +247,9 @@ bert-6-lay 8 512 1359
===PT-TF-SPLIT===
>>> from transformers import TensorFlowBenchmark, TensorFlowBenchmarkArguments, BertConfig
>>> args = TensorFlowBenchmarkArguments(
...     models=["bert-base", "bert-384-hid", "bert-6-lay"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512]
... )
>>> config_base = BertConfig()
>>> config_384_hid = BertConfig(hidden_size=384)
>>> config_6_lay = BertConfig(num_hidden_layers=6)
......
@@ -54,6 +54,7 @@ The 🤗 Datasets library makes it simple to load a dataset:

```python
from datasets import load_dataset

imdb = load_dataset("imdb")
```

@@ -61,8 +62,9 @@ This loads a `DatasetDict` object which you can index into to view an example:

```python
imdb["train"][0]
{
    "label": 1,
    "text": "Bromwell High is a cartoon comedy. It ran at the same time as some other programs about school life, such as \"Teachers\". My 35 years in the teaching profession lead me to believe that Bromwell High's satire is much closer to reality than is \"Teachers\". The scramble to survive financially, the insightful students who can see right through their pathetic teachers' pomp, the pettiness of the whole situation, all remind me of the schools I knew and their students. When I saw the episode in which a student repeatedly tried to burn down the school, I immediately recalled ......... at .......... High. A classic line: INSPECTOR: I'm here to sack one of your teachers. STUDENT: Welcome to Bromwell High. I expect that many adults of my age think that Bromwell High is far fetched. What a pity that it isn't!",
}
```
@@ -74,6 +76,7 @@ model was trained with to ensure appropriately tokenized words. Load the DistilBERT

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
```
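Later snippets in this commit reference a `tokenized_imdb` dataset; a hedged sketch of the preprocessing step that would produce it (the function name and column handling here are assumptions, not part of the diff):

```python
def preprocess_function(examples):
    # Truncate reviews that are longer than the model's maximum input length.
    return tokenizer(examples["text"], truncation=True)


tokenized_imdb = imdb.map(preprocess_function, batched=True)
```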
@@ -99,6 +102,7 @@ batch. This is known as **dynamic padding**. You can do this with the `DataCollatorWithPadding`

```python
from transformers import DataCollatorWithPadding

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
```

@@ -108,6 +112,7 @@ Now load your model with the [`AutoModelForSequenceClassification`] class along with the number of expected labels:

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)
```
@@ -121,7 +126,7 @@ At this point, only three steps remain:
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
@@ -150,6 +155,7 @@ Make sure you set `return_tensors="tf"` to return `tf.Tensor` outputs instead of

```python
from transformers import DataCollatorWithPadding

data_collator = DataCollatorWithPadding(tokenizer, return_tensors="tf")
```

@@ -158,14 +164,14 @@ Next, convert your datasets to the `tf.data.Dataset` format with `to_tf_dataset`

```python
tf_train_dataset = tokenized_imdb["train"].to_tf_dataset(
    columns=["attention_mask", "input_ids", "label"],
    shuffle=True,
    batch_size=16,
    collate_fn=data_collator,
)

tf_validation_dataset = tokenized_imdb["train"].to_tf_dataset(
    columns=["attention_mask", "input_ids", "label"],
    shuffle=False,
    batch_size=16,
    collate_fn=data_collator,
@@ -182,17 +188,14 @@ batch_size = 16
num_epochs = 5
batches_per_epoch = len(tokenized_imdb["train"]) // batch_size
total_train_steps = int(batches_per_epoch * num_epochs)
optimizer, schedule = create_optimizer(init_lr=2e-5, num_warmup_steps=0, num_train_steps=total_train_steps)
```

Load your model with the [`TFAutoModelForSequenceClassification`] class along with the number of expected labels:

```python
from transformers import TFAutoModelForSequenceClassification

model = TFAutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)
```
@@ -200,6 +203,7 @@ Compile the model:

```python
import tensorflow as tf

model.compile(optimizer=optimizer)
```
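To round out this TensorFlow path, a hedged sketch of the fitting step the tutorial is building toward (it reuses the `tf_train_dataset`, `tf_validation_dataset`, and `num_epochs` names defined in the snippets above):

```python
# Fine-tune on the tokenized IMDb dataset while evaluating on the validation split.
model.fit(tf_train_dataset, validation_data=tf_validation_dataset, epochs=num_epochs)
```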
@@ -234,14 +238,15 @@ or [TensorFlow notebook](https://colab.research.google.com/github/huggingface/no
Load the WNUT 17 dataset from the 🤗 Datasets library:

```python
>>> from datasets import load_dataset

>>> wnut = load_dataset("wnut_17")
```

A quick look at the dataset shows the labels associated with each word in the sentence:

```python
>>> wnut["train"][0]
{'id': '0',
 'ner_tags': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 8, 8, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0],
 'tokens': ['@paulwalk', 'It', "'s", 'the', 'view', 'from', 'where', 'I', "'m", 'living', 'for', 'two', 'weeks', '.', 'Empire', 'State', 'Building', '=', 'ESB', '.', 'Pretty', 'bad', 'storm', 'here', 'last', 'evening', '.']

@@ -251,21 +256,22 @@ wnut["train"][0]
View the specific NER tags by:

```python
>>> label_list = wnut["train"].features[f"ner_tags"].feature.names
>>> label_list
[
    "O",
    "B-corporation",
    "I-corporation",
    "B-creative-work",
    "I-creative-work",
    "B-group",
    "I-group",
    "B-location",
    "I-location",
    "B-person",
    "I-person",
    "B-product",
    "I-product",
]
```

@@ -282,6 +288,7 @@ Now you need to tokenize the text. Load the DistilBERT tokenizer with an [`AutoTokenizer`]:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
```

@@ -289,9 +296,9 @@ Since the input has already been split into words, set `is_split_into_words=True`
subwords:

```python
>>> tokenized_input = tokenizer(example["tokens"], is_split_into_words=True)
>>> tokens = tokenizer.convert_ids_to_tokens(tokenized_input["input_ids"])
>>> tokens
['[CLS]', '@', 'paul', '##walk', 'it', "'", 's', 'the', 'view', 'from', 'where', 'i', "'", 'm', 'living', 'for', 'two', 'weeks', '.', 'empire', 'state', 'building', '=', 'es', '##b', '.', 'pretty', 'bad', 'storm', 'here', 'last', 'evening', '.', '[SEP]']
```
@@ -314,10 +321,10 @@ def tokenize_and_align_labels(examples):
        word_ids = tokenized_inputs.word_ids(batch_index=i)  # Map tokens to their respective word.
        previous_word_idx = None
        label_ids = []
        for word_idx in word_ids:  # Set the special tokens to -100.
            if word_idx is None:
                label_ids.append(-100)
            elif word_idx != previous_word_idx:  # Only label the first token of a given word.
                label_ids.append(label[word_idx])
        labels.append(label_ids)

@@ -336,6 +343,7 @@ Finally, pad your text and labels, so they are a uniform length:

```python
from transformers import DataCollatorForTokenClassification

data_collator = DataCollatorForTokenClassification(tokenizer)
```

@@ -345,6 +353,7 @@ Load your model with the [`AutoModelForTokenClassification`] class along with the number of expected labels:

```python
from transformers import AutoModelForTokenClassification, TrainingArguments, Trainer

model = AutoModelForTokenClassification.from_pretrained("distilbert-base-uncased", num_labels=len(label_list))
```

@@ -352,7 +361,7 @@ Gather your training arguments in [`TrainingArguments`]:

```python
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,

@@ -387,6 +396,7 @@ Batch your examples together and pad your text and labels, so they are a uniform length:

```python
from transformers import DataCollatorForTokenClassification

data_collator = DataCollatorForTokenClassification(tokenizer, return_tensors="tf")
```

@@ -412,6 +422,7 @@ Load the model with the [`TFAutoModelForTokenClassification`] class along with the number of expected labels:

```python
from transformers import TFAutoModelForTokenClassification

model = TFAutoModelForTokenClassification.from_pretrained("distilbert-base-uncased", num_labels=len(label_list))
```

@@ -435,6 +446,7 @@ Compile the model:

```python
import tensorflow as tf

model.compile(optimizer=optimizer)
```
@@ -469,13 +481,14 @@ Load the SQuAD dataset from the 🤗 Datasets library:

```python
from datasets import load_dataset

squad = load_dataset("squad")
```

Take a look at an example from the dataset:

```python
>>> squad["train"][0]
{'answers': {'answer_start': [515], 'text': ['Saint Bernadette Soubirous']},
 'context': 'Architecturally, the school has a Catholic character. Atop the Main Building\'s gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary.',
 'id': '5733be284776f41900661182',

@@ -490,6 +503,7 @@ Load the DistilBERT tokenizer with an [`AutoTokenizer`]:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
```

@@ -567,6 +581,7 @@ Batch the processed examples together:

```python
from transformers import default_data_collator

data_collator = default_data_collator
```

@@ -576,6 +591,7 @@ Load your model with the [`AutoModelForQuestionAnswering`] class:

```python
from transformers import AutoModelForQuestionAnswering, TrainingArguments, Trainer

model = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-uncased")
```

@@ -583,7 +599,7 @@ Gather your training arguments in [`TrainingArguments`]:

```python
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,

@@ -618,6 +634,7 @@ Batch the processed examples together with a TensorFlow default data collator:

```python
from transformers.data.data_collator import tf_default_collator

data_collator = tf_default_collator
```

@@ -650,8 +667,8 @@ batch_size = 16
num_epochs = 2
total_train_steps = (len(tokenized_squad["train"]) // batch_size) * num_epochs
optimizer, schedule = create_optimizer(
    init_lr=2e-5,
    num_warmup_steps=0,
    num_train_steps=total_train_steps,
)
```

@@ -660,6 +677,7 @@ Load your model with the [`TFAutoModelForQuestionAnswering`] class:

```python
from transformers import TFAutoModelForQuestionAnswering

model = TFAutoModelForQuestionAnswering.from_pretrained("distilbert-base-uncased")
```

@@ -667,6 +685,7 @@ Compile the model:

```python
import tensorflow as tf

model.compile(optimizer=optimizer)
```
......
@@ -49,6 +49,7 @@ If you're using your own training loop or another Trainer you can accomplish the

```python
from .debug_utils import DebugUnderflowOverflow

debug_overflow = DebugUnderflowOverflow(model)
```

@@ -200,13 +201,16 @@ def _forward(self, hidden_states):
    hidden_states = self.wo(hidden_states)
    return hidden_states


import torch


def forward(self, hidden_states):
    if torch.is_autocast_enabled():
        with torch.cuda.amp.autocast(enabled=False):
            return self._forward(hidden_states)
    else:
        return self._forward(hidden_states)
```

Since the automatic detector only reports on inputs and outputs of full frames, once you know where to look, you may

@@ -216,8 +220,10 @@ want to analyse the intermediary stages of any specific `forward` function as well

```python
from debug_utils import detect_overflow


class T5LayerFF(nn.Module):
    [...]

    def forward(self, hidden_states):
        forwarded_states = self.layer_norm(hidden_states)
        detect_overflow(forwarded_states, "after layer_norm")

@@ -237,6 +243,7 @@ its default, e.g.:

```python
from .debug_utils import DebugUnderflowOverflow

debug_overflow = DebugUnderflowOverflow(model, max_frames_to_save=100)
```

@@ -248,7 +255,7 @@ Let's say you want to watch the absolute min and max values for all the ingredients
batch, and only do that for batches 1 and 3. Then you instantiate this class as:

```python
debug_overflow = DebugUnderflowOverflow(model, trace_batch_nums=[1, 3])
```

And now full batches 1 and 3 will be traced using the same format as the underflow/overflow detector does.

@@ -295,5 +302,5 @@ numbers started to diverge.
You can also specify the batch number after which to stop the training, with:

```python
debug_overflow = DebugUnderflowOverflow(model, trace_batch_nums=[1, 3], abort_after_batch_num=3)
```
@@ -58,6 +58,7 @@ tokenizer, which is a [WordPiece](https://arxiv.org/pdf/1609.08144.pdf) tokenizer

```python
>>> from transformers import BertTokenizer

>>> tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
>>> sequence = "A Titan RTX has 24GB of VRAM"

@@ -126,6 +127,7 @@ For example, consider these two sequences:

```python
>>> from transformers import BertTokenizer

>>> tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
>>> sequence_a = "This is a short sequence."

@@ -190,6 +192,7 @@ arguments (and not a list, like before) like this:

```python
>>> from transformers import BertTokenizer

>>> tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
>>> sequence_a = "HuggingFace is based in NYC"
>>> sequence_b = "Where is HuggingFace based?"

@@ -212,7 +215,7 @@ the two types of sequence in the model.
The tokenizer returns this mask as the "token_type_ids" entry:

```python
>>> encoded_dict["token_type_ids"]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```
......
@@ -32,8 +32,8 @@ Here's an example:

```python
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
inputs = tokenizer("Hello, my dog is cute and ", return_tensors="pt")
generation_output = model.generate(**inputs, return_dict_in_generate=True, output_scores=True)
......
@@ -79,12 +79,13 @@ class MyCallback(TrainerCallback):
    def on_train_begin(self, args, state, control, **kwargs):
        print("Starting training")


trainer = Trainer(
    model,
    args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    callbacks=[MyCallback],  # We can either pass the callback class this way or an instance of it (MyCallback())
)
```
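As a follow-up usage note (a sketch, not part of the diff), the same callback can also be registered on an already constructed trainer; `Trainer.add_callback` accepts either the class or an instance:

```python
# Equivalent registration after the Trainer has been created.
trainer.add_callback(MyCallback())
```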
......
@@ -295,11 +295,12 @@ If you're using only 1 GPU, here is how you'd have to adjust your training code
# DeepSpeed requires a distributed environment even when only one process is used.
# This emulates a launcher in the notebook
import os

os.environ["MASTER_ADDR"] = "localhost"
os.environ["MASTER_PORT"] = "9994"  # modify if RuntimeError: Address already in use
os.environ["RANK"] = "0"
os.environ["LOCAL_RANK"] = "0"
os.environ["WORLD_SIZE"] = "1"

# Now proceed as normal, plus pass the deepspeed config file
training_args = TrainingArguments(..., deepspeed="ds_config_zero3.json")
@@ -316,7 +317,7 @@ at the beginning of this section.
If you want to create the config file on the fly in the notebook in the current directory, you could have a dedicated
cell with:

```python no-style
%%bash
cat <<'EOT' > ds_config_zero3.json
{

@@ -382,14 +383,14 @@ EOT
If the training script is in a normal file and not in the notebook cells, you can launch `deepspeed` normally via
shell from a cell. For example, to use `run_translation.py` you would launch it with:

```python no-style
!git clone https://github.com/huggingface/transformers
!cd transformers; deepspeed examples/pytorch/translation/run_translation.py ...
```

or with `%%bash` magic, where you can write a multi-line code for the shell program to run:

```python no-style
%%bash
git clone https://github.com/huggingface/transformers

@@ -512,7 +513,7 @@ TrainingArguments(..., deepspeed="/path/to/ds_config.json")
or:

```python
ds_config_dict = dict(scheduler=scheduler_params, optimizer=optimizer_params)
TrainingArguments(..., deepspeed=ds_config_dict)
```
@@ -1430,6 +1431,7 @@ If you have saved at least one checkpoint, and you want to use the latest one, you

```python
from transformers.trainer_utils import get_last_checkpoint
from deepspeed.utils.zero_to_fp32 import load_state_dict_from_zero_checkpoint

checkpoint_dir = get_last_checkpoint(trainer.args.output_dir)
fp32_model = load_state_dict_from_zero_checkpoint(trainer.model, checkpoint_dir)
```

@@ -1439,6 +1441,7 @@ checkpoint), then you can finish the training by first saving the final model explicitly

```python
from deepspeed.utils.zero_to_fp32 import load_state_dict_from_zero_checkpoint

checkpoint_dir = os.path.join(trainer.args.output_dir, "checkpoint-final")
trainer.deepspeed.save_checkpoint(checkpoint_dir)
fp32_model = load_state_dict_from_zero_checkpoint(trainer.model, checkpoint_dir)

@@ -1461,7 +1464,8 @@ these yourself as is shown in the following example:

```python
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint

state_dict = get_fp32_state_dict_from_zero_checkpoint(checkpoint_dir)  # already on cpu
model = model.cpu()
model.load_state_dict(state_dict)
```
@@ -1529,9 +1533,10 @@ context manager (which is also a function decorator), like so:

```python
from transformers import T5ForConditionalGeneration, T5Config
import deepspeed

with deepspeed.zero.Init():
    config = T5Config.from_pretrained("t5-small")
    model = T5ForConditionalGeneration(config)
```

As you can see this gives you a randomly initialized model.

@@ -1544,6 +1549,7 @@ section. Thus you must create the [`TrainingArguments`] object **before** calling

```python
from transformers import AutoModel, Trainer, TrainingArguments

training_args = TrainingArguments(..., deepspeed=ds_config)
model = AutoModel.from_pretrained("t5-small")
trainer = Trainer(model=model, args=training_args, ...)

@@ -1574,7 +1580,7 @@ limitations.
Also under ZeRO-3, if you write your own code and run into a model parameter weight that looks like:

```python
tensor([1.0], device="cuda:0", dtype=torch.float16, requires_grad=True)
```

stress on `tensor([1.])`, or if you get an error where it says the parameter is of size `1`, instead of some much
@@ -1715,9 +1721,9 @@ For example for a pretrained model:
from transformers.deepspeed import HfDeepSpeedConfig
from transformers import AutoModel, deepspeed

ds_config = {...}  # deepspeed config object or path to the file
# must run before instantiating the model
dschf = HfDeepSpeedConfig(ds_config)  # keep this object alive
model = AutoModel.from_pretrained("gpt2")
engine = deepspeed.initialize(model=model, config_params=ds_config, ...)
```

@@ -1728,9 +1734,9 @@ or for non-pretrained model:
from transformers.deepspeed import HfDeepSpeedConfig
from transformers import AutoModel, AutoConfig, deepspeed

ds_config = {...}  # deepspeed config object or path to the file
# must run before instantiating the model
dschf = HfDeepSpeedConfig(ds_config)  # keep this object alive
config = AutoConfig.from_pretrained("gpt2")
model = AutoModel.from_config(config)
engine = deepspeed.initialize(model=model, config_params=ds_config, ...)
......
@@ -21,6 +21,7 @@ to the INFO level.

```python
import transformers

transformers.logging.set_verbosity_info()
```
......
@@ -22,8 +22,8 @@ Let's see how this looks on an example:
from transformers import BertTokenizer, BertForSequenceClassification
import torch

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
labels = torch.tensor([1]).unsqueeze(0)  # Batch size 1
......
@@ -101,6 +101,7 @@ from transformers import pipeline
pipe = pipeline("text-classification")


def data():
    while True:
        # This could come from a dataset, a database, a queue or HTTP request

@@ -110,6 +111,7 @@ def data():
        # does the preprocessing while the main runs the big inference
        yield "This is a test"


for out in pipe(data()):
    print(out)
    # {"text": "NUMBER TEN FRESH NELLY IS WAITING ON YOU GOOD NIGHT HUSBAND"}

@@ -125,10 +127,10 @@ All pipelines can use batching. This will work
whenever the pipeline uses its streaming ability (so when passing lists or `Dataset` or `generator`).

```python
from transformers import pipeline
from transformers.pipelines.base import KeyDataset
import datasets
import tqdm

dataset = datasets.load_dataset("imdb", name="plain_text", split="unsupervised")
pipe = pipeline("text-classification", device=0)
@@ -149,28 +151,28 @@ Example where it's mostly a speedup:
</Tip>

```python
from transformers import pipeline
from torch.utils.data import Dataset
import tqdm

pipe = pipeline("text-classification", device=0)


class MyDataset(Dataset):
    def __len__(self):
        return 5000

    def __getitem__(self, i):
        return "This is a test"


dataset = MyDataset()

for batch_size in [1, 8, 64, 256]:
    print("-" * 30)
    print(f"Streaming batch_size={batch_size}")
    for out in tqdm.tqdm(pipe(dataset, batch_size=batch_size), total=len(dataset)):
        pass
```
@@ -194,15 +196,15 @@ Streaming batch_size=256
Example where it's mostly a slowdown:

```python
class MyDataset(Dataset):
    def __len__(self):
        return 5000

    def __getitem__(self, i):
        if i % 64 == 0:
            n = 100
        else:
            n = 1
        return "This is a test" * n
```

@@ -298,10 +300,11 @@ If you want to try simply you can:

```python
class MyPipeline(TextClassificationPipeline):
    def postprocess():
        # Your code goes here
        scores = scores * 100
        # And here


my_pipeline = MyPipeline(model=model, tokenizer=tokenizer, ...)
# or if you use *pipeline* function, then:
......
@@ -122,7 +122,7 @@ examples = processor.get_dev_examples(squad_v2_data_dir)
processor = SquadV1Processor()
examples = processor.get_dev_examples(squad_v1_data_dir)
features = squad_convert_examples_to_features(
    examples=examples,
    tokenizer=tokenizer,
    max_seq_length=max_seq_length,

@@ -139,7 +139,7 @@ Using *tensorflow_datasets* is as easy as using a data file:
tfds_examples = tfds.load("squad")
examples = SquadV1Processor().get_examples_from_dataset(tfds_examples, evaluate=evaluate)
features = squad_convert_examples_to_features(
    examples=examples,
    tokenizer=tokenizer,
    max_seq_length=max_seq_length,
......
@@ -53,14 +53,16 @@ Here is an example of how to customize [`Trainer`] using a custom loss function
from torch import nn
from transformers import Trainer


class MultilabelTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.get("labels")
        outputs = model(**inputs)
        logits = outputs.get("logits")
        loss_fct = nn.BCEWithLogitsLoss()
        loss = loss_fct(
            logits.view(-1, self.model.config.num_labels), labels.float().view(-1, self.model.config.num_labels)
        )
        return (loss, outputs) if return_outputs else loss
```
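A hedged usage sketch follows; the `model`, `training_args`, and dataset names are assumptions standing in for a standard multi-label fine-tuning setup and are not part of the diff:

```python
trainer = MultilabelTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()
```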
......
@@ -209,7 +209,7 @@ Here is a `pytorch-pretrained-bert` to 🤗 Transformers conversion example for

```python
# Let's load our model
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")

# If you used to have this line in pytorch-pretrained-bert:
loss = model(input_ids, labels=labels)

@@ -222,7 +222,7 @@ loss = outputs[0]
loss, logits = outputs[:2]

# And even the attention weights if you configure the model to output them (and other outputs too, see the docstrings and documentation)
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", output_attentions=True)
outputs = model(input_ids, labels=labels)
loss, logits, attentions = outputs
```
@@ -241,23 +241,23 @@ Here is an example:

```python
### Let's load a model and tokenizer
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

### Do some stuff to our model and tokenizer
# Ex: add new tokens to the vocabulary and embeddings of our model
tokenizer.add_tokens(["[SPECIAL_TOKEN_1]", "[SPECIAL_TOKEN_2]"])
model.resize_token_embeddings(len(tokenizer))

# Train our model
train(model)

### Now let's save our model and tokenizer to a directory
model.save_pretrained("./my_saved_model_directory/")
tokenizer.save_pretrained("./my_saved_model_directory/")

### Reload the model and the tokenizer
model = BertForSequenceClassification.from_pretrained("./my_saved_model_directory/")
tokenizer = BertTokenizer.from_pretrained("./my_saved_model_directory/")
```

### Optimizers: BertAdam & OpenAIAdam are now AdamW, schedules are standard PyTorch schedules
...@@ -283,7 +283,13 @@ num_warmup_steps = 100 ...@@ -283,7 +283,13 @@ num_warmup_steps = 100
warmup_proportion = float(num_warmup_steps) / float(num_training_steps) # 0.1 warmup_proportion = float(num_warmup_steps) / float(num_training_steps) # 0.1
### Previously BertAdam optimizer was instantiated like this: ### Previously BertAdam optimizer was instantiated like this:
optimizer = BertAdam(model.parameters(), lr=lr, schedule='warmup_linear', warmup=warmup_proportion, num_training_steps=num_training_steps) optimizer = BertAdam(
model.parameters(),
lr=lr,
schedule="warmup_linear",
warmup=warmup_proportion,
num_training_steps=num_training_steps,
)
### and used like this: ### and used like this:
for batch in train_data: for batch in train_data:
loss = model(batch) loss = model(batch)
@@ -291,13 +297,19 @@ for batch in train_data:
    optimizer.step()

### In 🤗 Transformers, optimizer and schedules are split and instantiated like this:
optimizer = AdamW(
    model.parameters(), lr=lr, correct_bias=False
)  # To reproduce BertAdam specific behavior set correct_bias=False
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=num_warmup_steps, num_training_steps=num_training_steps
)  # PyTorch scheduler
### and used like this:
for batch in train_data:
    loss = model(batch)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(
        model.parameters(), max_grad_norm
    )  # Gradient clipping is not in AdamW anymore (so you can use amp without issue)
    optimizer.step()
    scheduler.step()
```
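For reference, the names used in the snippet above come from the imports below, and `num_training_steps` is usually derived from the dataloader length and the number of epochs. This is only a sketch: `train_dataloader` and `num_epochs` are placeholders, not objects defined in the example.

```python
# Sketch: imports and step count for the AdamW + scheduler snippet above
from transformers import AdamW, get_linear_schedule_with_warmup

num_epochs = 3  # placeholder
num_training_steps = len(train_dataloader) * num_epochs  # train_dataloader is assumed to exist
```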
@@ -64,12 +64,15 @@ The `facebook/bart-base` and `facebook/bart-large` checkpoints can be used to fi
```python
from transformers import BartForConditionalGeneration, BartTokenizer

model = BartForConditionalGeneration.from_pretrained("facebook/bart-large", forced_bos_token_id=0)
tok = BartTokenizer.from_pretrained("facebook/bart-large")
example_english_phrase = "UN Chief Says There Is No <mask> in Syria"
batch = tok(example_english_phrase, return_tensors="pt")
generated_ids = model.generate(batch["input_ids"])
assert tok.batch_decode(generated_ids, skip_special_tokens=True) == [
    "UN Chief Says There Is No Plan to Stop Chemical Weapons in Syria"
]
```
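If you would rather inspect the result than assert on it, the generated ids can simply be decoded; `generate` also accepts the usual search parameters. A small sketch reusing `model`, `tok` and `batch` from above (the beam size and length are arbitrary):

```python
# Sketch: decode the generated ids instead of asserting on them
generated_ids = model.generate(batch["input_ids"], num_beams=4, max_length=20)
print(tok.batch_decode(generated_ids, skip_special_tokens=True))
```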
## BartConfig
...
@@ -44,6 +44,7 @@ Example of use:
>>> # With TensorFlow 2.0+:
>>> from transformers import TFAutoModel

>>> bartpho = TFAutoModel.from_pretrained("vinai/bartpho-syllable")
>>> input_ids = tokenizer(line, return_tensors="tf")
>>> features = bartpho(**input_ids)
@@ -58,9 +59,10 @@ Tips:
```python
>>> from transformers import MBartForConditionalGeneration

>>> bartpho = MBartForConditionalGeneration.from_pretrained("vinai/bartpho-syllable")
>>> TXT = "Chúng tôi là <mask> nghiên cứu viên."
>>> input_ids = tokenizer([TXT], return_tensors="pt")["input_ids"]
>>> logits = bartpho(input_ids).logits
>>> masked_index = (input_ids[0] == tokenizer.mask_token_id).nonzero().item()
>>> probs = logits[0, masked_index].softmax(dim=0)
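>>> # One way to inspect the prediction (a sketch, not part of the original example):
>>> # decode the top-5 candidate token ids for the masked position
>>> values, predictions = probs.topk(5)
>>> top5_tokens = tokenizer.decode(predictions).split()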
...
@@ -30,7 +30,7 @@ Example of using a model with MeCab and WordPiece tokenization:
```python
>>> import torch
>>> from transformers import AutoModel, AutoTokenizer

>>> bertjapanese = AutoModel.from_pretrained("cl-tohoku/bert-base-japanese")
>>> tokenizer = AutoTokenizer.from_pretrained("cl-tohoku/bert-base-japanese")
@@ -40,7 +40,7 @@ Example of using a model with MeCab and WordPiece tokenization:
>>> inputs = tokenizer(line, return_tensors="pt")
>>> print(tokenizer.decode(inputs["input_ids"][0]))
[CLS] 吾輩 は 猫 で ある 。 [SEP]
>>> outputs = bertjapanese(**inputs)
@@ -57,7 +57,7 @@ Example of using a model with Character tokenization:
>>> inputs = tokenizer(line, return_tensors="pt")
>>> print(tokenizer.decode(inputs["input_ids"][0]))
[CLS] 吾 輩 は 猫 で あ る 。 [SEP]
>>> outputs = bertjapanese(**inputs)
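>>> # Sketch (not part of the original example): the encoder returns one hidden vector per token
>>> last_hidden_state = outputs.last_hidden_state  # shape: (batch_size, sequence_length, hidden_size)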
...
@@ -39,14 +39,18 @@ Usage:
>>> # use BERT's cls token as BOS token and sep token as EOS token
>>> encoder = BertGenerationEncoder.from_pretrained("bert-large-uncased", bos_token_id=101, eos_token_id=102)
>>> # add cross attention layers and use BERT's cls token as BOS token and sep token as EOS token
>>> decoder = BertGenerationDecoder.from_pretrained(
...     "bert-large-uncased", add_cross_attention=True, is_decoder=True, bos_token_id=101, eos_token_id=102
... )
>>> bert2bert = EncoderDecoderModel(encoder=encoder, decoder=decoder)

>>> # create tokenizer...
>>> tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")

>>> input_ids = tokenizer(
...     "This is a long article to summarize", add_special_tokens=False, return_tensors="pt"
... ).input_ids
>>> labels = tokenizer("This is a short summary", return_tensors="pt").input_ids

>>> # train...
>>> loss = bert2bert(input_ids=input_ids, decoder_input_ids=labels, labels=labels).loss
@@ -61,7 +65,9 @@ Usage:
>>> sentence_fuser = EncoderDecoderModel.from_pretrained("google/roberta2roberta_L-24_discofuse")
>>> tokenizer = AutoTokenizer.from_pretrained("google/roberta2roberta_L-24_discofuse")

>>> input_ids = tokenizer(
...     "This is the first sentence. This is the second sentence.", add_special_tokens=False, return_tensors="pt"
... ).input_ids
>>> outputs = sentence_fuser.generate(input_ids)
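>>> # Sketch (not in the original snippet): decode the fused sentence back to text
>>> fused_sentence = tokenizer.decode(outputs[0], skip_special_tokens=True)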
...
@@ -28,14 +28,14 @@ Example of use:
```python
>>> import torch
>>> from transformers import AutoModel, AutoTokenizer

>>> bertweet = AutoModel.from_pretrained("vinai/bertweet-base")

>>> # For transformers v4.x+:
>>> tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base", use_fast=False)

>>> # For transformers v3.x:
>>> # tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base")

>>> # INPUT TWEET IS ALREADY NORMALIZED!
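>>> # A possible continuation (sketch, not part of the original example): encode an
>>> # already-normalized tweet and extract its features; the tweet string is illustrative
>>> line = "SC has first two presumptive cases of coronavirus , DHEC confirms HTTPURL via @USER :cry:"
>>> input_ids = torch.tensor([tokenizer.encode(line)])
>>> with torch.no_grad():
...     features = bertweet(input_ids)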