Unverified Commit 77321481 authored by Steven Liu, committed by GitHub

Adopt framework-specific blocks for content (#16342)

* refactor code samples with framework-specific blocks

* update training.mdx

* 🖍 apply feedback
parent 62cbd842
@@ -22,7 +22,7 @@
- local: model_summary
  title: Summary of the models
- local: training
  title: Fine-tune a pretrained model
- local: accelerate
  title: Distributed training with 🤗 Accelerate
- local: model_sharing
@@ -75,25 +75,29 @@ To ensure your model can be used by someone working with a different framework,
Converting a checkpoint for another framework is easy. Make sure you have PyTorch and TensorFlow installed (see [here](installation) for installation instructions), and then find the specific model for your task in the other framework. For example, suppose you trained DistilBERT for sequence classification in PyTorch and want to convert it to its TensorFlow equivalent.

<frameworkcontent>
<pt>
Specify `from_tf=True` to convert a checkpoint from TensorFlow to PyTorch:
```py
>>> pt_model = DistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_tf=True)
>>> pt_model.save_pretrained("path/to/awesome-name-you-picked")
```
</pt>
<tf>
Specify `from_pt=True` to convert a checkpoint from PyTorch to TensorFlow:

```py
>>> tf_model = TFDistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_pt=True)
```

Then you can save your new TensorFlow model with its new checkpoint:

```py
>>> tf_model.save_pretrained("path/to/awesome-name-you-picked")
```
</tf>
<jax>
If a model is available in Flax, you can also convert a checkpoint from PyTorch to Flax:

```py
@@ -101,9 +105,13 @@ If a model is available in Flax, you can also convert a checkpoint from PyTorch
...     "path/to/awesome-name-you-picked", from_pt=True
... )
```
</jax>
</frameworkcontent>
## Push a model during training
<frameworkcontent>
<pt>
<Youtube id="Z1-XMy-GNLQ"/> <Youtube id="Z1-XMy-GNLQ"/>
Sharing a model to the Hub is as simple as adding an extra parameter or callback. Remember from the [fine-tuning tutorial](training), the [`TrainingArguments`] class is where you specify hyperparameters and additional training options. One of these training options includes the ability to push a model directly to the Hub. Set `push_to_hub=True` in your [`TrainingArguments`]: Sharing a model to the Hub is as simple as adding an extra parameter or callback. Remember from the [fine-tuning tutorial](training), the [`TrainingArguments`] class is where you specify hyperparameters and additional training options. One of these training options includes the ability to push a model directly to the Hub. Set `push_to_hub=True` in your [`TrainingArguments`]:
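The [`TrainingArguments`] snippet itself is collapsed in this hunk; as a rough sketch of what enabling the option looks like (the output directory name here is only a placeholder):

```py
>>> from transformers import TrainingArguments

>>> training_args = TrainingArguments(output_dir="my-awesome-model", push_to_hub=True)
```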
@@ -129,10 +137,9 @@ After you fine-tune your model, call [`~transformers.Trainer.push_to_hub`] on [`

```py
>>> trainer.push_to_hub()
```
</pt>
<tf>

Share a model to the Hub with [`PushToHubCallback`]. In the [`PushToHubCallback`] function, add:
- An output directory for your model.
- A tokenizer.
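As a hedged sketch of wiring up the callback (the directory, `tokenizer` variable, and repository id below are placeholders, not values from the collapsed diff):

```py
>>> from transformers.keras_callbacks import PushToHubCallback

>>> push_to_hub_callback = PushToHubCallback(
...     output_dir="./your_model_save_path", tokenizer=tokenizer, hub_model_id="your-username/my-awesome-model"
... )
```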
@@ -151,6 +158,8 @@ Add the callback to [`fit`](https://keras.io/api/models/model_training_apis/), a

```py
>>> model.fit(tf_train_dataset, validation_data=tf_validation_dataset, epochs=3, callbacks=push_to_hub_callback)
```
</tf>
</frameworkcontent>
## Use the `push_to_hub` function
@@ -155,8 +155,10 @@ Create a batch of examples and dynamically pad them with `DataCollatorForCTCWith

```py
>>> data_collator = DataCollatorCTCWithPadding(processor=processor, padding=True)
```

## Train

<frameworkcontent>
<pt>
Load Wav2Vec2 with [`AutoModelForCTC`]. For `ctc_loss_reduction`, it is often better to use the average instead of the default summation:

```py
@@ -206,6 +208,8 @@ At this point, only three steps remain:
>>> trainer.train()
```
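The loading call sits in the collapsed lines of this hunk; a rough sketch of what the paragraph describes, assuming the checkpoint name and the `processor` object from earlier steps of the guide:

```py
>>> from transformers import AutoModelForCTC

>>> model = AutoModelForCTC.from_pretrained(
...     "facebook/wav2vec2-base",
...     ctc_loss_reduction="mean",  # average the CTC loss instead of summing it
...     pad_token_id=processor.tokenizer.pad_token_id,
... )
```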
</pt>
</frameworkcontent>
<Tip>
@@ -91,8 +91,10 @@ Use 🤗 Datasets [`map`](https://huggingface.co/docs/datasets/package_reference

```py
>>> encoded_ks = ks.map(preprocess_function, remove_columns=["audio", "file"], batched=True)
```

## Train

<frameworkcontent>
<pt>
Load Wav2Vec2 with [`AutoModelForAudioClassification`]. Specify the number of labels, and pass the model the mapping between label number and label class:

```py
@@ -135,6 +137,8 @@ At this point, only three steps remain:
>>> trainer.train()
```
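As a rough sketch of the loading step (the checkpoint name and the `num_labels`/`label2id`/`id2label` objects are assumed to come from the preprocessing steps, not from the collapsed diff):

```py
>>> from transformers import AutoModelForAudioClassification

>>> model = AutoModelForAudioClassification.from_pretrained(
...     "facebook/wav2vec2-base", num_labels=num_labels, label2id=label2id, id2label=id2label
... )
```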
</pt>
</frameworkcontent>
<Tip>
@@ -109,8 +109,10 @@ Use [`DefaultDataCollator`] to create a batch of examples. Unlike other data col

```py
>>> data_collator = DefaultDataCollator()
```

## Train

<frameworkcontent>
<pt>
Load ViT with [`AutoModelForImageClassification`]. Specify the number of labels, and pass the model the mapping between label number and label class:

```py
@@ -162,6 +164,8 @@ At this point, only three steps remain:
>>> trainer.train()
```
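A minimal sketch of the loading step, assuming a ViT checkpoint and label mappings prepared earlier in the guide:

```py
>>> from transformers import AutoModelForImageClassification

>>> model = AutoModelForImageClassification.from_pretrained(
...     "google/vit-base-patch16-224-in21k", num_labels=len(labels), id2label=id2label, label2id=label2id
... )
```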
</pt>
</frameworkcontent>
<Tip>
@@ -200,8 +200,10 @@ For masked language modeling, use the same [`DataCollatorForLanguageModeling`] e

Causal language modeling is frequently used for text generation. This section shows you how to fine-tune [DistilGPT2](https://huggingface.co/distilgpt2) to generate new text.

### Train

<frameworkcontent>
<pt>
Load DistilGPT2 with [`AutoModelForCausalLM`]:

```py
@@ -240,18 +242,9 @@ At this point, only three steps remain:
>>> trainer.train()
```
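For orientation, the loading step boils down to something like:

```py
>>> from transformers import AutoModelForCausalLM

>>> model = AutoModelForCausalLM.from_pretrained("distilgpt2")
```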
</pt>
<tf>
To fine-tune a model in TensorFlow, start by converting your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Dataset.to_tf_dataset). Specify inputs and labels in `columns`, whether to shuffle the dataset order, batch size, and the data collator:
```py
>>> tf_train_set = lm_dataset["train"].to_tf_dataset(
@@ -271,6 +264,12 @@ Convert your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](htt
... )
```
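The full call is collapsed in the hunk above; a hedged sketch of the pattern (the column names, batch size, and `data_collator` variable are illustrative, not taken from the hidden lines):

```py
>>> tf_train_set = lm_dataset["train"].to_tf_dataset(
...     columns=["attention_mask", "input_ids", "labels"],
...     shuffle=True,
...     batch_size=16,
...     collate_fn=data_collator,
... )
```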
<Tip>
If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](training#finetune-with-keras)!
</Tip>
Set up an optimizer function, learning rate, and some training hyperparameters:
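The optimizer setup itself is collapsed; a rough sketch of the usual pattern (the checkpoint name and `total_train_steps` value are placeholders):

```py
>>> from transformers import TFAutoModelForCausalLM, create_optimizer

>>> optimizer, schedule = create_optimizer(init_lr=2e-5, num_warmup_steps=0, num_train_steps=total_train_steps)
>>> model = TFAutoModelForCausalLM.from_pretrained("distilgpt2")
>>> model.compile(optimizer=optimizer)  # Transformers TF models can compute their own loss internally
```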
@@ -300,13 +299,17 @@ Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) to fin

```py
>>> model.fit(x=tf_train_set, validation_data=tf_test_set, epochs=3)
```
</tf>
</frameworkcontent>
## Masked language modeling

Masked language modeling is also known as a fill-mask task because it predicts a masked token in a sequence. Models for masked language modeling require a good contextual understanding of an entire sequence instead of only the left context. This section shows you how to fine-tune [DistilRoBERTa](https://huggingface.co/distilroberta-base) to predict a masked word.

### Train

<frameworkcontent>
<pt>
Load DistilRoBERTa with [`AutoModelForMaskedLM`]:

```py
@@ -346,18 +349,9 @@ At this point, only three steps remain:
>>> trainer.train()
```
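The collapsed snippet amounts to something like:

```py
>>> from transformers import AutoModelForMaskedLM

>>> model = AutoModelForMaskedLM.from_pretrained("distilroberta-base")
```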
</pt>
<tf>
To fine-tune a model in TensorFlow, start by converting your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Dataset.to_tf_dataset). Specify inputs and labels in `columns`, whether to shuffle the dataset order, batch size, and the data collator:
```py
>>> tf_train_set = lm_dataset["train"].to_tf_dataset(
@@ -377,6 +371,12 @@ Convert your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](htt
... )
```
<Tip>
If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](training#finetune-with-keras)!
</Tip>
Set up an optimizer function, learning rate, and some training hyperparameters:

@@ -406,6 +406,8 @@ Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) to fin

```py
>>> model.fit(x=tf_train_set, validation_data=tf_test_set, epochs=3)
```
</tf>
</frameworkcontent>
<Tip>
@@ -176,8 +176,10 @@ tokenized_swag = swag.map(preprocess_function, batched=True)
</tf>
</frameworkcontent>

## Train

<frameworkcontent>
<pt>
Load BERT with [`AutoModelForMultipleChoice`]:

```py
@@ -220,18 +222,9 @@ At this point, only three steps remain:
>>> trainer.train()
```
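As a sketch of the loading step (the BERT checkpoint name is illustrative):

```py
>>> from transformers import AutoModelForMultipleChoice

>>> model = AutoModelForMultipleChoice.from_pretrained("bert-base-uncased")
```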
</pt>
<tf>
To fine-tune a model in TensorFlow, start by converting your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Dataset.to_tf_dataset). Specify inputs in `columns`, targets in `label_cols`, whether to shuffle the dataset order, batch size, and the data collator:
```py
>>> data_collator = DataCollatorForMultipleChoice(tokenizer=tokenizer)
@@ -252,6 +245,12 @@ Convert your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](htt
... )
```
<Tip>
If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](training#finetune-with-keras)!
</Tip>
Set up an optimizer function, learning rate schedule, and some training hyperparameters:

@@ -284,4 +283,6 @@ Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) to fin

```py
>>> model.fit(x=tf_train_set, validation_data=tf_validation_set, epochs=2)
```
</tf>
</frameworkcontent>
@@ -151,8 +151,10 @@ Use [`DefaultDataCollator`] to create a batch of examples. Unlike other data col
</tf>
</frameworkcontent>

## Train

<frameworkcontent>
<pt>
Load DistilBERT with [`AutoModelForQuestionAnswering`]:

```py
@@ -195,18 +197,9 @@ At this point, only three steps remain:
>>> trainer.train()
```
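As a sketch of the loading step (the DistilBERT checkpoint name is illustrative):

```py
>>> from transformers import AutoModelForQuestionAnswering

>>> model = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-uncased")
```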
</pt>
<tf>
To fine-tune a model in TensorFlow, start by converting your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Dataset.to_tf_dataset). Specify inputs and the start and end positions of an answer in `columns`, whether to shuffle the dataset order, batch size, and the data collator:
```py
>>> tf_train_set = tokenized_squad["train"].to_tf_dataset(
@@ -226,6 +219,12 @@ Convert your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](htt
... )
```
<Tip>
If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](training#finetune-with-keras)!
</Tip>
Set up an optimizer function, learning rate schedule, and some training hyperparameters:

@@ -262,6 +261,8 @@ Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) to fin

```py
>>> model.fit(x=tf_train_set, validation_data=tf_validation_set, epochs=3)
```
</tf>
</frameworkcontent>
<Tip>
@@ -91,8 +91,10 @@ Use [`DataCollatorWithPadding`] to create a batch of examples. It will also *dyn
</tf>
</frameworkcontent>

## Train

<frameworkcontent>
<pt>
Load DistilBERT with [`AutoModelForSequenceClassification`] along with the number of expected labels:

@@ -140,18 +142,9 @@ At this point, only three steps remain:

<Tip>

[`Trainer`] will apply dynamic padding by default when you pass `tokenizer` to it. In this case, you don't need to specify a data collator explicitly.

</Tip>
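The loading snippet is collapsed above; a hedged sketch of what the paragraph describes, assuming the two-label sentiment setup used in this guide:

```py
>>> from transformers import AutoModelForSequenceClassification

>>> model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)
```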
</pt>
<tf>
To fine-tune a model in TensorFlow, start by converting your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Dataset.to_tf_dataset). Specify inputs and labels in `columns`, whether to shuffle the dataset order, batch size, and the data collator:
```py
>>> tf_train_set = tokenized_imdb["train"].to_tf_dataset(
@@ -169,6 +162,12 @@ Convert your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](htt
... )
```
<Tip>
If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](training#finetune-with-keras)!
</Tip>
Set up an optimizer function, learning rate schedule, and some training hyperparameters:

@@ -203,6 +202,8 @@ Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) to fin

```py
>>> model.fit(x=tf_train_set, validation_data=tf_validation_set, epochs=3)
```
</tf>
</frameworkcontent>
<Tip>
@@ -110,8 +110,10 @@ Use [`DataCollatorForSeq2Seq`] to create a batch of examples. It will also *dyna
</tf>
</frameworkcontent>

## Train

<frameworkcontent>
<pt>
Load T5 with [`AutoModelForSeq2SeqLM`]:

```py
@@ -156,18 +158,9 @@ At this point, only three steps remain:
>>> trainer.train()
```
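As a sketch of the loading step (the T5 checkpoint name is illustrative):

```py
>>> from transformers import AutoModelForSeq2SeqLM

>>> model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
```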
</pt>
<tf>
To fine-tune a model in TensorFlow, start by converting your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Dataset.to_tf_dataset). Specify inputs and labels in `columns`, whether to shuffle the dataset order, batch size, and the data collator:
```py
>>> tf_train_set = tokenized_billsum["train"].to_tf_dataset(
@@ -185,6 +178,12 @@ Convert your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](htt
... )
```
<Tip>
If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](training#finetune-with-keras)!
</Tip>
Set up an optimizer function, learning rate schedule, and some training hyperparameters:

@@ -212,6 +211,8 @@ Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) to fin

```py
>>> model.fit(x=tf_train_set, validation_data=tf_test_set, epochs=3)
```
</tf>
</frameworkcontent>
<Tip>
@@ -151,8 +151,10 @@ Use [`DataCollatorForTokenClassification`] to create a batch of examples. It wil
</tf>
</frameworkcontent>

## Train

<frameworkcontent>
<pt>
Load DistilBERT with [`AutoModelForTokenClassification`] along with the number of expected labels:

```py
@@ -195,18 +197,9 @@ At this point, only three steps remain:
>>> trainer.train()
```
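A rough sketch of the loading step, where `label_list` stands for the label set built during preprocessing and the checkpoint name is illustrative:

```py
>>> from transformers import AutoModelForTokenClassification

>>> model = AutoModelForTokenClassification.from_pretrained("distilbert-base-uncased", num_labels=len(label_list))
```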
</pt>
<tf>
To fine-tune a model in TensorFlow, start by converting your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Dataset.to_tf_dataset). Specify inputs and labels in `columns`, whether to shuffle the dataset order, batch size, and the data collator:
```py
>>> tf_train_set = tokenized_wnut["train"].to_tf_dataset(
@@ -224,6 +217,12 @@ Convert your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](htt
... )
```
<Tip>
If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](training#finetune-with-keras)!
</Tip>
Set up an optimizer function, learning rate schedule, and some training hyperparameters:

@@ -261,6 +260,8 @@ Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) to fin

```py
>>> model.fit(x=tf_train_set, validation_data=tf_validation_set, epochs=3)
```
</tf>
</frameworkcontent>
<Tip>
@@ -112,8 +112,10 @@ Use [`DataCollatorForSeq2Seq`] to create a batch of examples. It will also *dyna
</tf>
</frameworkcontent>

## Train

<frameworkcontent>
<pt>
Load T5 with [`AutoModelForSeq2SeqLM`]:

```py
@@ -158,18 +160,9 @@ At this point, only three steps remain:
>>> trainer.train()
```
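As in the summarization section above, the collapsed loading step reduces to something like (checkpoint name illustrative):

```py
>>> from transformers import AutoModelForSeq2SeqLM

>>> model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
```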
</pt>
<tf>
To fine-tune a model in TensorFlow, start by converting your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasets.Dataset.to_tf_dataset). Specify inputs and labels in `columns`, whether to shuffle the dataset order, batch size, and the data collator:
```py
>>> tf_train_set = tokenized_books["train"].to_tf_dataset(
@@ -187,6 +180,12 @@ Convert your datasets to the `tf.data.Dataset` format with [`to_tf_dataset`](htt
... )
```
<Tip>
If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](training#finetune-with-keras)!
</Tip>
Set up an optimizer function, learning rate schedule, and some training hyperparameters:

@@ -214,6 +213,8 @@ Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) to fin

```py
>>> model.fit(x=tf_train_set, validation_data=tf_test_set, epochs=3)
```
</tf>
</frameworkcontent>
<Tip>
@@ -63,8 +63,10 @@ If you like, you can create a smaller subset of the full dataset to fine-tune on

<a id='trainer'></a>

## Train

<frameworkcontent>
<pt>
<Youtube id="nvBXf7s7vTI"/>

🤗 Transformers provides a [`Trainer`] class optimized for training 🤗 Transformers models, making it easier to start training without manually writing your own training loop. The [`Trainer`] API supports a wide range of training options and features such as logging, gradient accumulation, and mixed precision.

@@ -143,14 +145,13 @@ Then fine-tune your model by calling [`~transformers.Trainer.train`]:

```py
>>> trainer.train()
```
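The [`TrainingArguments`]/[`Trainer`] setup lives in the collapsed lines; a hedged sketch of the overall pattern (the checkpoint, label count, output directory, and dataset variables are assumptions based on this tutorial's context), followed by the `trainer.train()` call shown above:

```py
>>> from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

>>> model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)
>>> training_args = TrainingArguments(output_dir="test_trainer")
>>> trainer = Trainer(
...     model=model,
...     args=training_args,
...     train_dataset=small_train_dataset,
...     eval_dataset=small_eval_dataset,
... )
```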
</pt>
<tf>
<a id='keras'></a>

<Youtube id="rnTGBy2ax1c"/>

🤗 Transformers models also support training in TensorFlow with the Keras API.
### Convert dataset to TensorFlow format
@@ -210,11 +211,15 @@ Then compile and fine-tune your model with [`fit`](https://keras.io/api/models/m

```py
>>> model.fit(tf_train_dataset, validation_data=tf_validation_dataset, epochs=3)
```
</tf>
</frameworkcontent>
<a id='pytorch_native'></a>

## Train in native PyTorch

<frameworkcontent>
<pt>
<Youtube id="Dh9CL8fyG80"/>

[`Trainer`] takes care of the training loop and allows you to fine-tune a model in a single line of code. For users who prefer to write their own training loop, you can also fine-tune a 🤗 Transformers model in native PyTorch.

@@ -354,6 +359,8 @@ Just like how you need to add an evaluation function to [`Trainer`], you need to

```py
>>> metric.compute()
```
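The loop itself is collapsed in this hunk; a rough sketch of the manual training pattern the paragraph refers to (the `model`, `train_dataloader`, and `device` objects are assumed to be set up as earlier in the tutorial):

```py
>>> from torch.optim import AdamW

>>> optimizer = AdamW(model.parameters(), lr=5e-5)

>>> model.train()
>>> for epoch in range(3):
...     for batch in train_dataloader:
...         batch = {k: v.to(device) for k, v in batch.items()}
...         outputs = model(**batch)
...         loss = outputs.loss
...         loss.backward()
...         optimizer.step()
...         optimizer.zero_grad()
```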
</pt>
</frameworkcontent>
<a id='additional-resources'></a>