"...git@developer.sourcefind.cn:chenpangpang/transformers.git" did not exist on "f2f329408db66285fd59e6628ca394381bb7f94e"
Unverified Commit d160782a authored by Jonathan Chang, committed by GitHub

Add template for adding flax models (#12441)



* Add option to add flax

* Add flax template for __init__.py

* Add flax template for .rst

* Copy TF modeling template

* Add a missing line in modeling_tf_... template

* Update first half of modeling_flax_..

* Update encoder flax template

* Copy test_modeling_tf... as test_modeling_flax...

* Replace some TF to Flax in test_modeling_flax_...

* Replace tf to np

some functions might not work, like _assert_tensors_equal

* Replace remaining tf to np (might not work)

* Fix cookiecutter

* Add Flax in to_replace_... template

* Update transformers-cli add-new-model

* Save generate_flax in configuration.json

This will be read by transformers-cli

* Fix to_replace_... and cli

* Fix replace cli

* Fix cookiecutter name

* Move docstring earlier to avoid not defined error

* Fix a missing Module

* Add encoder-decoder flax template from bart

* Fix flax test

* Make style

* Fix endif

* Fix replace all "utf-8 -> unp-8"

* Update comment

* Fix flax template (add missing ..._DOCSTRING)

* Use flax_bart imports in template (was t5)

* Fix unp

* Update templates/adding_a_new_model/tests

* Revert "Fix unp"

This reverts commit dc9002a41d902c4f9b07343eab1cb350c8b7fd57.

* Remove one line of copied from to suppress CI error

* Use generate_tensorflow_pytorch_and_flax

* Add a missing part

* fix typo

* fix flax config

* add examples for flax

* small rename

* correct modeling imports

* correct auto loading

* corrects some flax tests

* correct small typo

* correct as type

* finish modif

* correct more templates

* final fixes

* add file testers

* up

* make sure tests match template regex

* correct pytorch

* correct tf

* correct more tf

* correct imports

* minor error

* minor error

* correct init

* more fixes

* correct more flax tests

* correct flax test

* more fixes

* correct docs

* update

* fix
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
parent 8e208878
@@ -47,6 +47,8 @@ jobs:
transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/tf-encoder-bert-tokenizer.json --path=templates/adding_a_new_model
transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/tf-seq-2-seq-bart-tokenizer.json --path=templates/adding_a_new_model
transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/pt-seq-2-seq-bart-tokenizer.json --path=templates/adding_a_new_model
transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/flax-encoder-bert-tokenizer.json --path=templates/adding_a_new_model
transformers-cli add-new-model --testing --testing_file=templates/adding_a_new_model/tests/flax-seq-2-seq-bart-tokenizer.json --path=templates/adding_a_new_model
make style
python utils/check_table.py --fix_and_overwrite
python utils/check_dummies.py --fix_and_overwrite
...
@@ -1702,6 +1702,8 @@ if is_flax_available():
"FlaxAutoModelForTokenClassification",
]
)
# Flax models structure
_import_structure["models.bart"].extend( _import_structure["models.bart"].extend(
[ [
"FlaxBartForConditionalGeneration", "FlaxBartForConditionalGeneration",
......
@@ -93,11 +93,12 @@ class AddNewModelCommand(BaseTransformersCLICommand):
configuration = json.load(configuration_file)
lowercase_model_name = configuration["lowercase_modelname"]
- pytorch_or_tensorflow = configuration["generate_tensorflow_and_pytorch"]
+ generate_tensorflow_pytorch_and_flax = configuration["generate_tensorflow_pytorch_and_flax"]
os.remove(f"{directory}/configuration.json")
- output_pytorch = "PyTorch" in pytorch_or_tensorflow
+ output_pytorch = "PyTorch" in generate_tensorflow_pytorch_and_flax
- output_tensorflow = "TensorFlow" in pytorch_or_tensorflow
+ output_tensorflow = "TensorFlow" in generate_tensorflow_pytorch_and_flax
output_flax = "Flax" in generate_tensorflow_pytorch_and_flax
model_dir = f"{path_to_transformer_root}/src/transformers/models/{lowercase_model_name}" model_dir = f"{path_to_transformer_root}/src/transformers/models/{lowercase_model_name}"
os.makedirs(model_dir, exist_ok=True) os.makedirs(model_dir, exist_ok=True)
...@@ -153,6 +154,23 @@ class AddNewModelCommand(BaseTransformersCLICommand): ...@@ -153,6 +154,23 @@ class AddNewModelCommand(BaseTransformersCLICommand):
os.remove(f"{directory}/modeling_tf_{lowercase_model_name}.py") os.remove(f"{directory}/modeling_tf_{lowercase_model_name}.py")
os.remove(f"{directory}/test_modeling_tf_{lowercase_model_name}.py") os.remove(f"{directory}/test_modeling_tf_{lowercase_model_name}.py")
if output_flax:
if not self._testing:
remove_copy_lines(f"{directory}/modeling_flax_{lowercase_model_name}.py")
shutil.move(
f"{directory}/modeling_flax_{lowercase_model_name}.py",
f"{model_dir}/modeling_flax_{lowercase_model_name}.py",
)
shutil.move(
f"{directory}/test_modeling_flax_{lowercase_model_name}.py",
f"{path_to_transformer_root}/tests/test_modeling_flax_{lowercase_model_name}.py",
)
else:
os.remove(f"{directory}/modeling_flax_{lowercase_model_name}.py")
os.remove(f"{directory}/test_modeling_flax_{lowercase_model_name}.py")
shutil.move(
f"{directory}/{lowercase_model_name}.rst",
f"{path_to_transformer_root}/docs/source/model_doc/{lowercase_model_name}.rst",
@@ -196,8 +214,10 @@ class AddNewModelCommand(BaseTransformersCLICommand):
move(abs_path, original_file)
def skip_units(line):
- return ("generating PyTorch" in line and not output_pytorch) or (
- "generating TensorFlow" in line and not output_tensorflow
+ return (
+ ("generating PyTorch" in line and not output_pytorch)
or ("generating TensorFlow" in line and not output_tensorflow)
or ("generating Flax" in line and not output_flax)
)
def replace_in_files(path_to_datafile):
...
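For context, the skip_units helper above feeds replace_in_files, which consumes the "# Below:" / "# Replace with:" / "# End." directives from the to_replace_*.py templates shown further down and splices each block into the target file, dropping blocks whose framework was not selected. Below is a minimal sketch of that directive format being applied; the helper name apply_directives and the exact splicing rules are assumptions for illustration, not the actual code in commands/add_new_model.py.

import re


def apply_directives(template_text, target_text, skip_units):
    # Illustrative only: parse "# Below:" / "# Replace with:" / "# End." units from a
    # to_replace template and splice each block into the target right after the quoted
    # anchor line, skipping units whose framework was deselected via skip_units(line).
    lines = iter(template_text.splitlines())
    units = []  # (anchor string, block of lines to insert)
    for line in lines:
        if not line.startswith("# Below:"):
            continue
        anchor = re.search(r'"(.+)"', line)
        skip = skip_units(line)  # e.g. drop 'if generating Flax' units when Flax is off
        block = []
        for body in lines:  # consume the unit until "# End."
            if body.startswith("# End."):
                break
            if body.startswith("# Replace with:"):
                continue
            block.append(body)
        if anchor and not skip:
            units.append((anchor.group(1), block))

    out = []
    for target_line in target_text.splitlines():
        out.append(target_line)
        for anchor, block in units:
            if anchor in target_line:
                out.extend(block)
    return "\n".join(out)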
@@ -17,13 +17,18 @@
# limitations under the License.
from typing import TYPE_CHECKING
- {%- if cookiecutter.generate_tensorflow_and_pytorch == "PyTorch & TensorFlow" %}
- from ...file_utils import _LazyModule, is_tf_available, is_torch_available, is_tokenizers_available
- {%- elif cookiecutter.generate_tensorflow_and_pytorch == "PyTorch" %}
- from ...file_utils import _LazyModule, is_torch_available, is_tokenizers_available
- {%- elif cookiecutter.generate_tensorflow_and_pytorch == "TensorFlow" %}
- from ...file_utils import _LazyModule, is_tf_available, is_tokenizers_available
+ # rely on isort to merge the imports
+ from ...file_utils import _LazyModule, is_tokenizers_available
+ {%- if "TensorFlow" in cookiecutter.generate_tensorflow_pytorch_and_flax %}
+ from ...file_utils import is_tf_available
{% endif %}
{%- if "PyTorch" in cookiecutter.generate_tensorflow_pytorch_and_flax %}
from ...file_utils import is_torch_available
{% endif %}
{%- if "Flax" in cookiecutter.generate_tensorflow_pytorch_and_flax %}
from ...file_utils import is_flax_available
{% endif %}
_import_structure = {
"configuration_{{cookiecutter.lowercase_modelname}}": ["{{cookiecutter.uppercase_modelname}}_PRETRAINED_CONFIG_ARCHIVE_MAP", "{{cookiecutter.camelcase_modelname}}Config"],
"tokenization_{{cookiecutter.lowercase_modelname}}": ["{{cookiecutter.camelcase_modelname}}Tokenizer"],
@@ -32,7 +37,7 @@ _import_structure = {
if is_tokenizers_available():
_import_structure["tokenization_{{cookiecutter.lowercase_modelname}}_fast"] = ["{{cookiecutter.camelcase_modelname}}TokenizerFast"]
- {%- if (cookiecutter.generate_tensorflow_and_pytorch == "PyTorch & TensorFlow" or cookiecutter.generate_tensorflow_and_pytorch == "PyTorch") %}
+ {%- if "PyTorch" in cookiecutter.generate_tensorflow_pytorch_and_flax %}
{% if cookiecutter.is_encoder_decoder_model == "False" %}
if is_torch_available():
_import_structure["modeling_{{cookiecutter.lowercase_modelname}}"] = [
@@ -61,7 +66,9 @@ if is_torch_available():
]
{% endif %}
{% endif %}
- {%- if (cookiecutter.generate_tensorflow_and_pytorch == "PyTorch & TensorFlow" or cookiecutter.generate_tensorflow_and_pytorch == "TensorFlow") %}
+ {%- if "TensorFlow" in cookiecutter.generate_tensorflow_pytorch_and_flax %}
{% if cookiecutter.is_encoder_decoder_model == "False" %}
if is_tf_available():
_import_structure["modeling_tf_{{cookiecutter.lowercase_modelname}}"] = [
@@ -87,6 +94,33 @@ if is_tf_available():
{% endif %}
{%- if "Flax" in cookiecutter.generate_tensorflow_pytorch_and_flax %}
{% if cookiecutter.is_encoder_decoder_model == "False" %}
if is_flax_available():
_import_structure["modeling_flax_{{cookiecutter.lowercase_modelname}}"] = [
"Flax{{cookiecutter.camelcase_modelname}}ForMaskedLM",
"Flax{{cookiecutter.camelcase_modelname}}ForCausalLM",
"Flax{{cookiecutter.camelcase_modelname}}ForMultipleChoice",
"Flax{{cookiecutter.camelcase_modelname}}ForQuestionAnswering",
"Flax{{cookiecutter.camelcase_modelname}}ForSequenceClassification",
"Flax{{cookiecutter.camelcase_modelname}}ForTokenClassification",
"Flax{{cookiecutter.camelcase_modelname}}Layer",
"Flax{{cookiecutter.camelcase_modelname}}Model",
"Flax{{cookiecutter.camelcase_modelname}}PreTrainedModel",
]
{% else %}
if is_flax_available():
_import_structure["modeling_flax_{{cookiecutter.lowercase_modelname}}"] = [
"Flax{{cookiecutter.camelcase_modelname}}ForConditionalGeneration",
"Flax{{cookiecutter.camelcase_modelname}}ForQuestionAnswering",
"Flax{{cookiecutter.camelcase_modelname}}ForSequenceClassification",
"Flax{{cookiecutter.camelcase_modelname}}Model",
"Flax{{cookiecutter.camelcase_modelname}}PreTrainedModel",
]
{% endif %}
{% endif %}
if TYPE_CHECKING:
from .configuration_{{cookiecutter.lowercase_modelname}} import {{cookiecutter.uppercase_modelname}}_PRETRAINED_CONFIG_ARCHIVE_MAP, {{cookiecutter.camelcase_modelname}}Config
from .tokenization_{{cookiecutter.lowercase_modelname}} import {{cookiecutter.camelcase_modelname}}Tokenizer
@@ -94,7 +128,7 @@ if TYPE_CHECKING:
if is_tokenizers_available():
from .tokenization_{{cookiecutter.lowercase_modelname}}_fast import {{cookiecutter.camelcase_modelname}}TokenizerFast
- {%- if (cookiecutter.generate_tensorflow_and_pytorch == "PyTorch & TensorFlow" or cookiecutter.generate_tensorflow_and_pytorch == "PyTorch") %}
+ {%- if "PyTorch" in cookiecutter.generate_tensorflow_pytorch_and_flax %}
{% if cookiecutter.is_encoder_decoder_model == "False" %}
if is_torch_available():
from .modeling_{{cookiecutter.lowercase_modelname}} import (
@@ -123,7 +157,7 @@ if TYPE_CHECKING:
)
{% endif %}
{% endif %}
- {%- if (cookiecutter.generate_tensorflow_and_pytorch == "PyTorch & TensorFlow" or cookiecutter.generate_tensorflow_and_pytorch == "TensorFlow") %}
+ {%- if "TensorFlow" in cookiecutter.generate_tensorflow_pytorch_and_flax %}
{% if cookiecutter.is_encoder_decoder_model == "False" %}
if is_tf_available():
from .modeling_tf_{{cookiecutter.lowercase_modelname}} import (
@@ -147,6 +181,32 @@ if TYPE_CHECKING:
)
{% endif %}
{% endif %}
{%- if "Flax" in cookiecutter.generate_tensorflow_pytorch_and_flax %}
{% if cookiecutter.is_encoder_decoder_model == "False" %}
if is_flax_available():
from .modeling_{{cookiecutter.lowercase_modelname}} import (
Flax{{cookiecutter.camelcase_modelname}}ForMaskedLM,
Flax{{cookiecutter.camelcase_modelname}}ForCausalLM,
Flax{{cookiecutter.camelcase_modelname}}ForMultipleChoice,
Flax{{cookiecutter.camelcase_modelname}}ForQuestionAnswering,
Flax{{cookiecutter.camelcase_modelname}}ForSequenceClassification,
Flax{{cookiecutter.camelcase_modelname}}ForTokenClassification,
Flax{{cookiecutter.camelcase_modelname}}Layer,
Flax{{cookiecutter.camelcase_modelname}}Model,
Flax{{cookiecutter.camelcase_modelname}}PreTrainedModel,
)
{% else %}
if is_flax_available():
from .modeling_{{cookiecutter.lowercase_modelname}} import (
Flax{{cookiecutter.camelcase_modelname}}ForConditionalGeneration,
Flax{{cookiecutter.camelcase_modelname}}ForQuestionAnswering,
Flax{{cookiecutter.camelcase_modelname}}ForSequenceClassification,
Flax{{cookiecutter.camelcase_modelname}}Model,
Flax{{cookiecutter.camelcase_modelname}}PreTrainedModel,
)
{% endif %}
{% endif %}
else:
import sys
...
@@ -6,6 +6,6 @@
"authors": "{{cookiecutter.authors}}",
"checkpoint_identifier": "{{cookiecutter.checkpoint_identifier}}",
"tokenizer_type": "{{cookiecutter.tokenizer_type}}",
- "generate_tensorflow_and_pytorch": "{{cookiecutter.generate_tensorflow_and_pytorch}}",
+ "generate_tensorflow_pytorch_and_flax": "{{cookiecutter.generate_tensorflow_pytorch_and_flax}}",
- "is_encoder_decoder_model": ["True", "False"]
+ "is_encoder_decoder_model": "{{cookiecutter.is_encoder_decoder_model}}"
}
@@ -321,6 +321,7 @@ class TF{{cookiecutter.camelcase_modelname}}Output(tf.keras.layers.Layer):
return hidden_states
# Copied from transformers.models.bert.modeling_tf_bert.TFBertLayer with Bert->{{cookiecutter.camelcase_modelname}}
class TF{{cookiecutter.camelcase_modelname}}Layer(tf.keras.layers.Layer):
def __init__(self, config: {{cookiecutter.camelcase_modelname}}Config, **kwargs):
super().__init__(**kwargs)
@@ -1615,6 +1616,7 @@ class TF{{cookiecutter.camelcase_modelname}}Attention(tf.keras.layers.Layer):
key_value_states: Optional[tf.Tensor] = None,
past_key_value: Optional[Tuple[Tuple[tf.Tensor]]] = None,
attention_mask: Optional[tf.Tensor] = None,
layer_head_mask: Optional[tf.Tensor] = None,
training=False,
) -> Tuple[tf.Tensor, Optional[tf.Tensor]]:
"""Input shape: Batch x Time x Channel"""
@@ -1688,6 +1690,21 @@ class TF{{cookiecutter.camelcase_modelname}}Attention(tf.keras.layers.Layer):
attn_weights = tf.nn.softmax(attn_weights, axis=-1)
if layer_head_mask is not None:
# The tf.debugging asserts are not compliant with XLA then they
# have to be disabled in other modes than eager.
if tf.executing_eagerly():
tf.debugging.assert_equal(
shape_list(layer_head_mask),
[self.num_heads],
message=f"Head mask for a single layer should be of size {(self.num_heads)}, but is {shape_list(layer_head_mask)}",
)
attn_weights = tf.reshape(layer_head_mask, (1, -1, 1, 1)) * tf.reshape(
attn_weights, (bsz, self.num_heads, tgt_len, src_len)
)
attn_weights = tf.reshape(attn_weights, (bsz * self.num_heads, tgt_len, src_len))
attn_probs = self.dropout(attn_weights, training=training)
attn_output = tf.matmul(attn_probs, value_states)
@@ -1868,7 +1885,7 @@ class TF{{cookiecutter.camelcase_modelname}}DecoderLayer(tf.keras.layers.Layer):
return (
hidden_states,
self_attn_weights,
- cross_attn_layer_head_mask,
+ cross_attn_weights,
present_key_value,
)
@@ -2136,7 +2153,7 @@ class TF{{cookiecutter.camelcase_modelname}}Encoder(tf.keras.layers.Layer):
raise ValueError("You have to specify either input_ids or inputs_embeds")
if inputs["inputs_embeds"] is None:
- inputs_embeds = self.embed_tokens(inputs["input_ids"]) * self.embed_scale
+ inputs["inputs_embeds"] = self.embed_tokens(inputs["input_ids"]) * self.embed_scale
embed_pos = self.embed_positions(input_shape)
hidden_states = inputs["inputs_embeds"] + embed_pos
@@ -2865,7 +2882,17 @@ class TF{{cookiecutter.camelcase_modelname}}ForConditionalGeneration(TF{{cookiec
encoder_attentions=enc_attns,
)
- def prepare_inputs_for_generation(self, decoder_input_ids, past, attention_mask, use_cache, **kwargs) -> Dict:
+ def prepare_inputs_for_generation(
self,
decoder_input_ids,
past,
attention_mask,
head_mask=None,
decoder_head_mask=None,
cross_attn_head_mask=None,
use_cache=False,
**kwargs
) -> Dict:
assert past is not None and len(past) in {1, 2}, f"past has to be an iterable of length 1,2 got {past}"
if len(past) == 1:
assert isinstance(past[0], tf.Tensor), f"`past[0]` has to be of type `tf.Tensor`, but is {type(past[0])}"
@@ -2897,6 +2924,9 @@ class TF{{cookiecutter.camelcase_modelname}}ForConditionalGeneration(TF{{cookiec
"past_key_values": past_key_values,
"decoder_input_ids": decoder_input_ids,
"attention_mask": attention_mask,
"head_mask": head_mask,
"decoder_head_mask": decoder_head_mask,
"cross_attn_head_mask": cross_attn_head_mask,
"use_cache": use_cache, # change this to avoid caching (presumably for debugging) "use_cache": use_cache, # change this to avoid caching (presumably for debugging)
} }
......
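The layer_head_mask logic added to the attention layer above rescales the per-head attention weights with a mask of shape (num_heads,), zeroing out pruned heads. The following is a small NumPy sketch of the same reshape-and-broadcast, with made-up shapes and values purely for illustration; it is not part of the template itself.

import numpy as np

bsz, num_heads, tgt_len, src_len = 2, 4, 5, 5
# attention weights as they exist right before dropout, flattened over heads
attn_weights = np.random.rand(bsz * num_heads, tgt_len, src_len).astype(np.float32)
# one scalar per head: 1.0 keeps a head, 0.0 prunes it
layer_head_mask = np.array([1.0, 0.0, 1.0, 1.0], dtype=np.float32)

# broadcast the per-head mask over (batch, heads, tgt, src), then flatten back
masked = layer_head_mask.reshape(1, -1, 1, 1) * attn_weights.reshape(
    bsz, num_heads, tgt_len, src_len
)
masked = masked.reshape(bsz * num_heads, tgt_len, src_len)

# head 1 is zeroed out for every batch element
assert np.allclose(masked.reshape(bsz, num_heads, tgt_len, src_len)[:, 1], 0.0)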
@@ -2867,6 +2867,8 @@ class {{cookiecutter.camelcase_modelname}}ForConditionalGeneration({{cookiecutte
past=None,
attention_mask=None,
head_mask=None,
decoder_head_mask=None,
cross_attn_head_mask=None,
use_cache=None,
encoder_outputs=None,
**kwargs
@@ -2882,6 +2884,8 @@ class {{cookiecutter.camelcase_modelname}}ForConditionalGeneration({{cookiecutte
"decoder_input_ids": decoder_input_ids,
"attention_mask": attention_mask,
"head_mask": head_mask,
"decoder_head_mask": decoder_head_mask,
"cross_attn_head_mask": cross_attn_head_mask,
"use_cache": use_cache, # change this to avoid caching (presumably for debugging) "use_cache": use_cache, # change this to avoid caching (presumably for debugging)
} }
......
@@ -86,6 +86,35 @@
{% endif -%}
# End.
# Below: " # Flax models structure" if generating Flax
# Replace with:
{% if cookiecutter.is_encoder_decoder_model == "False" %}
_import_structure["models.{{cookiecutter.lowercase_modelname}}"].extend(
[
"Flax{{cookiecutter.camelcase_modelname}}ForMaskedLM",
"Flax{{cookiecutter.camelcase_modelname}}ForCausalLM",
"Flax{{cookiecutter.camelcase_modelname}}ForMultipleChoice",
"Flax{{cookiecutter.camelcase_modelname}}ForQuestionAnswering",
"Flax{{cookiecutter.camelcase_modelname}}ForSequenceClassification",
"Flax{{cookiecutter.camelcase_modelname}}ForTokenClassification",
"Flax{{cookiecutter.camelcase_modelname}}Layer",
"Flax{{cookiecutter.camelcase_modelname}}Model",
"Flax{{cookiecutter.camelcase_modelname}}PreTrainedModel",
]
)
{% else %}
_import_structure["models.{{cookiecutter.lowercase_modelname}}"].extend(
[
"Flax{{cookiecutter.camelcase_modelname}}ForConditionalGeneration",
"Flax{{cookiecutter.camelcase_modelname}}ForQuestionAnswering",
"Flax{{cookiecutter.camelcase_modelname}}ForSequenceClassification",
"Flax{{cookiecutter.camelcase_modelname}}Model",
"Flax{{cookiecutter.camelcase_modelname}}PreTrainedModel",
]
)
{% endif -%}
# End.
# Below: " # Fast tokenizers" # Below: " # Fast tokenizers"
# Replace with: # Replace with:
_import_structure["models.{{cookiecutter.lowercase_modelname}}"].append("{{cookiecutter.camelcase_modelname}}TokenizerFast") _import_structure["models.{{cookiecutter.lowercase_modelname}}"].append("{{cookiecutter.camelcase_modelname}}TokenizerFast")
...@@ -150,6 +179,31 @@ ...@@ -150,6 +179,31 @@
{% endif -%} {% endif -%}
# End. # End.
# Below: " if is_flax_available():" if generating Flax
# Replace with:
{% if cookiecutter.is_encoder_decoder_model == "False" %}
from .models.{{cookiecutter.lowercase_modelname}} import (
Flax{{cookiecutter.camelcase_modelname}}ForMaskedLM,
Flax{{cookiecutter.camelcase_modelname}}ForCausalLM,
Flax{{cookiecutter.camelcase_modelname}}ForMultipleChoice,
Flax{{cookiecutter.camelcase_modelname}}ForQuestionAnswering,
Flax{{cookiecutter.camelcase_modelname}}ForSequenceClassification,
Flax{{cookiecutter.camelcase_modelname}}ForTokenClassification,
Flax{{cookiecutter.camelcase_modelname}}Layer,
Flax{{cookiecutter.camelcase_modelname}}Model,
Flax{{cookiecutter.camelcase_modelname}}PreTrainedModel,
)
{% else %}
from .models.{{cookiecutter.lowercase_modelname}} import (
Flax{{cookiecutter.camelcase_modelname}}ForConditionalGeneration,
Flax{{cookiecutter.camelcase_modelname}}ForQuestionAnswering,
Flax{{cookiecutter.camelcase_modelname}}ForSequenceClassification,
Flax{{cookiecutter.camelcase_modelname}}Model,
Flax{{cookiecutter.camelcase_modelname}}PreTrainedModel,
)
{% endif -%}
# End.
# Below: " if is_tokenizers_available():" # Below: " if is_tokenizers_available():"
# Replace with: # Replace with:
from .models.{{cookiecutter.lowercase_modelname}} import {{cookiecutter.camelcase_modelname}}TokenizerFast from .models.{{cookiecutter.lowercase_modelname}} import {{cookiecutter.camelcase_modelname}}TokenizerFast
...@@ -320,6 +374,81 @@ ...@@ -320,6 +374,81 @@
{% endif -%} {% endif -%}
# End. # End.
# To replace in: "src/transformers/models/auto/modeling_flax_auto.py" if generating Flax
# Below: "# Base model mapping"
# Replace with:
("{{cookiecutter.lowercase_modelname}}", "Flax{{cookiecutter.camelcase_modelname}}Model"),
# End.
# Below: "# Model for Masked LM mapping"
# Replace with:
{% if cookiecutter.is_encoder_decoder_model == "False" -%}
("{{cookiecutter.lowercase_modelname}}", "Flax{{cookiecutter.camelcase_modelname}}ForMaskedLM"),
{% else %}
("{{cookiecutter.lowercase_modelname}}", "Flax{{cookiecutter.camelcase_modelname}}ForConditionalGeneration"),
{% endif -%}
# End.
# Below: "# Model for Causal LM mapping"
# Replace with:
{% if cookiecutter.is_encoder_decoder_model == "False" -%}
("{{cookiecutter.lowercase_modelname}}", "Flax{{cookiecutter.camelcase_modelname}}ForCausalLM"),
{% else -%}
{% endif -%}
# End.
# Below: "# Model for Masked LM mapping"
# Replace with:
{% if cookiecutter.is_encoder_decoder_model == "False" -%}
("{{cookiecutter.lowercase_modelname}}", "Flax{{cookiecutter.camelcase_modelname}}ForMaskedLM"),
{% else -%}
{% endif -%}
# End.
# Below: "# Model for Sequence Classification mapping"
# Replace with:
{% if cookiecutter.is_encoder_decoder_model == "False" -%}
("{{cookiecutter.lowercase_modelname}}", "Flax{{cookiecutter.camelcase_modelname}}ForSequenceClassification"),
{% else %}
("{{cookiecutter.lowercase_modelname}}", "Flax{{cookiecutter.camelcase_modelname}}ForSequenceClassification"),
{% endif -%}
# End.
# Below: "# Model for Question Answering mapping"
# Replace with:
{% if cookiecutter.is_encoder_decoder_model == "False" -%}
("{{cookiecutter.lowercase_modelname}}", "Flax{{cookiecutter.camelcase_modelname}}ForQuestionAnswering"),
{% else %}
("{{cookiecutter.lowercase_modelname}}", "Flax{{cookiecutter.camelcase_modelname}}ForQuestionAnswering"),
{% endif -%}
# End.
# Below: "# Model for Token Classification mapping"
# Replace with:
{% if cookiecutter.is_encoder_decoder_model == "False" -%}
("{{cookiecutter.lowercase_modelname}}", "Flax{{cookiecutter.camelcase_modelname}}ForTokenClassification"),
{% else -%}
{% endif -%}
# End.
# Below: "# Model for Multiple Choice mapping"
# Replace with:
{% if cookiecutter.is_encoder_decoder_model == "False" -%}
("{{cookiecutter.lowercase_modelname}}", "Flax{{cookiecutter.camelcase_modelname}}ForMultipleChoice"),
{% else -%}
{% endif -%}
# End.
# Below: "# Model for Seq2Seq Causal LM mapping"
# Replace with:
{% if cookiecutter.is_encoder_decoder_model == "False" -%}
{% else %}
("{{cookiecutter.lowercase_modelname}}", "Flax{{cookiecutter.camelcase_modelname}}ForConditionalGeneration"),
{% endif -%}
# End.
# To replace in: "utils/check_repo.py" if generating PyTorch # To replace in: "utils/check_repo.py" if generating PyTorch
# Below: "models to ignore for model xxx mapping" # Below: "models to ignore for model xxx mapping"
......
@@ -53,7 +53,7 @@ This model was contributed by `<INSERT YOUR HF USERNAME HERE>
:members:
- {% if "PyTorch" in cookiecutter.generate_tensorflow_and_pytorch -%}
+ {% if "PyTorch" in cookiecutter.generate_tensorflow_pytorch_and_flax -%}
{{cookiecutter.camelcase_modelname}}Model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -133,7 +133,7 @@ This model was contributed by `<INSERT YOUR HF USERNAME HERE>
{% endif -%}
{% endif -%}
- {% if "TensorFlow" in cookiecutter.generate_tensorflow_and_pytorch -%}
+ {% if "TensorFlow" in cookiecutter.generate_tensorflow_pytorch_and_flax -%}
TF{{cookiecutter.camelcase_modelname}}Model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -194,3 +194,79 @@ TF{{cookiecutter.camelcase_modelname}}ForConditionalGeneration
{% endif -%}
{% endif -%}
{% if "Flax" in cookiecutter.generate_tensorflow_pytorch_and_flax -%}
Flax{{cookiecutter.camelcase_modelname}}Model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.Flax{{cookiecutter.camelcase_modelname}}Model
:members: call
{% if cookiecutter.is_encoder_decoder_model == "False" %}
Flax{{cookiecutter.camelcase_modelname}}ForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.Flax{{cookiecutter.camelcase_modelname}}ForMaskedLM
:members: call
Flax{{cookiecutter.camelcase_modelname}}ForCausalLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.Flax{{cookiecutter.camelcase_modelname}}ForCausalLM
:members: call
Flax{{cookiecutter.camelcase_modelname}}ForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.Flax{{cookiecutter.camelcase_modelname}}ForSequenceClassification
:members: call
Flax{{cookiecutter.camelcase_modelname}}ForMultipleChoice
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.Flax{{cookiecutter.camelcase_modelname}}ForMultipleChoice
:members: call
Flax{{cookiecutter.camelcase_modelname}}ForTokenClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.Flax{{cookiecutter.camelcase_modelname}}ForTokenClassification
:members: call
Flax{{cookiecutter.camelcase_modelname}}ForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.Flax{{cookiecutter.camelcase_modelname}}ForQuestionAnswering
:members: call
{%- else %}
Flax{{cookiecutter.camelcase_modelname}}ForSequenceClassification
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.Flax{{cookiecutter.camelcase_modelname}}ForSequenceClassification
:members: call
Flax{{cookiecutter.camelcase_modelname}}ForQuestionAnswering
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.Flax{{cookiecutter.camelcase_modelname}}ForQuestionAnswering
:members: call
Flax{{cookiecutter.camelcase_modelname}}ForConditionalGeneration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.Flax{{cookiecutter.camelcase_modelname}}ForConditionalGeneration
:members: call
{% endif -%}
{% endif -%}
@@ -6,6 +6,14 @@
"authors": "The HuggingFace Team",
"checkpoint_identifier": "brand-new-bert-base-cased",
"tokenizer_type": ["Based on BERT", "Based on BART", "Standalone"],
- "generate_tensorflow_and_pytorch": ["PyTorch & TensorFlow", "PyTorch", "TensorFlow"],
+ "generate_tensorflow_pytorch_and_flax": [
"PyTorch, TensorFlow and Flax",
"PyTorch & TensorFlow",
"PyTorch & Flax",
"TensorFlow & Flax",
"PyTorch",
"TensorFlow",
"Flax"
],
"is_encoder_decoder_model": ["True", "False"] "is_encoder_decoder_model": ["True", "False"]
} }
@@ -6,6 +6,6 @@
"authors": "The HuggingFace Team",
"checkpoint_identifier": "brand-new-bert-base-cased",
"tokenizer_type": "Based on BERT",
- "generate_tensorflow_and_pytorch": "PyTorch & TensorFlow",
+ "generate_tensorflow_pytorch_and_flax": "PyTorch, TensorFlow and Flax",
"is_encoder_decoder_model": "False"
}
{
"modelname": "TemplateFLAX",
"uppercase_modelname": "TEMPLATE_FLAX",
"lowercase_modelname": "template_flax",
"camelcase_modelname": "TemplateFlax",
"authors": "The HuggingFace Team",
"checkpoint_identifier": "brand-new-bert-base-cased",
"tokenizer_type": "Based on BERT",
"generate_tensorflow_pytorch_and_flax": "Flax",
"is_encoder_decoder_model": "False"
}
{
"modelname": "FlaxNewENCDEC",
"uppercase_modelname": "FLAX_NEW_ENC_DEC",
"lowercase_modelname": "flax_new_enc_dec_template",
"camelcase_modelname": "FlaxNewEncDec",
"authors": "The HuggingFace Team",
"checkpoint_identifier": "new-flax-enc-dec-base",
"tokenizer_type": "Based on BART",
"generate_tensorflow_pytorch_and_flax": "Flax",
"is_encoder_decoder_model": "True"
}
@@ -6,6 +6,6 @@
"authors": "The HuggingFace Team",
"checkpoint_identifier": "brand-new-bert-base-cased",
"tokenizer_type": "Based on BERT",
- "generate_tensorflow_and_pytorch": "PyTorch",
+ "generate_tensorflow_pytorch_and_flax": "PyTorch",
"is_encoder_decoder_model": "False"
}
{
- "modelname": "NewENCDEC",
+ "modelname": "PTNewENCDEC",
- "uppercase_modelname": "NEW_ENC_DEC",
+ "uppercase_modelname": "PT_NEW_ENC_DEC",
- "lowercase_modelname": "new_enc_dec",
+ "lowercase_modelname": "pt_new_enc_dec_template",
- "camelcase_modelname": "NewEncDec",
+ "camelcase_modelname": "PtNewEncDec",
"authors": "The HuggingFace Team",
- "checkpoint_identifier": "new-enc-dec-base",
+ "checkpoint_identifier": "pt-new-enc-dec-base",
"tokenizer_type": "Based on BART",
- "generate_tensorflow_and_pytorch": "PyTorch",
+ "generate_tensorflow_pytorch_and_flax": "PyTorch",
"is_encoder_decoder_model": "True"
}
@@ -6,6 +6,6 @@
"authors": "The HuggingFace Team",
"checkpoint_identifier": "bi-brand-new-bert-base-cased",
"tokenizer_type": "Standalone",
- "generate_tensorflow_and_pytorch": "PyTorch & TensorFlow",
+ "generate_tensorflow_pytorch_and_flax": "PyTorch, TensorFlow and Flax",
"is_encoder_decoder_model": "False"
}
@@ -6,6 +6,6 @@
"authors": "The HuggingFace Team",
"checkpoint_identifier": "brand-new-bert-base-cased",
"tokenizer_type": "Based on BERT",
- "generate_tensorflow_and_pytorch": "TensorFlow",
+ "generate_tensorflow_pytorch_and_flax": "TensorFlow",
"is_encoder_decoder_model": "False"
}
{
"modelname": "NewTFENCDEC",
"uppercase_modelname": "NEW_TF_ENC_DEC",
- "lowercase_modelname": "new_tf_enc_dec",
+ "lowercase_modelname": "new_tf_enc_dec_template",
"camelcase_modelname": "NewTFEncDec",
"authors": "The HuggingFace Team",
- "checkpoint_identifier": "new-tf-enc-dec-base",
+ "checkpoint_identifier": "new-tf-enc-dec-base_template",
"tokenizer_type": "Based on BART",
- "generate_tensorflow_and_pytorch": "TensorFlow",
+ "generate_tensorflow_pytorch_and_flax": "TensorFlow",
"is_encoder_decoder_model": "True"
}