Unverified Commit 82e5b4d2 authored by Ming-Xu Huang's avatar Ming-Xu Huang Committed by GitHub
Browse files

[JAX] Fixed the shape mismatching issue in MLP. (#859)



* Fixed the shape mismatching issue in MLP.
Signed-off-by: default avatarMing Huang <mingh@nvidia.com>

* Add a corresponding test
Signed-off-by: default avatarMing Huang <mingh@nvidia.com>

---------
Signed-off-by: default avatarMing Huang <mingh@nvidia.com>
Co-authored-by: default avatarPhuong Nguyen <36155692+phu0ngng@users.noreply.github.com>
parent 01801633
...@@ -177,6 +177,8 @@ ATTRS = [{}, { ...@@ -177,6 +177,8 @@ ATTRS = [{}, {
_KEY_OF_SELF_ATTN_BIAS_TYPE: "no_bias", _KEY_OF_SELF_ATTN_BIAS_TYPE: "no_bias",
}, { }, {
_KEY_OF_ATTENTION_DROPOUT: 0.3, _KEY_OF_ATTENTION_DROPOUT: 0.3,
}, {
_KEY_OF_MLP_ACTIVATIONS: (('relu', 'relu')),
}] }]
ATTRS = [{**BASE_ATTRS, **attr} for attr in ATTRS] ATTRS = [{**BASE_ATTRS, **attr} for attr in ATTRS]
......
...@@ -1148,8 +1148,8 @@ class LayerNormMLP(TransformerEngineBase): ...@@ -1148,8 +1148,8 @@ class LayerNormMLP(TransformerEngineBase):
x_i = _convert_to_activation_function(act_fn)(x[idx]) x_i = _convert_to_activation_function(act_fn)(x[idx])
activations.append(x_i) activations.append(x_i)
z = functools.reduce(operator.mul, activations) z = functools.reduce(operator.mul, activations)
if num_activations == 1: # Remove act axis
z = jnp.reshape(z, (*z.shape[:-2], -1)) z = jnp.reshape(z, (*z.shape[:-2], -1))
z = nn.Dropout(rate=self.intermediate_dropout_rate, z = nn.Dropout(rate=self.intermediate_dropout_rate,
broadcast_dims=self.intermediate_hidden_dropout_dims, broadcast_dims=self.intermediate_hidden_dropout_dims,
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or sign in to comment