Unverified Commit ae3cbbca authored by fzyzcjy, committed by GitHub

Fix tiny typo (#20841)

* Fix typo

* Update README.md

* Update run_mlm_flax_stream.py

* Update README.md
parent 7ef3f19c
@@ -129,7 +129,7 @@ look at [this](https://colab.research.google.com/github/huggingface/notebooks/bl
 In the following, we demonstrate how to train an auto-regressive causal transformer model
 in JAX/Flax.
-More specifically, we pretrain a randomely initialized [**`gpt2`**](https://huggingface.co/gpt2) model in Norwegian on a single TPUv3-8.
+More specifically, we pretrain a randomly initialized [**`gpt2`**](https://huggingface.co/gpt2) model in Norwegian on a single TPUv3-8.
 to pre-train 124M [**`gpt2`**](https://huggingface.co/gpt2)
 in Norwegian on a single TPUv3-8 pod.
...
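For context on the README passage above: "randomly initialized" means the gpt2 architecture is built from its config and the weights are drawn fresh from a seed rather than loaded from the pretrained checkpoint. A minimal sketch using the standard `transformers` Flax API (illustrative only, not part of this commit):

```python
from transformers import GPT2Config, FlaxGPT2LMHeadModel

# Build the 124M-parameter gpt2 architecture, but initialize the weights
# from a seed instead of loading the pretrained English checkpoint.
config = GPT2Config.from_pretrained("gpt2")
model = FlaxGPT2LMHeadModel(config, seed=0)
```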
@@ -710,7 +710,7 @@ class FlaxMLPModel(FlaxMLPPreTrainedModel):
     module_class = FlaxMLPModule
 ```
-Now the `FlaxMLPModel` will have a similar interface as PyTorch or Tensorflow models and allows us to attach loaded or randomely initialized weights to the model instance.
+Now the `FlaxMLPModel` will have a similar interface as PyTorch or Tensorflow models and allows us to attach loaded or randomly initialized weights to the model instance.
 So the important point to remember is that the `model` is not an instance of `nn.Module`; it's an abstract class, like a container that holds a Flax module, its parameters and provides convenient methods for initialization and forward pass. The key take-away here is that an instance of `FlaxMLPModel` is very much stateful now since it holds all the model parameters, whereas the underlying Flax module `FlaxMLPModule` is still stateless. Now to make `FlaxMLPModel` fully compliant with JAX transformations, it is always possible to pass the parameters to `FlaxMLPModel` as well to make it stateless and easier to work with during training. Feel free to take a look at the code to see how exactly this is implemented for ex. [`modeling_flax_bert.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/bert/modeling_flax_bert.py#L536)
...
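The stateful-versus-stateless point in that paragraph is easy to see in two lines. A sketch, assuming the `FlaxMLPModel` class and `config` defined earlier in the guide being diffed, and that its `__call__` accepts an explicit `params` argument like the library's real Flax models do:

```python
import numpy as np

# FlaxMLPModel and config come from the guide above; they are not redefined here.
model = FlaxMLPModel(config)                      # stateful: params are created and stored at init
inputs = np.random.randn(1, 16).astype(np.float32)

out = model(inputs)                               # uses model.params implicitly
out = model(inputs, params=model.params)          # explicit params: a pure call, friendly to jax.jit/grad
```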
@@ -562,7 +562,7 @@ if __name__ == "__main__":
                 samples = advance_iter_and_group_samples(training_iter, train_batch_size, max_seq_length)
             except StopIteration:
                 # Once the end of the dataset stream is reached, the training iterator
-                # is reinitialized and reshuffled and a new eval dataset is randomely chosen.
+                # is reinitialized and reshuffled and a new eval dataset is randomly chosen.
                 shuffle_seed += 1
                 tokenized_datasets.set_epoch(shuffle_seed)
...
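The control flow the fixed comment documents is a standard pattern for streaming datasets: when the stream runs dry, bump the seed, reshuffle, and restart the iterator. A minimal sketch assuming a 🤗 Datasets streaming dataset; the real script additionally groups samples with its own `advance_iter_and_group_samples` helper, elided here:

```python
from datasets import load_dataset

num_train_steps = 1_000
shuffle_seed = 42
dataset = load_dataset("oscar", "unshuffled_deduplicated_no", split="train", streaming=True)
dataset = dataset.shuffle(buffer_size=10_000, seed=shuffle_seed)
training_iter = iter(dataset)

for step in range(num_train_steps):
    try:
        sample = next(training_iter)
    except StopIteration:
        # Stream exhausted: reshuffle with a new seed and restart the iterator.
        shuffle_seed += 1
        dataset.set_epoch(shuffle_seed)
        training_iter = iter(dataset)
        sample = next(training_iter)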
@@ -59,7 +59,7 @@ class BeamSearchTester:
         self.do_early_stopping = do_early_stopping
         self.num_beam_hyps_to_keep = num_beam_hyps_to_keep
-        # cannot be randomely generated
+        # cannot be randomly generated
         self.eos_token_id = vocab_size + 1

     def prepare_beam_scorer(self, **kwargs):
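The reasoning behind the fixed comment: the tester samples token ids uniformly from the vocabulary, so the EOS id must sit outside that range to guarantee no random token terminates a beam by accident. A small illustration with hypothetical shapes (not from the test file):

```python
import torch

vocab_size = 99
input_ids = torch.randint(0, vocab_size, (4, 10))  # hypothetical random test inputs in [0, vocab_size)
eos_token_id = vocab_size + 1                      # out of range, so it can never be drawn
assert (input_ids != eos_token_id).all()
```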
@@ -283,7 +283,7 @@ class ConstrainedBeamSearchTester:
         constraints = [PhrasalConstraint(force_tokens), DisjunctiveConstraint(disjunctive_tokens)]
         self.constraints = constraints
-        # cannot be randomely generated
+        # cannot be randomly generated
         self.eos_token_id = vocab_size + 1

     def prepare_constrained_beam_scorer(self, **kwargs):
...
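For readers unfamiliar with the two constraint classes the tester builds: `PhrasalConstraint` forces one exact token sequence to appear in the output, while `DisjunctiveConstraint` forces any one sequence from a set of alternatives. A sketch with made-up token ids:

```python
from transformers import PhrasalConstraint, DisjunctiveConstraint

force_tokens = [209, 5, 478]               # hypothetical ids for a required phrase
disjunctive_tokens = [[5, 8], [7, 19, 3]]  # hypothetical alternatives; one must appear
constraints = [
    PhrasalConstraint(force_tokens),
    DisjunctiveConstraint(disjunctive_tokens),
]
# In practice these are passed to model.generate(..., constraints=constraints)
# with num_beams > 1 to run constrained beam search.
```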