chore: Fix multiple typos (#28574)

5d8eb93e · hugo-syn · GitHub · 81899778 · 5d8eb93e · 5d8eb93e
Unverified Commit 5d8eb93e authored Jan 18, 2024 by hugo-syn Committed by GitHub Jan 18, 2024
5 changed files
--- a/examples/research_projects/codeparrot/README.md
+++ b/examples/research_projects/codeparrot/README.md
@@ -50,7 +50,7 @@ The raw dataset contains many duplicates. We deduplicated and filtered the datas
 - fraction of alphanumeric characters < 0.25
 - containing the word "auto-generated" or similar in the first 5 lines
 - filtering with a probability of 0.7 of files with a mention of "test file" or "configuration file" or similar in the first 5 lines
- filtering with a probability of 0.7 of files with high occurence of the keywords "test " or "config" 
+- filtering with a probability of 0.7 of files with high occurrence of the keywords "test " or "config" 
 - filtering with a probability of 0.7  of files without a mention of the keywords `def` , `for`, `while`  and `class`
 - filtering files that use the assignment operator `=` less than 5 times 
 - filtering files with ratio between number of characters and number of tokens after tokenization < 1.5 (the average ratio is 3.6)

--- a/examples/research_projects/jax-projects/README.md
+++ b/examples/research_projects/jax-projects/README.md
@@ -1153,7 +1153,7 @@ In the following, we will describe how to do so using a standard console, but yo
 2. Once you've installed the google cloud sdk, you should set your account by running the following command. Make sure that `<your-email-address>` corresponds to the gmail address you used to sign up for this event.
 ```bash
-$ gcloud config set account <your-email-adress>
+$ gcloud config set account <your-email-address>
 ```
 3. Let's also make sure the correct project is set in case your email is used for multiple gcloud projects:

--- a/examples/research_projects/jax-projects/big_bird/README.md
+++ b/examples/research_projects/jax-projects/big_bird/README.md
@@ -57,4 +57,4 @@ wget https://huggingface.co/datasets/vasudevgupta/natural-questions-validation/r
 python3 evaluate.py
 ```
-You can find our checkpoint on HuggingFace Hub ([see this](https://huggingface.co/vasudevgupta/flax-bigbird-natural-questions)). In case you are interested in PyTorch BigBird fine-tuning, you can refer to [this repositary](https://github.com/thevasudevgupta/bigbird).
+You can find our checkpoint on HuggingFace Hub ([see this](https://huggingface.co/vasudevgupta/flax-bigbird-natural-questions)). In case you are interested in PyTorch BigBird fine-tuning, you can refer to [this repository](https://github.com/thevasudevgupta/bigbird).
--- a/examples/research_projects/jax-projects/model_parallel/README.md
+++ b/examples/research_projects/jax-projects/model_parallel/README.md
@@ -27,7 +27,7 @@ To adapt the script for other models, we need to also change the `ParitionSpec`
 TODO: Add more explantion.
-Before training, let's prepare our model first. To be able to shard the model, the sharded dimention needs to be a multiple of devices it'll be sharded on. But GPTNeo's vocab size is 50257, so we need to resize the embeddings accordingly. 
+Before training, let's prepare our model first. To be able to shard the model, the sharded dimension needs to be a multiple of devices it'll be sharded on. But GPTNeo's vocab size is 50257, so we need to resize the embeddings accordingly. 
 ```python
 from transformers import FlaxGPTNeoForCausalLM, GPTNeoConfig 

--- a/examples/research_projects/mlm_wwm/README.md
+++ b/examples/research_projects/mlm_wwm/README.md
@@ -95,4 +95,4 @@ python run_mlm_wwm.py \
 **Note1:** On TPU, you should the flag `--pad_to_max_length` to make sure all your batches have the same length.
-**Note2:** And if you have any questions or something goes wrong when runing this code, don't hesitate to pin @wlhgtc.
+**Note2:** And if you have any questions or something goes wrong when running this code, don't hesitate to pin @wlhgtc.