@@ -1153,7 +1153,7 @@ In the following, we will describe how to do so using a standard console, but yo
...
@@ -1153,7 +1153,7 @@ In the following, we will describe how to do so using a standard console, but yo
2. Once you've installed the google cloud sdk, you should set your account by running the following command. Make sure that `<your-email-address>` corresponds to the gmail address you used to sign up for this event.
2. Once you've installed the google cloud sdk, you should set your account by running the following command. Make sure that `<your-email-address>` corresponds to the gmail address you used to sign up for this event.
```bash
```bash
$ gcloud config set account <your-email-adress>
$ gcloud config set account <your-email-address>
```
```
3. Let's also make sure the correct project is set in case your email is used for multiple gcloud projects:
3. Let's also make sure the correct project is set in case your email is used for multiple gcloud projects:
You can find our checkpoint on HuggingFace Hub ([see this](https://huggingface.co/vasudevgupta/flax-bigbird-natural-questions)). In case you are interested in PyTorch BigBird fine-tuning, you can refer to [this repositary](https://github.com/thevasudevgupta/bigbird).
You can find our checkpoint on HuggingFace Hub ([see this](https://huggingface.co/vasudevgupta/flax-bigbird-natural-questions)). In case you are interested in PyTorch BigBird fine-tuning, you can refer to [this repository](https://github.com/thevasudevgupta/bigbird).
@@ -27,7 +27,7 @@ To adapt the script for other models, we need to also change the `ParitionSpec`
...
@@ -27,7 +27,7 @@ To adapt the script for other models, we need to also change the `ParitionSpec`
TODO: Add more explantion.
TODO: Add more explantion.
Before training, let's prepare our model first. To be able to shard the model, the sharded dimention needs to be a multiple of devices it'll be sharded on. But GPTNeo's vocab size is 50257, so we need to resize the embeddings accordingly.
Before training, let's prepare our model first. To be able to shard the model, the sharded dimension needs to be a multiple of devices it'll be sharded on. But GPTNeo's vocab size is 50257, so we need to resize the embeddings accordingly.