Support for pipeline parallelism in T5 model
- Accumulate the encoder hidden state gradient to handle the skip connection
- Correctly compute the number of layers in the encoder / decoder for the T5 model
- Ensure weights are initialized the same way in embeddings
- Synchronize embedding gradients across the encoder and decoder for the T5 model
- Support checkpoint loading and saving
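
The first item can be illustrated with a toy sketch. It is not the code from this commit. It assumes a scalar stand-in where each decoder stage adds `a * h` to its input, with `h` playing the role of the encoder hidden state that reaches every decoder stage through the skip connection. The names `forward`, `backward`, and `coeffs` are hypothetical. The point is only that the gradient with respect to `h` has to be summed over all stages that consume it, not overwritten by the last one.

```python
def forward(h, x0, coeffs):
    """Toy decoder: each stage adds a * h (stand-in for cross-attention) to x."""
    x = x0
    for a in coeffs:
        x = x + a * h
    return x


def backward(coeffs, grad_out=1.0):
    """Walk the stages in reverse, accumulating the gradient w.r.t. h.

    Returns (grad_x0, grad_h). Every stage reads h, so each one adds its
    own contribution a * grad_x to grad_h.
    """
    grad_x = grad_out
    grad_h = 0.0
    for a in reversed(coeffs):
        grad_h += a * grad_x  # accumulate, do not overwrite
        # d(x + a * h) / dx == 1, so grad_x passes through unchanged
    return grad_x, grad_h


if __name__ == "__main__":
    coeffs = [0.5, -1.0, 2.0]  # hypothetical per-stage weights
    grad_x0, grad_h = backward(coeffs)
    print(grad_x0, grad_h)
```

If the accumulation line used `=` in place of `+=`, only one stage's contribution would survive, which is the kind of error the accumulation in this change is meant to avoid.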