"vscode:/vscode.git/clone" did not exist on "62c01d267a74f1bddfcdad33eabdf316a50fb613"
Commit 8f1f2ca5 authored by Rick Ho's avatar Rick Ho
Browse files

readme in transformer-xl example

parent 59b27103
This directory contains an example based on Zihang Dai et al.'s open-source
Transformer-XL [implementation](https://github.com/kimiyoung/transformer-xl) to
demonstrate the usage of FastMoE's layers.
The code is released under the Apache-2.0 license. Here, only the PyTorch part of the
code is used, with modifications in the `mem_transformer.py` file to enable MoE
training.
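
For illustration, below is a minimal sketch of the kind of change involved: swapping the position-wise feed-forward sublayer of Transformer-XL for FastMoE's `FMoETransformerMLP`. The wrapper class `MoEPositionwiseFF` and the constructor arguments shown (`num_expert`, `d_model`, `d_hidden`) are assumptions for this sketch; the actual edits in `mem_transformer.py` may be wired differently.

```python
import torch.nn as nn
from fmoe.transformer import FMoETransformerMLP  # FastMoE's MoE feed-forward layer


class MoEPositionwiseFF(nn.Module):
    """Hypothetical drop-in MoE replacement for Transformer-XL's
    PositionwiseFF sublayer. Argument names follow FastMoE's
    FMoETransformerMLP but are assumptions, not the repo's exact code."""

    def __init__(self, d_model, d_inner, dropout, num_expert=4, pre_lnorm=False):
        super().__init__()
        # Each expert is a two-layer FFN of hidden width d_inner;
        # a gate routes each token to one of num_expert experts.
        self.moe = FMoETransformerMLP(
            num_expert=num_expert,
            d_model=d_model,
            d_hidden=d_inner,
        )
        self.dropout = nn.Dropout(dropout)
        self.layer_norm = nn.LayerNorm(d_model)
        self.pre_lnorm = pre_lnorm

    def forward(self, inp):
        if self.pre_lnorm:
            # Pre-norm variant: normalize, apply MoE FFN, residual add.
            out = self.moe(self.layer_norm(inp))
            return inp + self.dropout(out)
        # Post-norm variant: apply MoE FFN, residual add, then normalize.
        out = self.moe(inp)
        return self.layer_norm(inp + self.dropout(out))
```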
## Introduction
This directory contains our PyTorch implementation of Transformer-XL. Note that our state-of-the-art results reported in the paper were obtained by training the model on a large-scale TPU cluster, and our PyTorch codebase currently does not support distributed training. Here we provide two sets of hyperparameters and scripts:
......