This project implements a method for faster and more memory-efficient RNN-T loss computation, called `pruned rnnt`.
Note: There is also a fast RNN-T loss implementation in the [k2](https://github.com/k2-fsa/k2) project, which shares the same code as this project. We make `fast_rnnt` a stand-alone project in case someone wants only this RNN-T loss.
## How does the pruned-rnnt work?
We first obtain pruning bounds for the RNN-T recursion using a simple joiner network that is just an addition of the encoder and decoder outputs; we then use those pruning bounds to evaluate the full, non-linear joiner network.
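The sketch below illustrates this two-stage workflow, assuming the `fast_rnnt` Python package exposes `rnnt_loss_simple`, `get_rnnt_prune_ranges`, `do_rnnt_pruning`, and `rnnt_loss_pruned` as in the k2 implementation. The tensor shapes, the `s_range` value, and the placeholder joiner are illustrative assumptions, not part of any real model.

```python
# A minimal sketch of two-stage pruned RNN-T loss computation.
# Shapes, s_range, and the placeholder joiner are illustrative only.
import torch
import fast_rnnt

B, T, S, C = 2, 50, 10, 30          # batch, frames, symbols, vocab size (assumed)
am = torch.randn(B, T, C)           # encoder (acoustic model) output
lm = torch.randn(B, S + 1, C)       # decoder (prediction network) output
symbols = torch.randint(1, C, (B, S))
termination_symbol = 0              # blank / terminal symbol id

boundary = torch.zeros(B, 4, dtype=torch.int64)
boundary[:, 2] = S                  # number of symbols per utterance
boundary[:, 3] = T                  # number of frames per utterance

# Stage 1: "simple" loss whose joiner is just am + lm; its gradients
# with respect to that trivial joint provide the pruning bounds.
simple_loss, (px_grad, py_grad) = fast_rnnt.rnnt_loss_simple(
    lm=lm,
    am=am,
    symbols=symbols,
    termination_symbol=termination_symbol,
    boundary=boundary,
    reduction="sum",
    return_grad=True,
)

# Stage 2: keep only s_range symbol positions per frame and evaluate
# the full non-linear joiner on that pruned region.
ranges = fast_rnnt.get_rnnt_prune_ranges(
    px_grad=px_grad, py_grad=py_grad, boundary=boundary, s_range=5
)
am_pruned, lm_pruned = fast_rnnt.do_rnnt_pruning(am=am, lm=lm, ranges=ranges)

# Placeholder non-linear joiner; a real model would use its own joiner module.
joiner = torch.nn.Sequential(torch.nn.Tanh(), torch.nn.Linear(C, C))
logits = joiner(am_pruned + lm_pruned)  # (B, T, s_range, C)

pruned_loss = fast_rnnt.rnnt_loss_pruned(
    logits=logits,
    symbols=symbols,
    ranges=ranges,
    termination_symbol=termination_symbol,
    boundary=boundary,
    reduction="sum",
)
```

In training, the simple loss is typically kept as an auxiliary term alongside the pruned loss so that the trivial joiner stays informative enough to produce good pruning bounds.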
## Benchmarking
The [repo](https://github.com/csukuangfj/transducer-loss-benchmarking) compares the speed and memory usage of several transducer losses. The summary in the following table is taken from there; see the repository for more details.
Note: As mentioned above, `fast_rnnt` is also implemented in the [k2](https://github.com/k2-fsa/k2) project, so `k2` and `fast_rnnt` are equivalent in this benchmark.
|Name |Average step time (us) | Peak memory usage (MB)|
|-----|-----------------------|-----------------------|