* Optimize fused layer normalization for MI100
* Optimize cuComputePartGradGammaBeta for AMD GPUs