    [Upstream] IFU 05072020 (#4) · e85a1d4b
    Chaitanya Sri Krishna Lolla authored
    
    
    * fix dropout scaling from p to 1/(1-p) (#816)
    Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>
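
    The scaling named in the title above matches the usual "inverted dropout" convention: each element is dropped with probability p and the survivors are multiplied by 1/(1-p), so the expected value of the output equals the input. The following is a minimal sketch of that convention in plain Python, not the apex kernel itself; the function name `inverted_dropout` and the sample sizes are assumptions made for illustration.

    ```python
    import random


    def inverted_dropout(values, p, rng):
        """Zero each element with probability p; scale survivors by 1/(1-p).

        Hypothetical helper for illustration only (not part of apex).
        """
        if not 0.0 <= p < 1.0:
            raise ValueError("p must be in [0, 1)")
        scale = 1.0 / (1.0 - p)  # correct factor; multiplying by p would shrink the mean
        return [v * scale if rng.random() >= p else 0.0 for v in values]


    rng = random.Random(0)          # fixed seed so the run is repeatable
    x = [1.0] * 100_000
    y = inverted_dropout(x, p=0.25, rng=rng)
    mean = sum(y) / len(y)          # stays close to 1.0, the mean of the input
    ```

    Scaling by p instead would give an expected output of p(1-p) times the input, which is why the factor matters for training and evaluation to agree.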
    
    * Improvements to apex.mlp (#804)
    
    * update fused bias relu backward kernel
    
    * add support for not requiring first-layer dgrad
    
    * fix bug: wrong layer in requires grad
    
    * add infrastructure for optional bias and activation; currently only supports no bias and no relu
    
    * make bias and relu optional separately
    
    * add sigmoid activation option
    
    * enable wider load/store for multi_tensor_apply kernels (#763)
    
    * modify MTA axpby for wider load/store
    
    * Make scale/axpby/l2/adam/lamb multi_tensor use wider loads
    
    * Changes to make xentropysoftmax load/store vectorized when possible: (#725)
    
    * Changes to make xentropysoftmax load/store vectorized when possible:
    Increase default ILP so that each thread handles 16 bytes of data in one step
    Make each thread load/store the longest vector possible
    Make the unrolled case handle adjacent data instead of strided...
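
    The arithmetic behind the items above can be sketched as follows. This is an illustrative model in plain Python, not the CUDA code from the commit: the helper names are hypothetical, and the alignment rule (the address must be a multiple of the vector width in bytes) is an assumption about when a 16-byte load is possible.

    ```python
    def ilp_for(itemsize_bytes, bytes_per_step=16):
        """Elements per thread per step so that one step covers bytes_per_step."""
        return max(1, bytes_per_step // itemsize_bytes)


    def can_vectorize(address, itemsize_bytes, ilp):
        """Assumed rule: a vector access needs an address aligned to its width."""
        return address % (itemsize_bytes * ilp) == 0


    def adjacent_indices(thread, ilp):
        """Each thread owns one contiguous run of ilp elements."""
        return [thread * ilp + k for k in range(ilp)]


    def strided_indices(thread, ilp, num_threads):
        """Each thread's elements are num_threads apart."""
        return [thread + k * num_threads for k in range(ilp)]


    half_ilp = ilp_for(2)    # 2-byte elements: 8 per step
    float_ilp = ilp_for(4)   # 4-byte elements: 4 per step
    ```

    With the adjacent layout, the elements a thread touches form one contiguous 16-byte span, which is what allows a single wide load or store in place of several narrow ones.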
multi_tensor_l2norm_kernel.cu 12 KB