    [TE/JAX] Prototype for New XLA Custom Calls with FFI (#946) · 4b2b39b4
    Phuong Nguyen authored
    
    
    * implemented custom call with ffi in csrc
    
    * moved headers of misc to misc.h, add ffi.h
    
    * ActLu and DActLu lowering with ffi_lowering
    
    * CastTranspose with ffi_lowering
    
    * enabled cudaGraph
    
    * added 4d input test case to TestActivationLu
    
    * added operand_output_aliases for CastTranspose
    
    * added env var NVTE_JAX_WITH_FFI, default value = 1
    
    * replace casting ActivationEnum by taking its value
    
    ---------
    Signed-off-by: Phuong Nguyen <phuonguyen@nvidia.com>
transpose.cpp 8.19 KB