fix CUBLAS guards (#1162)
* support for fused dense layer with cublasLt, fusion in both fprop and bprop
* fix typo causing syntax error
* add fused GEMM+gelu+GEMM module
* fix typo for workspace size
* update cublas check for 11600
* add tests for fused dense layer
* fix CUDA 10.x path
* safer guard around CUBLAS constants, remove unreferenced variable
* more guard changes
* guard against cublas version instead of cuda
Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>