- 12 Jun, 2023 1 commit
flyingdown authored
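2. Add the environment variable APEX_ROCBLAS_GEMM_ALLOW_HALF to control whether fp16r is used 3. Add DCU version information, rename the whl package, and update the installation steps in the README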
- 14 Nov, 2022 1 commit
flyingdown authored
- 19 Oct, 2021 1 commit
Hubert Lu authored
- 04 Sep, 2021 1 commit
Burc Eryilmaz authored
* support for fused dense layer with cublasLt, fusion in both fprop and bprop
* fix typo causing syntax error
* add fused GEMM+gelu+GEMM module
* fix typo for workspace size
* update cublas check for 11600
* add tests for fused dense layer
* fix CUDA 10.x path
* safer guard around CUBLAS constants, remove unreferenced variable
* more guard changes
* guard against cublas version instead of cuda
Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>
- 01 Sep, 2021 1 commit
Burc Eryilmaz authored
* support for fused dense layer with cublasLt, fusion in both fprop and bprop
* fix typo causing syntax error
* add fused GEMM+gelu+GEMM module
* fix typo for workspace size
* update cublas check for 11600
* add tests for fused dense layer
* fix CUDA 10.x path
Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>