- 11 Aug, 2023 1 commit
Pruthvi Madugundu authored
- 12 Jun, 2023 1 commit
flyingdown authored
2. Add environment variable APEX_ROCBLAS_GEMM_ALLOW_HALF to control whether fp16r is used
3. Add dcu version information; modify the whl package name; update the installation steps in the readme
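The first item above introduces an environment-variable switch. Below is a minimal sketch of how such a switch might be read on the C++ side; the variable name comes from the commit message, but the helper, its default, and the "0 disables it" convention are assumptions for illustration, not apex's actual implementation.

```cpp
// Illustrative reader for the APEX_ROCBLAS_GEMM_ALLOW_HALF switch described above.
// Default behaviour and accepted values are assumptions, not apex's real logic.
#include <cstdlib>
#include <cstring>

// Returns true when the fp16 (fp16r) rocBLAS GEMM path is allowed.
static bool apex_rocblas_gemm_allow_half() {
  const char* value = std::getenv("APEX_ROCBLAS_GEMM_ALLOW_HALF");
  if (value == nullptr) {
    return true;                        // assumption: unset keeps the fp16 path enabled
  }
  return std::strcmp(value, "0") != 0;  // assumption: "0" disables it, anything else enables it
}
```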
- 14 Nov, 2022 1 commit
flyingdown authored
- 31 May, 2022 1 commit
Hubert Lu authored
* Make rocblas_gemm_flags_fp16_alt_impl backward-compatible with the new naming
* Use BACKWARD_PASS_GUARD_CLASS to prevent a lengthy if-statement
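A guard class like the BACKWARD_PASS_GUARD_CLASS mentioned here typically works as an RAII marker: the backward pass constructs it once, and every GEMM call site just queries a flag instead of repeating a long condition. The sketch below only illustrates that pattern; the class name, the thread_local flag, and the helper are assumptions, not the actual apex/PyTorch guard.

```cpp
// Illustrative RAII backward-pass guard; names and storage are assumptions.
namespace sketch {

thread_local bool in_backward_pass = false;

class BackwardPassGuard {
 public:
  BackwardPassGuard() : previous_(in_backward_pass) { in_backward_pass = true; }
  ~BackwardPassGuard() { in_backward_pass = previous_; }
  BackwardPassGuard(const BackwardPassGuard&) = delete;
  BackwardPassGuard& operator=(const BackwardPassGuard&) = delete;

 private:
  bool previous_;
};

// GEMM call sites ask one question instead of repeating a lengthy if-statement,
// e.g. flags = use_fp16_alt_impl() ? rocblas_gemm_flags_fp16_alt_impl
//                                  : rocblas_gemm_flags_none;
inline bool use_fp16_alt_impl() { return in_backward_pass; }

}  // namespace sketch
```

A backward function would construct the guard at its entry point, so any GEMM it reaches, directly or through helpers, sees use_fp16_alt_impl() return true.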
- 06 Apr, 2022 1 commit
Hubert Lu authored
Make rocblas_gemm_flags_fp16_alt_impl in MHA and MLP backward compatible with old PyTorch versions (#74)
* First attempt to make rocblas flag backward compatible
* Fix some bugs
* Fix some bugs
* Make rocblas_gemm_flags_fp16_alt_impl in MHA backward compatible with old PyTorch versions
* Add groupbn extension unit tests for ROCm
* Fix some bugs
- 23 Mar, 2022 1 commit
Hubert Lu authored
* Add rocblas_alt_impl flag in MLP
* Refactor rocblas_alt_impl implementation and only use it for backprop
- 17 May, 2021 1 commit
Burc Eryilmaz authored
Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>
- 19 Apr, 2021 1 commit
Burc Eryilmaz authored
* don't create cublasLt handle, fix zero block size case
* cleanup
- 17 Apr, 2021 1 commit
Burc Eryilmaz authored
* initial cublasLt support
* 64 bit input
* add license headers
* cleanup
* remove license
Co-authored-by: pbialecki <pbialecki@nvidia.com>
- 05 Aug, 2020 1 commit
Chaitanya Sri Krishna Lolla authored
* enable mlp cuda
* add setup changes and tests
* skip the unit tests
* updated conditions for empty array
* removed hip platform conditions
- 07 May, 2020 1 commit
Chaitanya Sri Krishna Lolla authored
* fix dropout scaling from p to 1/(1-p) (#816)
  Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>
* Improvements to apex.mlp (#804)
  * update fused bias relu backward kernel
  * add support for not requiring first layer dgrad
  * fix bug: wrong layer in requires grad
  * add infrastructure for optional bias and activation; currently only supports no bias and no relu
  * make bias and relu optional separately
  * add sigmoid activation option
* enable wider load/store for multi_tensor_apply kernels (#763)
  * modify MTA axpby for wider load/store
  * make scale/axpby/l2/adam/lamb multi_tensor use wider loads
* Changes to make xentropysoftmax load/store vectorized when possible (#725)
  * increase default ILP so that each thread handles 16 bytes of data in one step
  * make each thread load/store the longest vector possible
  * make the unroll case handle adjacent data instead of strided...
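The first bullet fixes the dropout scale factor: with drop probability p, the surviving activations must be multiplied by 1/(1-p), not by p, so the expected value of each element is unchanged. A small self-contained check of that identity (plain C++, not apex's kernel):

```cpp
// Illustrative inverted-dropout scaling: survivors are scaled by 1/(1-p) so the
// expected value of each element matches the no-dropout case. Not apex's kernel.
#include <cstdio>
#include <random>
#include <vector>

int main() {
  const float p = 0.25f;                  // drop probability
  const float scale = 1.0f / (1.0f - p);  // correct scale: 1/(1-p), not p
  std::mt19937 rng(0);
  std::bernoulli_distribution keep(1.0 - p);

  std::vector<float> x(1 << 20, 1.0f);
  double sum = 0.0;
  for (float& v : x) {
    v = keep(rng) ? v * scale : 0.0f;     // inverted dropout
    sum += v;
  }
  // The mean stays close to 1.0 because E[mask * scale] = (1-p) * 1/(1-p) = 1.
  std::printf("mean after dropout = %f\n", sum / x.size());
  return 0;
}
```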
- 30 Apr, 2020 1 commit
Deyu Fu authored
* update fused bias relu backward kernel
* add support for not requiring first layer dgrad
* fix bug: wrong layer in requires grad
* add infrastructure for optional bias and activation; currently only supports no bias and no relu
* make bias and relu optional separately
* add sigmoid activation option
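These bullets make the bias add and the activation independently optional, with ReLU and sigmoid as selectable activations. A minimal host-side sketch of that kind of dispatch follows; the enum and function names are assumptions for illustration, not apex.mlp's fused CUDA implementation.

```cpp
// Illustrative forward step with optional bias and a selectable activation,
// mirroring the "bias and relu optional separately, plus sigmoid" idea.
// Names and the enum are assumptions; apex's fused CUDA implementation differs.
#include <cmath>
#include <cstddef>
#include <vector>

enum class Activation { None, Relu, Sigmoid };

// y[i] = act(y[i] + bias[i % bias.size()]), with each piece optional.
void bias_act(std::vector<float>& y, const std::vector<float>* bias, Activation act) {
  for (std::size_t i = 0; i < y.size(); ++i) {
    float v = y[i];
    if (bias != nullptr && !bias->empty()) {
      v += (*bias)[i % bias->size()];   // optional bias add
    }
    switch (act) {
      case Activation::Relu:    v = v > 0.0f ? v : 0.0f; break;
      case Activation::Sigmoid: v = 1.0f / (1.0f + std::exp(-v)); break;
      case Activation::None:    break;  // no activation
    }
    y[i] = v;
  }
}
```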
- 22 Apr, 2020 1 commit
Deyu Fu authored