1. 23 Mar, 2022 1 commit
  2. 27 Feb, 2022 1 commit
  3. 26 Feb, 2022 1 commit
  4. 10 Feb, 2022 1 commit
  5. 01 Feb, 2022 1 commit
    • Add the permutation related support as the extension for asp lib. (#1194) · 89edb819
      ChongyuNVIDIA authored
      * Add the permutation related support as the extension for asp lib.
      
      * [Fix] Track the permutation sequence for progressive channel swap strategy.
      
      * Fix the corner case where one layer is not sparse but needs to apply permutation due to its siblings.
      
      * Fix the deprecated functions in ASP unit tests.
      
      * Fix the sparsity info typo in ASP lib.
      
      * [Enhancement] Set the identical random seed for all GPUs to make sure the same results are generated in permutation search.
      
      * Update the README.md with identical random seed setting and NeurIPS info.
      
      * Integrate the Pybind11 enhancement of permutation search into ASP lib.
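The identical-seed change above can be illustrated with a minimal sketch. It is not the ASP library's code: `SEED`, `propose_permutation`, and the loop over simulated ranks are hypothetical names, and the standard-library `random` module stands in for whatever generator the permutation search actually uses. The point is only that seeding each process identically makes every process draw the same channel permutation.

```python
import random

SEED = 1  # hypothetical value; the seed ASP actually uses is not stated here


def propose_permutation(num_channels, seed):
    """Draw one random channel permutation from a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, independent of global state
    perm = list(range(num_channels))
    rng.shuffle(perm)
    return perm


# Simulate four GPU ranks that each seed their own generator with the same value.
perms_by_rank = [propose_permutation(8, SEED) for _rank in range(4)]

# Every simulated rank ends up with the same permutation.
all_ranks_agree = all(p == perms_by_rank[0] for p in perms_by_rank)
```

If each rank used a different seed instead, the ranks could settle on different permutations and the permuted weights would no longer match across GPUs.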
  6. 19 Jan, 2022 1 commit
  7. 13 Jan, 2022 1 commit
  8. 16 Dec, 2021 1 commit
  9. 15 Dec, 2021 1 commit
  10. 14 Dec, 2021 1 commit
    • Faster `--fast_multihead_attn` build (#1245) · 7ec8ed67
      Masaki Kozuki authored
      * merge .so files
      
      * odr
      
      * fix build
      
      * update import
      
      * apply psf/black with max line length of 120
      
      * update
      
      * fix
      
      * update
      
      * build fixed again but undefined symbol again
      
      * fix 2, still layer norm grad is undefined
      
      * remove unused cpp files
      
      * without layer_norm.cuh, import works
      
      * import fast_multihead_attn works...
      
      but why? Was an unnecessary `#include "layer_norm.cuh"` the culprit,
      preventing the shared objects from linking `HostApplyLayerNorm` and
      `HostLayerNormGradient`?
      
      * clean up layer norm
  11. 09 Dec, 2021 1 commit
  12. 27 Oct, 2021 1 commit
  13. 02 Oct, 2021 1 commit
  14. 08 Sep, 2021 1 commit
    • enable ninja (#1164) · 9ce0a10f
      Masaki Kozuki authored
      - passing include directories to `CUDAExtension`'s `include_dirs` argument
      - removing `-I/path/to/dir` arguments from `extra_compile_args`
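The two changes above amount to moving include paths out of the compiler flag list and into a dedicated argument. A minimal sketch of that migration follows. The helper `split_include_flags` and the sample flag list are hypothetical and not taken from the repository's `setup.py`; only `CUDAExtension` and its `include_dirs` / `extra_compile_args` keyword arguments come from the commit message and from `torch.utils.cpp_extension`.

```python
def split_include_flags(compile_args):
    """Separate `-I<dir>` flags from the remaining compiler flags.

    Returns (include_dirs, remaining_args), preserving the original order.
    """
    include_dirs, remaining = [], []
    for arg in compile_args:
        if arg.startswith("-I") and len(arg) > 2:
            include_dirs.append(arg[2:])  # drop the leading "-I"
        else:
            remaining.append(arg)
    return include_dirs, remaining


# Hypothetical flag list in the old style, with the include path inline.
old_nvcc_args = ["-O3", "-I/path/to/dir", "--use_fast_math"]
include_dirs, nvcc_args = split_include_flags(old_nvcc_args)

# The results would then be handed to the extension roughly like this
# (left as a comment so the sketch runs without PyTorch installed):
#
#   from torch.utils.cpp_extension import CUDAExtension
#   CUDAExtension(
#       name="my_ext",                      # hypothetical extension name
#       sources=["my_ext.cpp"],             # hypothetical source list
#       include_dirs=include_dirs,
#       extra_compile_args={"nvcc": nvcc_args},
#   )
```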
  15. 01 Sep, 2021 2 commits
  16. 17 Jul, 2021 2 commits
    • Added more fusion and vectorized kernel for transducer (#1125) · 0c2c6eea
      Nan Zheng authored
      * Added support for fused ReLU and dropout into transducer joint
      
      * Reorganized code selection path in transducer joint fwd
      * Added support for fused ReLU+dropout into transducer joint
      
      * Vectorize transducer loss backward with fused softmax (#3)
      
      * Nanz/transducer loss (#4)
      
      * Vectorize transducer loss backward with fused softmax
      
      * Added a predicate to avoid potential IMA
      
      * Nanz/transducer loss (#5)
      
      * Vectorize transducer loss backward with fused softmax
      
      * Added a predicate to avoid potential IMA
      
      * Added more predicates to avoid IMAs
      
      * Updated documentation for newly added features.
      
      * Fixed an error in transducer.py
    • Adds small-batch kernels (#1126) · ed719967
      yjk21 authored
  17. 17 Apr, 2021 1 commit
  18. 16 Apr, 2021 1 commit
  19. 24 Mar, 2021 1 commit
    • Initial check-in of the transducer extensions (#1069) · d86d1b09
      Nan Zheng authored
      * Initial check-in of the transducer extension.
      
      * Added more comments to help explain the code
      
      * Corrected minor typos
      
      * 1. Renamed variable in tests to match the extension
      2. Disabled ninja build option
  20. 23 Feb, 2021 1 commit
  21. 01 Dec, 2020 1 commit
  22. 10 Aug, 2020 1 commit
  23. 01 Aug, 2020 1 commit
  24. 01 Jun, 2020 1 commit
  25. 30 May, 2020 2 commits
  26. 29 May, 2020 1 commit
  27. 14 May, 2020 1 commit
  28. 23 Apr, 2020 1 commit
  29. 22 Apr, 2020 1 commit
  30. 23 Mar, 2020 1 commit
  31. 20 Mar, 2020 2 commits
  32. 11 Mar, 2020 1 commit
  33. 02 Mar, 2020 1 commit
  34. 27 Feb, 2020 1 commit
  35. 25 Feb, 2020 2 commits