1. 30 May, 2020 2 commits
  2. 29 May, 2020 1 commit
  3. 14 May, 2020 1 commit
  4. 23 Apr, 2020 1 commit
  5. 22 Apr, 2020 1 commit
  6. 23 Mar, 2020 1 commit
  7. 20 Mar, 2020 2 commits
  8. 11 Mar, 2020 1 commit
  9. 02 Mar, 2020 1 commit
  10. 27 Feb, 2020 1 commit
  11. 25 Feb, 2020 2 commits
  12. 24 Feb, 2020 1 commit
    • Change to Multihead Attention to allow Batched GEMMs larger than 64K. (#728) · 1733946a
      Kevin Stephano authored
      * Adding C++ Multihead Attention implementation to contrib.
      
      * Add reference test that at least works for forward.
      
      * Remove CublasLt support from multihead attention.
      
      * Add new Python version of self attention.
      
      * Update python model of MHA with backward pass.
      
      * Fixed Output Linear connection in MHA.
      
      * Clean up compiles and add documentation to PySelfAttention.
      
      * Add Encdec Python version of multihead attention.  Cleanup files.
      
      * Tests for self and encdec multihead attention.
      
      * Add reference pytorch implementation of attention with norm and add.
      
      * Add cutlass branch definition.
      
      * Add cutlass download to compile.
      
      * Add norm/add tests.
      
      * Add biases to pytorch python versions.
      
      * Add tests and fix issues with python version of attention masking.
      
      * Create README.md
      
      * Update README.md
      
      * Update README.md
      
      * Update perf test parameters.
      
      * Update README.md
      
      * Update README.md
      
      * Update README.md
      
      * Add files via upload
      
      * Update README.md
      
      * Update README.md
      
      * Update README.md
      
      * Fix matmul1 output tensor size.  Fix tests that missed issue.
      
      * Allow for Z dimensions of 64K and greater on batched GEMMs.
      
      * remove redundant imports
      
      * general cleanup, remove deprecated or unused functions
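For context, the 64K limit this commit works around comes from CUDA's grid z-dimension, which is capped at 65535 blocks, so a batched GEMM with 64K or more batches cannot be launched in a single grid. A rough illustration of the chunking idea (the helper below is hypothetical, not apex's actual code):

```python
# Hypothetical sketch: CUDA's grid z-dimension is capped at 65535 blocks, so a
# batched GEMM with more batches than that must be split across launches.
# This pure-Python helper (names are illustrative) computes the chunk ranges.
CUDA_MAX_GRID_Z = 65535

def gemm_launch_chunks(num_batches, max_z=CUDA_MAX_GRID_Z):
    """Yield (start, count) pairs so each launch stays within the z-dim limit."""
    start = 0
    while start < num_batches:
        count = min(max_z, num_batches - start)
        yield (start, count)
        start += count

# A 100,000-batch GEMM splits into two launches: 65535 + 34465.
chunks = list(gemm_launch_chunks(100000))
```

Each `(start, count)` pair would correspond to one kernel launch over a slice of the batch dimension.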
  13. 15 Feb, 2020 1 commit
  14. 06 Feb, 2020 1 commit
    • Add Fast Multihead Attention to APEX Contrib (#697) · 3f94528e
      Kevin Stephano authored
      * Adding C++ Multihead Attention implementation to contrib.
      
      * Add reference test that at least works for forward.
      
      * Remove CublasLt support from multihead attention.
      
      * Add new Python version of self attention.
      
      * Update python model of MHA with backward pass.
      
      * Fixed Output Linear connection in MHA.
      
      * Clean up compiles and add documentation to PySelfAttention.
      
      * Add Encdec Python version of multihead attention.  Cleanup files.
      
      * Tests for self and encdec multihead attention.
      
      * Add reference pytorch implementation of attention with norm and add.
      
      * Add cutlass branch definition.
      
      * Add cutlass download to compile.
      
      * Add norm/add tests.
      
      * Add biases to pytorch python versions.
      
      * Add tests and fix issues with python version of attention masking.
      
      * Create README.md
      
      * Update README.md
      
      * Update README.md
      
      * Update perf test parameters.
      
      * Update README.md
      
      * Update README.md
      
      * Update README.md
      
      * Add f...
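The fused C++/CUDA kernels this PR adds implement scaled dot-product self-attention. A minimal NumPy reference of the operation (illustrative only; the actual extension fuses these steps on the GPU and handles batching, multiple heads, and masking):

```python
import numpy as np

# Reference sketch of single-head scaled dot-product self-attention.
def self_attention(q, k, v):
    """q, k, v: (seq_len, d_model) arrays for a single head."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                 # (seq, seq) similarity logits
    scores -= scores.max(axis=-1, keepdims=True)  # stabilize the softmax
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)    # rows sum to 1
    return probs @ v                              # weighted sum of value rows

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
k = rng.standard_normal((4, 8))
v = rng.standard_normal((4, 8))
out = self_attention(q, k, v)
```

Because the softmax weights are non-negative and sum to one, each output row is a convex combination of the value rows.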
  15. 21 Jan, 2020 1 commit
  16. 08 Jan, 2020 1 commit
  17. 04 Oct, 2019 1 commit
  18. 13 Sep, 2019 1 commit
  19. 06 Sep, 2019 1 commit
    • Fix for #456 (#477) · 325f5a0b
      mcarilli authored
      * Pushing for build tests
      
      * Contrib files
      
      * Removing deprecated checks
  20. 17 Aug, 2019 1 commit
  21. 16 Aug, 2019 1 commit
  22. 13 Aug, 2019 1 commit
  23. 08 Aug, 2019 1 commit
  24. 31 May, 2019 1 commit
    • Multi tensor lamb optimizer (#334) · 8be5b6be
      Thor Johnsen authored
      * First draft, for discussion
      
      * Fix mistakes in LAMB equations
      
      * Add loop over chunk
      
      * Bug fix
      
      * Bug fix
      
      * Bug fix
      
      * Undo bug fix
      
      * Bug fix
      
      * Add multi tensor LAMB optimizer to setup.py
      
      * Rename step_size to learning_rate
      
      * Fix compilation errors
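The LAMB rule that the multi-tensor kernels implement can be sketched in plain NumPy. The single-tensor function below is a hedged illustration (the names, defaults, and bias-correction omissions are assumptions, not apex's actual API): LAMB computes an Adam-style update, then rescales it per layer by the "trust ratio" ‖w‖ / ‖update‖ so large layers take proportionate steps.

```python
import numpy as np

# Hedged, single-tensor sketch of one LAMB update step (illustrative only).
def lamb_step(w, grad, m, v, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-6, weight_decay=0.01):
    m = beta1 * m + (1 - beta1) * grad             # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad      # second-moment estimate
    update = m / (np.sqrt(v) + eps) + weight_decay * w
    w_norm = np.linalg.norm(w)
    u_norm = np.linalg.norm(update)
    # Trust ratio: scale the step by the layer's weight norm vs. update norm.
    trust = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0
    return w - lr * trust * update, m, v

w = np.ones(4)
grad = np.full(4, 0.5)
m = np.zeros(4)
v = np.zeros(4)
w_new, m, v = lamb_step(w, grad, m, v)
```

The real multi-tensor implementation applies this update to many parameter tensors in a single fused kernel launch rather than looping in Python.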
  25. 23 May, 2019 1 commit
  26. 22 May, 2019 1 commit
  27. 09 May, 2019 1 commit
    • Add softmax cross entropy loss with label smoothing support. (#295) · 0c74571f
      Wil Kong authored
      * Add softmax cross entropy loss with label smoothing support.
      
      * Fix deprecation of AT_DISPATCH_XXX and several minor issues.
      
      * Fix issues commented by reviewers.
      
      * Add FB license.
      
      * Remove code generation constraints.
      
      * Add a simple unittest for label smoothing.
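Label smoothing replaces the one-hot target with a softened distribution before taking cross entropy. A minimal sketch, assuming the smoothing mass is spread uniformly over the non-target classes (implementations differ on this detail, and this is not the fused CUDA kernel's code):

```python
import numpy as np

def smoothed_cross_entropy(logits, target, smoothing=0.1):
    """logits: (num_classes,) array, target: int class index."""
    n = logits.shape[0]
    # Numerically stable log-softmax.
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    # Smoothed target: (1 - eps) on the true class, eps/(n-1) elsewhere.
    dist = np.full(n, smoothing / (n - 1))
    dist[target] = 1.0 - smoothing
    return -(dist * log_probs).sum()

logits = np.array([2.0, 1.0, 0.0])
plain = smoothed_cross_entropy(logits, 0, smoothing=0.0)    # ordinary NLL
smoothed = smoothed_cross_entropy(logits, 0, smoothing=0.1)
```

With `smoothing=0.0` this reduces to the ordinary negative log-likelihood of the target class; with smoothing enabled, some probability mass is demanded for the other classes, which regularizes overconfident predictions.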
  28. 27 Apr, 2019 1 commit
    • Bnp integration pr (#275) · fedfe0d7
      jjsjann123 authored
      * Persistent group batchnorm added
      
      Added persistent grouped batch norm for performance in the strong-scaling case. Currently it supports only:
      
        1. NHWC layout
        2. FP16
        3. synchronization within a single node
      
      The LAUNCH_MARGIN environment variable tunes how many CTAs the persistent kernel may use.
      
      Documentation and examples will follow.
      
      * updating type().scalarType() to scalar_type()
      
      * moving launch margin to be defined at layer creation; adding a knob to cap max CTAs per SM
      
      * fixing the CTA computation
      
      * review comments:
      
        - set device_id through cudaGetDevice()
        - move cudaMemset to cudaMemsetAsync
        - change __threadfence() to __threadfence_system() for inter-device writes
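The LAUNCH_MARGIN mechanism described above can be sketched as follows. Only the environment-variable name comes from the commit message; the function, its signature, and the defaults are assumptions for illustration:

```python
import os

# Illustrative sketch of the LAUNCH_MARGIN tuning knob: the persistent kernel
# leaves a margin of CTAs unused so other concurrent work can still run.
def max_ctas(total_ctas_per_device, env=os.environ):
    """Cap the persistent kernel's CTA count, keeping at least one CTA."""
    margin = int(env.get("LAUNCH_MARGIN", "0"))
    return max(1, total_ctas_per_device - margin)
```

A later commit in this PR moves this decision from an environment variable read to a value fixed at layer creation.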
  29. 18 Apr, 2019 1 commit
  30. 09 Apr, 2019 1 commit
  31. 23 Mar, 2019 1 commit
  32. 22 Mar, 2019 1 commit
    • Check cuda version (#216) · 5b8faa29
      mcarilli authored
      * Adding Torch + bare-metal nvcc version check and container build tests
      
      * Putting a canary in the coalmine
      
      * canary proved elusive
      
      * Trying direct setup.py install
      
      * this should work
      
      * Removing canary
      
      * hopefully this works
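The check added here guards against a mismatch between the CUDA toolkit PyTorch was built with and the bare-metal nvcc used to compile the extensions. A hedged sketch of the comparison (the parsing below is illustrative, not apex's exact setup.py code):

```python
# Sketch of a Torch-vs-nvcc CUDA version check: reject the build when the
# major.minor versions disagree. Version strings are illustrative.
def cuda_versions_match(torch_cuda, nvcc_release):
    """Compare e.g. torch_cuda='10.0.130' against nvcc_release='10.0'."""
    major_minor = lambda s: tuple(s.split(".")[:2])
    return major_minor(torch_cuda) == major_minor(nvcc_release)
```

In practice the Torch side would come from `torch.version.cuda` and the nvcc side from parsing `nvcc --version` output; compiling extensions against a different toolkit than the one PyTorch was built with can produce subtle runtime failures.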
  33. 19 Mar, 2019 1 commit
  34. 13 Mar, 2019 1 commit
  35. 12 Mar, 2019 1 commit
  36. 10 Mar, 2019 1 commit
  37. 08 Mar, 2019 1 commit