1. 16 Aug, 2019 1 commit
  2. 08 Aug, 2019 1 commit
  3. 06 Aug, 2019 1 commit
    • Clean up layer norm tests (#418) · 3ef01fae
      ngimel authored
      * Bug fix for non-affine layer norm + added a backward unit test (see the sketch after this entry)
      
      * clean up tests and add tests for a large batch
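      The fix above touches the non-affine path of the fused layer norm. Below is a minimal sketch of what such a backward unit test can look like, assuming apex is installed with its CUDA extensions and comparing apex.normalization.FusedLayerNorm (with elementwise_affine=False) against torch.nn.LayerNorm; the batch size, hidden size, and tolerances are illustrative, not the repo's:

      # Backward unit test sketch for the non-affine layer-norm path.
      import torch
      from apex.normalization import FusedLayerNorm

      def test_nonaffine_layernorm_backward(batch=65536, hidden=256):
          torch.manual_seed(0)
          x = torch.randn(batch, hidden, device="cuda", requires_grad=True)
          x_ref = x.detach().clone().requires_grad_(True)

          fused = FusedLayerNorm(hidden, elementwise_affine=False).cuda()
          ref = torch.nn.LayerNorm(hidden, elementwise_affine=False).cuda()

          y_fused = fused(x)
          y_ref = ref(x_ref)
          torch.testing.assert_close(y_fused, y_ref, rtol=1e-3, atol=1e-3)

          grad = torch.randn_like(y_fused)
          y_fused.backward(grad)
          y_ref.backward(grad)
          # The bug fixed here was in the non-affine backward, so the input
          # gradients are the quantity worth checking.
          torch.testing.assert_close(x.grad, x_ref.grad, rtol=1e-3, atol=1e-3)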
  4. 01 Aug, 2019 1 commit
  5. 26 Jul, 2019 1 commit
  6. 12 Jul, 2019 1 commit
  7. 03 Jul, 2019 4 commits
  8. 28 Jun, 2019 1 commit
  9. 14 Jun, 2019 1 commit
  10. 11 Jun, 2019 1 commit
  11. 31 May, 2019 2 commits
  12. 27 May, 2019 1 commit
  13. 10 May, 2019 1 commit
  14. 03 May, 2019 1 commit
  15. 27 Apr, 2019 1 commit
    • Bnp integration pr (#275) · fedfe0d7
      jjsjann123 authored
      * Persistent group batchnorm added
      
      Added persistent grouped batch norm for performance on the strong-scaling
      case. It currently supports only:

        1. NHWC layout
        2. FP16
        3. synchronization within a single node

      An environment variable is used to tune LAUNCH_MARGIN, which limits the
      number of CTAs the persistent kernel may use.
      
      Documentation and examples will follow; a usage sketch appears after this entry.
      
      * updating type().scalarType() to scalar_type()
      
      * moving the launch margin to be defined at layer creation; adding a knob to cap the max CTAs per SM
      
      * fixing the CTA computation
      
      * review comments:

      set device_id through cudaGetDevice()
      moved cudaMemset to cudaMemsetAsync
      updated __threadfence() to __threadfence_system() for inter-device writes
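      Given the constraints listed above, here is a sketch of how the persistent
      grouped batch norm might be exercised. Only LAUNCH_MARGIN and the
      NHWC/FP16/single-node constraints come from the commit message; the import
      path and constructor signature below are assumptions for illustration:

      import os
      import torch

      # Tune the launch margin before the layer is created; it limits how many
      # CTAs the persistent kernel may occupy, leaving headroom for other
      # concurrent kernels.
      os.environ["LAUNCH_MARGIN"] = "8"  # hypothetical value

      from apex.contrib.groupbn import BatchNorm2d_NHWC  # assumed module path

      # Constraints 1 and 2: NHWC layout, FP16 data.
      x = torch.randn(32, 56, 56, 64, device="cuda").half()  # N, H, W, C

      bn = BatchNorm2d_NHWC(64).cuda().half()  # assumed constructor signature
      y = bn(x)  # constraint 3: synchronization stays within a single node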
  16. 26 Apr, 2019 5 commits
  17. 25 Apr, 2019 1 commit
  18. 22 Apr, 2019 1 commit
  19. 18 Apr, 2019 1 commit
  20. 12 Apr, 2019 1 commit
  21. 10 Apr, 2019 2 commits
  22. 09 Apr, 2019 1 commit
  23. 08 Apr, 2019 1 commit
  24. 04 Apr, 2019 1 commit
    • WIP: Handle arbitrary combinations of optimizers/models/losses (#232) · 3f87614f
      mcarilli authored
      * Refactor to allow more flexible treatment of multiple optimizers/models/losses (see the usage sketch after this entry)
      
      * Adding _process_optimizers.py
      
      * Created L0 tests (now passing).
      
      * fix: minor print typo (#234)
      
      * make L1 results easier to read
      
      * L0 multiple model/optimizer/loss test fleshed out
      
      * Adding test that master params remain synced across distributed processes
      
      * Docstring updates
      
      * Docstring updates
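      A sketch of the flexible usage this refactor enables: amp.initialize
      accepting lists of models and optimizers, with a separate loss scale per
      loss selected via loss_id. The toy models and hyperparameters are
      illustrative:

      import torch
      from apex import amp

      model0 = torch.nn.Linear(16, 16).cuda()
      model1 = torch.nn.Linear(16, 16).cuda()
      opt0 = torch.optim.SGD(model0.parameters(), lr=1e-3)
      opt1 = torch.optim.SGD(model1.parameters(), lr=1e-3)

      # num_losses lets amp maintain an independent loss scale per loss.
      [model0, model1], [opt0, opt1] = amp.initialize(
          [model0, model1], [opt0, opt1], opt_level="O1", num_losses=2)

      x = torch.randn(8, 16).cuda()
      loss0 = model0(x).sum()
      loss1 = model1(x).sum()

      with amp.scale_loss(loss0, opt0, loss_id=0) as scaled:
          scaled.backward()
      with amp.scale_loss(loss1, opt1, loss_id=1) as scaled:
          scaled.backward()

      opt0.step()
      opt1.step()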
  25. 21 Mar, 2019 2 commits
  26. 19 Mar, 2019 2 commits
  27. 15 Mar, 2019 1 commit
  28. 12 Mar, 2019 1 commit
  29. 11 Mar, 2019 1 commit