1. 27 May, 2019 1 commit
  2. 23 May, 2019 3 commits
  3. 22 May, 2019 3 commits
  4. 21 May, 2019 1 commit
  5. 17 May, 2019 2 commits
    • [syncbn update] (#287) · a5289067
      jjsjann123 authored
      Update the input size check to fix GitHub issue #262.
      
      Update the SyncBatchNorm count check so that a size-1 input with cross-GPU
      synchronization runs fine.
    • [SyncBatchNorm update] (#285) · ffbb52ba
      jjsjann123 authored
      Resolves issue #254.
      
      Added input casting to the pure Python implementation so that it supports
      mismatched input and layer dtypes.
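The count check described in the #287 commit above can be illustrated with a minimal sketch. It is not the code from that commit: the function names, the `channel_dim=1` default, and the assumption that every process holds an input of the same shape are all hypothetical. The idea is only that, with cross-GPU synchronization, the values per channel are counted across all participating processes, so a size-1 input on each GPU can still be valid when more than one process takes part.

```python
from math import prod


def values_per_channel(shape, channel_dim=1):
    """Count the elements that feed each channel's statistics on one process."""
    return prod(shape) // shape[channel_dim]


def check_sync_bn_count(shape, world_size=1, channel_dim=1):
    """Return the global per-channel count, or raise if it is too small.

    Hypothetical helper: assumes all `world_size` processes hold inputs
    of the same `shape`.
    """
    local_count = values_per_channel(shape, channel_dim)
    total_count = local_count * world_size
    if total_count <= 1:
        raise ValueError(
            "Expected more than 1 value per channel when training, "
            "got input size {}".format(tuple(shape))
        )
    return total_count
```

Under these assumptions, `check_sync_bn_count((1, 8, 1, 1), world_size=2)` returns 2, while the same shape with `world_size=1` raises `ValueError`.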
  6. 16 May, 2019 1 commit
  7. 15 May, 2019 3 commits
  8. 13 May, 2019 3 commits
  9. 10 May, 2019 1 commit
  10. 09 May, 2019 2 commits
  11. 08 May, 2019 1 commit
  12. 03 May, 2019 1 commit
  13. 02 May, 2019 2 commits
  14. 01 May, 2019 3 commits
  15. 30 Apr, 2019 5 commits
  16. 29 Apr, 2019 2 commits
  17. 27 Apr, 2019 2 commits
    • Bnp integration pr (#275) · fedfe0d7
      jjsjann123 authored
      * Persistent group batchnorm added
      
      Added persistent grouped batch norm for performance runs in the strong-scaling
      case. It currently supports only:
      
        1. nhwc layout
        2. fp16
        3. synchronization only within a node!
      
      An environment variable is used to tune LAUNCH_MARGIN, which limits the number
      of CTAs used by the persistent kernel.
      
      Documentation and examples will follow.
      
      * updating type().scalarType() to scalar_type()
      
      * moving launch margin to be defined at layer creation; adding a knob to cap max CTAs per SM
      
      * fixing the cta computation
      
      * addressing review comments:
      
      set device_id through cudaGetDevice()
      replaced cudaMemset with cudaMemsetAsync
      updated __threadfence() to __threadfence_system() for inter-device writes
    • syntax · e7beba17
      Michael Carilli authored
  18. 26 Apr, 2019 4 commits