- 21 May, 2020 (1 commit)

  Jeff Daily authored

- 12 May, 2020 (1 commit)

  rohithkrn authored

- 07 May, 2020 (1 commit)

  Chaitanya Sri Krishna Lolla authored

  * fix dropout scaling from p to 1/(1-p) (#816)
    Co-authored-by: Sukru Eryilmaz <seryilmaz@computelab-dgx1v-32.nvidia.com>
  * Improvements to apex.mlp (#804)
    * update the fused bias-ReLU backward kernel
    * add support for not requiring the first layer's dgrad
    * fix bug: wrong layer in the requires-grad check
    * add infrastructure for optional bias and activation; currently only no-bias and no-ReLU are supported
    * make bias and ReLU optional separately
    * add a sigmoid activation option
  * enable wider load/store for the multi_tensor_apply kernels (#763)
    * modify MTA axpby for wider load/store
    * make the scale/axpby/l2/adam/lamb multi_tensor kernels use wider loads
  * make xentropysoftmax load/store vectorized when possible (#725)
    * increase the default ILP so that each thread handles 16 bytes of data per step
    * make each thread load/store the longest vector possible
    * make the unroll case handle adjacent data instead of strided...
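
For context on the #816 fix above: inverted dropout zeroes each activation with probability p and scales the survivors by 1/(1-p), so the output's expected value matches the input and no rescaling is needed at inference time. Below is a minimal CUDA sketch of the corrected scaling; the kernel name and layout are hypothetical illustrations, not apex's actual code.

```cuda
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Hypothetical inverted-dropout forward kernel (illustration only).
// Each element is kept with probability (1 - p); kept elements are
// scaled by 1/(1 - p), which is the correction #816 describes.
__global__ void dropout_forward(const float* in, float* out, float p,
                                unsigned long long seed, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    curandStatePhilox4_32_10_t state;
    curand_init(seed, i, 0, &state);
    bool keep = curand_uniform(&state) > p;    // keep with prob ~ (1 - p)
    out[i] = keep ? in[i] / (1.0f - p) : 0.0f; // scale by 1/(1-p), not p
}
```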

- 30 Apr, 2020 (1 commit)

  Deyu Fu authored

  * modify MTA axpby for wider load/store
  * make the scale/axpby/l2/adam/lamb multi_tensor kernels use wider loads
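
The point of the wider loads in this commit (and in #763/#725 above) is memory-bandwidth efficiency: each thread moves 16 bytes per access as one 128-bit transaction instead of a 4-byte float. A minimal sketch of the technique using float4 follows; it is not the actual multi_tensor_apply code, and it assumes 16-byte-aligned pointers and an element count divisible by 4, conditions a real kernel would check before taking the vectorized path.

```cuda
#include <cuda_runtime.h>

// Scale a float array by alpha, reading/writing 16 bytes (float4) per
// thread per access instead of 4 bytes. Assumes 16-byte-aligned pointers
// and n divisible by 4; real kernels fall back to narrower accesses.
__global__ void scale_vec4(const float4* in, float4* out, float alpha, int n4) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n4) return;                 // n4 = n / 4 float4 elements
    float4 v = in[i];                    // one 128-bit load
    v.x *= alpha; v.y *= alpha; v.z *= alpha; v.w *= alpha;
    out[i] = v;                          // one 128-bit store
}
```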

- 04 Apr, 2019 (1 commit)

  mcarilli authored

  * refactor to allow more flexible treatment of multiple optimizers/models/losses
  * add _process_optimizers.py
  * create L0 tests (now passing)
  * fix: minor print typo (#234)
  * make L1 results easier to read
  * flesh out the L0 multiple model/optimizer/loss test
  * add a test that master params remain synced across distributed processes
  * docstring updates

- 19 Mar, 2019 (1 commit)

  Michael Carilli authored