Kshitij Lakhani authored
* Create pytorch/dot_product_attention module and pytorch/d_p_a/utils.py
  Move attention logging into a separate class in pytorch/d_p_a/utils.py
  Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
* Create FlashAttentionUtils class in pytorch/d_p_a/utils.py for versioning info
  Move versioning info out of pytorch/attention.py
  Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
* Move AttentionParams and get_attention_backend from attention.py to d_p_a/utils.py
  Fix tests and imports for the above refactor change
  Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
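A backend chooser in the spirit of `get_attention_backend` can be sketched as follows. The `AttentionParams` fields, backend names, and selection rules here are simplified assumptions for illustration, not the real TransformerEngine signatures:

```python
# Illustrative backend selection: prefer the fastest kernel the config supports.
# Field names, backend strings, and thresholds are assumptions, not TE's API.
from dataclasses import dataclass

@dataclass
class AttentionParams:
    head_dim: int
    dtype: str            # e.g. "fp16", "bf16", "fp32"
    is_causal: bool = True

def get_attention_backend(params: AttentionParams,
                          flash_available: bool = True) -> str:
    """Pick an attention backend from a (simplified) feature matrix."""
    # Flash-style kernels typically require half precision and a bounded head dim.
    if flash_available and params.dtype in ("fp16", "bf16") and params.head_dim <= 256:
        return "FlashAttention"
    # A fused kernel may cover more shapes at half precision.
    if params.dtype in ("fp16", "bf16"):
        return "FusedAttention"
    # Fall back to the plain framework implementation otherwise.
    return "UnfusedDotProductAttention"
```

Keeping the decision in one function (and the inputs in one `AttentionParams` container) is what makes this kind of move out of `attention.py` cheap: call sites only depend on the returned backend name.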
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Move get_qkv_layout(), get_full_mask(), get_alibi(), get_attention_quantizers() to d_p_a/utils.py
  Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Move tensor packing and unpacking helper functions from pyt/attention.py to d_p_a/utils.py
  Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Move cumulative seqlens and indices methods from pyt/attention.py to d_p_a/utils.py
  Rename cumulative functions from _cu_ to _cumul_ to differentiate them from the CUDA `cu` naming convention
  Rename tensor packing methods with a leading underscore to mark them as internal to the file
  Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
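The cumulative-seqlen and packing helpers these commits move around implement a standard idea: concatenate variable-length sequences into one buffer and record the boundaries as prefix sums. A pure-Python sketch (the real helpers operate on torch tensors; function names here are illustrative):

```python
# Sketch of cu_seqlens bookkeeping for packed variable-length sequences.
# Pure-Python stand-in for tensor helpers; names are illustrative.

def get_cu_seqlens(seqlens):
    """Prefix sums with a leading 0: sequence boundaries in the packed buffer."""
    cu = [0]
    for n in seqlens:
        cu.append(cu[-1] + n)
    return cu

def pack(sequences):
    """Concatenate sequences and return (packed, cu_seqlens)."""
    packed = [tok for seq in sequences for tok in seq]
    return packed, get_cu_seqlens([len(s) for s in sequences])

def unpack(packed, cu_seqlens):
    """Invert pack() by slicing at the recorded boundaries."""
    return [packed[cu_seqlens[i]:cu_seqlens[i + 1]]
            for i in range(len(cu_seqlens) - 1)]
```

Since sequence `i` occupies `packed[cu_seqlens[i]:cu_seqlens[i+1]]`, attention kernels can skip padding entirely, which is why the `cu_`/`_cumul_` naming question in these commits matters: the prefix-sum array is part of the kernel-facing contract.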
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Remove unnecessary imports in pytorch/attention.py and d_p_a/utils.py
  Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
* Create d_p_a/inference.py and move InferenceParams from pyt/attention.py to it
  Modify tests and other files to import InferenceParams correctly
  Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
  Modify the docs API for InferenceParams
  Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
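As a rough mental model, an `InferenceParams`-style container tracks KV-cache capacity and the current decode offset. This is a minimal sketch; the attribute and method names below are assumptions about what such a class holds, not the actual TransformerEngine definition:

```python
# Hypothetical minimal InferenceParams-style container for incremental decoding.
# Attribute names are illustrative, not the real TE class.

class InferenceParams:
    def __init__(self, max_batch_size: int, max_sequence_length: int):
        self.max_batch_size = max_batch_size
        self.max_sequence_length = max_sequence_length
        self.sequence_len_offset = 0      # tokens already written to the cache
        self.key_value_memory_dict = {}   # per-layer KV cache buffers

    def advance(self, num_new_tokens: int) -> None:
        """Advance the cache offset after a decoding step; guard against overflow."""
        if self.sequence_len_offset + num_new_tokens > self.max_sequence_length:
            raise ValueError("KV cache overflow")
        self.sequence_len_offset += num_new_tokens
```

Giving this state its own module (`d_p_a/inference.py`) separates decode-time bookkeeping from the attention math itself, which is the point of the move.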
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Create d_p_a/rope.py and move RoPE methods from pytorch/attention.py to it
  Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
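The RoPE methods moved into `d_p_a/rope.py` implement rotary position embedding: rotating consecutive feature pairs by position-dependent angles. A pure-Python sketch of the math (the real code is vectorized over torch query/key tensors; this list version assumes an even feature dimension):

```python
# Pure-Python sketch of the rotary position embedding (RoPE) rotation.
# Illustrative only; real RoPE code applies this to torch q/k tensors.
import math

def rope_rotate(x, position, base=10000.0):
    """Rotate consecutive pairs (x[i], x[i+1]) by angle position * base^(-i/dim).

    Assumes len(x) is even. The rotation is norm-preserving, and inner
    products between rotated vectors depend only on relative position.
    """
    dim = len(x)
    out = list(x)
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out
```

Because the rotation touches no learned parameters, it is a natural candidate for extraction into its own module, as this commit does.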
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Code cleanup
  Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Fix a bug surfaced by QA testing; code cleanup
  Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Fix incorrect pack_tensor arg type; code cleanup
  Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
* nit: Resolve lint errors
  Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Remove the FAUtils typedef for FlashAttentionUtils
  Use attn_log instead of att_log
  Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
  Fix lint error
  Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* nit: Revert the function name from get_cumul to the earlier get_cu
  Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
* nit: Fix typos, make imports explicit, and remove extra comments
  Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>

---------
Signed-off-by: Kshitij Janardan Lakhani <klakhani@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
37339478