You need to sign in or sign up before continuing.
- 17 Sep, 2025 1 commit
-
-
Sudhakar Singh authored
* add tutorial files and other local changes Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * remove extraneous code for easy debu Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * make cuda graphs work with non-paged and paged attention Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * perf imp for kv cache ops Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * add code for calibration Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * optimize kv_cache reindex and copy kernels Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * changes to make quantizers work with fp8_calibration Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * avoid reindexing from python side Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * rename variable from previous commit Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * minor fix Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * minor fix Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> * use quantizer only if needed Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * functionality of the tutorial tested and perf checked Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * remove files and update headers/licenses Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * update header/license Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update tutorial for review Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * make weights downloadable on the fly; remove extra print statements Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix lint and update comments Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add comma back, typo Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * sequence_start_positions should be None for training Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add paged attention numberes and update requirements.txt file Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * more fixes Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * make tutorial work on blackwell Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * remove gemma FT tutorial for now Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * fixing the headings placement and rewording attention -> kv caching Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * fixes from comments Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix the images Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * misc fixes Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * add more comments to te_gemma.py and cleanup utils.py Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add more information about the hierarchy of the classes used in the tutorial Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add better cuda graphs picture Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * addd updated cuda graphs pictures Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * add illustrated cuda graphs Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * fix Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * small fixes in documentation Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * add torch.no_grad() to force reduced memory usage Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * some fixes from recent comments Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * more fixes from remaining comments Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * add te_rope_emb to class desc Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> * fix tutorial wording; add calibration fix to grouped_linear.py Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> --------- Signed-off-by:
Sudhakar Singh <sudhakars@nvidia.com> Signed-off-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com> Co-authored-by:
pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by:
Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
-