OpenDAS / TransformerEngine / docs / examples / te_gemma / media / generation_animation.gif
    TE Gemma tutorial attempt#2 (#1839) · 7042d7ae
    Sudhakar Singh authored Sep 16, 2025
    
    
    * add tutorial files and other local changes
    Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
    
    * remove extraneous code for easy debug
    Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
    
    * make cuda graphs work with non-paged and paged attention
    Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * perf imp for kv cache ops
    Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
    
    * add code for calibration
    Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * optimize kv_cache reindex and copy kernels
    Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * changes to make quantizers work with fp8_calibration
    Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
    
    * avoid reindexing from python side
    Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
    
    * rename variable from previous commit
    Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
    
    * minor fix
    Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
    
    * minor fix
    Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
    
    * use quantizer only if needed
    Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
    
    * functionality of the tutorial tested and perf checked
    Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
    
    * remove files and update headers/licenses
    Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
    
    * update header/license
    Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * update tutorial for review
    Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * make weights downloadable on the fly; remove extra print statements
    Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix lint and update comments
    Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * add comma back, typo
    Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
    
    * sequence_start_positions should be None for training
    Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * add paged attention numbers and update requirements.txt file
    Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
    
    * more fixes
    Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * make tutorial work on blackwell
    Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
    
    * remove gemma FT tutorial for now
    Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
    
    * fixing the headings placement and rewording attention -> kv caching
    Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
    
    * fixes from comments
    Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * fix the images
    Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
    
    * misc fixes
    Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
    
    * add more comments to te_gemma.py and cleanup utils.py
    Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * add more information about the hierarchy of the classes used in the tutorial
    Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
    
    * [pre-commit.ci] auto fixes from pre-commit.com hooks
    
    for more information, see https://pre-commit.ci
    
    * add better cuda graphs picture
    Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
    
    * add updated cuda graphs pictures
    Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
    
    * add illustrated cuda graphs
    Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
    
    * fix
    Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
    
    * small fixes in documentation
    Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
    
    * add torch.no_grad() to force reduced memory usage
    Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
    
    * some fixes from recent comments
    Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
    
    * more fixes from remaining comments
    Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
    
    * add te_rope_emb to class desc
    Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
    
    * fix tutorial wording; add calibration fix to grouped_linear.py
    Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
    
    ---------
    Signed-off-by: Sudhakar Singh <sudhakars@nvidia.com>
    Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
    Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
    Co-authored-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
generation_animation.gif 132 KB

Download (132 KB)

generation_animation.gif

Replace generation_animation.gif

Attach a file by drag & drop or click to upload


Cancel
A new branch will be created in your fork and a new merge request will be started.