- 29 Sep, 2025 2 commits
Lei Wang authored
* Update README.md to include directory structure and file descriptions for deepseek_v32 example
* Refactor and clean up deepseek_v32 example scripts
  - Removed unused imports and functions from `fp8_mqa_logits.py` to streamline the code.
  - Improved formatting and readability in `sparse_mla_fwd_pipelined.py` and `sparse_mla_fwd.py` by adjusting function signatures and indentation.
  - Added `# ruff: noqa` comments to suppress linting warnings in multiple files.
  - Enhanced the `generate_random_cu_seqlens` function in `utils.py` for better clarity and organization.
  - Updated print statements for consistency in output formatting.
-
Lei Wang authored
* [Refactor] Enhance CopyNode Lower method to support disable_tma flag and improve flash attention implementation
* Updated the CopyNode Lower method to correctly include the disable_tma flag in the GetCopyInst call.
* Refactored the flash attention implementation to selectively disable TMA for specific copy operations while allowing it for others.
* Addressed linting issues for improved code quality
* sparse mla kernels
* Remove deprecated sparse MLA and utility files to streamline the codebase.