  1. 13 Nov, 2025 2 commits
    • [Language][Reshape] Improve variable handling and ensure correctness during Layout Reshape (#1248) · d7164abf
      Lei Wang authored
      * fix
      
      * Refactor tensor reshaping in fp8_lighting_indexer.py
      
      - Replaced the separate allocation of `s_reshaped` with a reshape of the existing tensor, which removes an extra buffer.
      - Updated the computation that previously filled `s_reshaped` to operate on the reshaped tensor directly.
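
The difference between allocating a second buffer and reshaping an existing one can be illustrated outside the kernel code. The sketch below is not taken from `fp8_lighting_indexer.py`; it uses Python's built-in `memoryview.cast` as a stand-in for a tensor reshape, and the names `scores`, `copy_reshape`, and `view_reshape` are hypothetical.

```python
def copy_reshape(flat, rows, cols):
    """Allocate a new nested list and copy every element into it."""
    return [[flat[r * cols + c] for c in range(cols)] for r in range(rows)]


def view_reshape(flat, rows, cols):
    """Reinterpret the same bytes as a rows x cols view; nothing is copied."""
    return memoryview(flat).cast("B", shape=[rows, cols])


scores = bytearray(range(6))          # hypothetical flat buffer of 6 values
copied = copy_reshape(scores, 2, 3)   # separate storage
viewed = view_reshape(scores, 2, 3)   # shares storage with `scores`

scores[4] = 99                        # write through the original buffer
print(copied[1][1])                   # the copy still holds the old value
print(viewed[1, 1])                   # the view observes the update
```

A reshape in this sense only changes how indices map onto existing storage, which is why it can replace a separate allocation followed by a copy.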
      
      * Refactor analyzer usage in Layout and Fragment reshaping
      
      - Consolidated analyzer logic in the `Reshape` methods of `LayoutNode` and `FragmentNode` to utilize a fallback analyzer, improving code clarity and preventing potential null dereference issues.
      - Updated variable binding and simplification calls to use the selected analyzer consistently, enhancing robustness in shape validation and index computation.
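
The fallback pattern described above can be sketched in a few lines. This is an illustration under stated assumptions, not the C++ code in `LayoutNode::Reshape` or `FragmentNode::Reshape`: `ToyAnalyzer` and `reshape_index_map` are hypothetical names, and the toy analyzer only records bound extents instead of simplifying expressions.

```python
from math import prod


class ToyAnalyzer:
    """Hypothetical stand-in for an arithmetic analyzer: it only records
    the extent bound to each index variable."""

    def __init__(self):
        self.extents = {}

    def bind(self, name, extent):
        self.extents[name] = extent


def reshape_index_map(old_shape, new_shape, analyzer=None):
    """Return a function mapping an index in new_shape to one in old_shape."""
    # Select the analyzer once. Every later call goes through `az`, so a
    # missing caller-supplied analyzer is never dereferenced.
    fallback = ToyAnalyzer()
    az = analyzer if analyzer is not None else fallback

    # Shape validation: a reshape must preserve the element count.
    if prod(old_shape) != prod(new_shape):
        raise ValueError(f"cannot reshape {old_shape} into {new_shape}")

    # Bind one index variable per new axis on the selected analyzer.
    for axis, extent in enumerate(new_shape):
        az.bind(f"i{axis}", extent)

    def to_old(index):
        # Flatten the new index in row-major order.
        flat = 0
        for i, extent in zip(index, new_shape):
            flat = flat * extent + i
        # Unflatten into the old shape, innermost axis first.
        old = []
        for extent in reversed(old_shape):
            old.append(flat % extent)
            flat //= extent
        return tuple(reversed(old))

    return to_old


to_old = reshape_index_map((2, 3), (3, 2))   # no analyzer passed: fallback is used
print(to_old((2, 1)))
```

Choosing the analyzer once at the top keeps the binding and index-computation steps on the same object, which is the consistency the commit message refers to.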
    • [Bugfix] Fix fp8 dtype for some cases (#1246) · 63bf1609
      Lei Wang authored
      * [Enhancement] Add FP8 support and reproducibility in lighting indexer
      
      * Introduced a manual seed in `test_fp8_lighting_indexer` so that the test inputs are reproducible across runs.
      * Added specializations for `cute::float_e4m3_t` and `cute::float_e5m2_t` in `gemm_mma.h` to support FP8 across multiple CUDA architectures.
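
The effect of a manual seed can be shown with a minimal sketch. The commit does not show the actual call; in a PyTorch-based test it would typically be `torch.manual_seed(...)`, but that is an assumption here. The example below uses only the standard library, and `make_inputs` is a hypothetical helper.

```python
import random


def make_inputs(seed, n=4):
    """Draw n pseudo-random floats from a generator seeded with `seed`."""
    rng = random.Random(seed)   # local generator, independent of global state
    return [rng.random() for _ in range(n)]


first = make_inputs(seed=0)
second = make_inputs(seed=0)
print(first == second)          # same seed, same inputs
```

With the seed fixed, a failing comparison against a reference can be re-run on identical inputs instead of on a fresh random draw.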
      
      * Fix typos in `fp8_lighting_indexer.py` and improve formatting in `gemm_mma.h`
      
      * Corrected a typo in the comment for `test_fp8_lighting_indexer` to enhance clarity.
      * Reformatted lines in `gemm_mma.h` for better readability by aligning template specializations across multiple CUDA architectures.
      
      * test fix
      
      * bug fix
  2. 20 Oct, 2025 1 commit
  3. 15 Oct, 2025 1 commit
  4. 10 Oct, 2025 1 commit
  5. 29 Sep, 2025 2 commits
    • [Example] Add topk into sparse mla example and append some docs (#901) · 6021ef32
      Lei Wang authored
      * Remove the unused `fp8_mqa_logits.py` file and update README.md to reflect the new directory structure and file descriptions for the deepseek_v32 example. Add README sections for the architecture overview, Lightning Indexer, Top-k Selector, and Sparse MLA Forward implementations.
      
      * Update linting configurations and improve code formatting in deepseek_v32 example scripts
      
      - Added per-file ignores for the inference directory in `pyproject.toml`.
      - Refactored code in `topk_selector.py`, `convert.py`, `generate.py`, `kernel.py`, and `model.py` to enhance readability by adjusting spacing and line breaks.
      - Ensured consistent formatting across function definitions and assertions for better clarity.
      
      * Refactor test functions in deepseek_v32 example scripts for improved clarity and consistency
      
      - Updated `fp8_lighting_indexer.py` to define a dedicated test function for the lighting indexer.
      - Refactored `sparse_mla_fwd_pipelined.py` and `sparse_mla_fwd.py` to standardize test function parameters and improve readability.
      - Enhanced `topk_selector.py` by introducing a test function with parameters for batch size and sequence length.
      - Ensured all test functions are invoked correctly in the main execution block.
      
      * Enhance test functions in deepseek_v32 example scripts with CUDA requirements and parameterization
      
      - Added CUDA requirements decorators to `test_example_sparse_mla_fwd` and `test_example_sparse_mla_fwd_pipelined`.
      - Parameterized test functions to use specific small shapes for testing, improving test coverage and clarity.
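
A test that is gated on CUDA and run over a few small shapes might look like the sketch below. It is not the code from the example scripts: the decorator `requires_cuda` is built here from `unittest.skipUnless`, the `nvidia-smi` lookup is only a cheap proxy for CUDA availability, and `SMALL_SHAPES` and `run_kernel_check` are hypothetical placeholders.

```python
import shutil
import unittest


def cuda_available():
    # Assumption: an `nvidia-smi` binary on PATH is used as a cheap proxy
    # for a usable CUDA device. Real projects usually query their runtime.
    return shutil.which("nvidia-smi") is not None


# Hypothetical decorator: skip the test when no CUDA device is detected.
requires_cuda = unittest.skipUnless(cuda_available(), "CUDA not available")

# Hypothetical small (batch, seq_len) shapes that keep each case quick.
SMALL_SHAPES = [(1, 128), (2, 256)]


def run_kernel_check(batch, seq_len):
    """Placeholder for launching a kernel and comparing with a reference."""
    return batch * seq_len


class SparseMlaExampleTest(unittest.TestCase):
    @requires_cuda
    def test_small_shapes(self):
        for batch, seq_len in SMALL_SHAPES:
            # subTest reports each shape separately, like a parameterized test.
            with self.subTest(batch=batch, seq_len=seq_len):
                self.assertEqual(run_kernel_check(batch, seq_len), batch * seq_len)


if __name__ == "__main__":
    unittest.main()
```

On a machine without CUDA the test is reported as skipped instead of failing, which keeps CPU-only CI runs green.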
      
      * lint fix
      
      * Update README.md to correct image path for DeepSeek V3.2 architecture diagram
    • [Example] Add sparse mla examples (#896) · 65ac7454
      Lei Wang authored
      * Update README.md to include directory structure and file descriptions for deepseek_v32 example
      
      * Refactor and clean up deepseek_v32 example scripts
      
      - Removed unused imports and functions from `fp8_mqa_logits.py` to streamline the code.
      - Improved formatting and readability in `sparse_mla_fwd_pipelined.py` and `sparse_mla_fwd.py` by adjusting function signatures and indentation.
      - Added `# ruff: noqa` comments to suppress linting warnings in multiple files.
      - Enhanced the `generate_random_cu_seqlens` function in `utils.py` for better clarity and organization.
      - Updated print statements for consistency in output formatting.