    [Layout] Introduce a new layout inference mechanism (#699) · 407117e1
    Lei Wang authored
    
    
    * Implement new free stage layout inference.
    
    * Fix bug
    
    * Make replication upcasting and unnormalizable iterators safe.
    
    * Improve handling of updates with more replicas.
    
    * Remove unnecessary check.
    
    * Fix compilation.
    
    * Fix setup.py.
    
    * Simplify development mode.
    
    * Allow ParallelOp layout when there's already a compatible layout specified
    
    * Fix lint.
    
    * Add ProveFragmentContains function to validate thread access between small and large fragments
    
    This function checks if the threads accessing elements of a smaller fragment are a subset of those accessing a larger fragment, ensuring valid access during updates. The implementation includes deriving thread indices, computing logical indices, and verifying thread mappings.
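A minimal brute-force sketch of the subset check described above, written in Python for illustration only. It is not the code added in this commit: `fragment_contains`, `threads_for`, and the `ThreadMap` model (a callable from a logical index and a replica id to a thread id) are hypothetical names, and the sketch enumerates indices directly, whereas the description above suggests a symbolic proof over derived thread and logical indices.

```python
from itertools import product
from typing import Callable, Sequence, Set, Tuple

Index = Tuple[int, ...]
# Hypothetical model of a fragment: (logical index, replica id) -> thread id.
ThreadMap = Callable[[Index, int], int]


def threads_for(thread_map: ThreadMap, index: Index, replicas: int) -> Set[int]:
    """Collect every thread that holds a copy of the element at `index`."""
    return {thread_map(index, rep) for rep in range(replicas)}


def fragment_contains(
    small_shape: Sequence[int],
    small_map: ThreadMap,
    small_replicas: int,
    large_map: ThreadMap,
    large_replicas: int,
    small_to_large: Callable[[Index], Index],
) -> bool:
    """Return True if, for every element of the small fragment, the threads
    holding it are a subset of the threads holding the matching element of
    the large fragment."""
    for small_index in product(*(range(extent) for extent in small_shape)):
        small_threads = threads_for(small_map, small_index, small_replicas)
        large_index = small_to_large(small_index)
        large_threads = threads_for(large_map, large_index, large_replicas)
        if not small_threads <= large_threads:
            return False
    return True


# Assumed example: 4 logical rows spread over 32 threads, 8 threads per row.
def one_per_row(index: Index, rep: int) -> int:
    # Only the first thread of each row holds the element.
    return index[0] * 8


def replicated(index: Index, rep: int) -> int:
    # Every one of the 8 threads in the row holds a copy.
    return index[0] * 8 + rep


def identity(index: Index) -> Index:
    return index


# The less replicated layout is contained in the more replicated one.
assert fragment_contains((4,), one_per_row, 1, replicated, 8, identity)
# The reverse does not hold: threads 1..7 of each row are not covered.
assert not fragment_contains((4,), replicated, 8, one_per_row, 1, identity)
```

In this toy setup, updating a layout with one that has more replicas is accepted because every thread that already touches an element still touches it afterwards.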
    
    * Update dependencies in requirements files
    
    * Remove 'thefuzz' from requirements-dev.txt
    * Specify exact versions for 'torch' and add 'flash_attn' in requirements-test.txt
    
    * Update CI workflow to use SHA256 hash for requirements file
    
    * Update requirements and CI workflow for flash attention
    
    * Remove the pinned version for 'torch' in requirements-test.txt
    * Install 'flash_attn==2.5.8' in the CI workflow to ensure compatibility
    
    * Refactor flash attention import handling in examples
    
    * Remove availability checks for 'flash_attn' in multiple example scripts.
    * Simplify import statements for 'flash_attn' to ensure consistent usage across examples.
    
    ---------
    Co-authored-by: Huanqi Cao <caohuanqi@deepseek.com>
parallel.cc 15.4 KB