    [shardformer] update colo attention to support custom mask (#5510) · 19e1a5cf
    Hongxin Liu authored
    * [feature] refactor colo attention (#5462)
    
    * [extension] update api
    
    * [feature] add colo attention
    
    * [feature] update sdpa
    
    * [feature] update npu attention
    
    * [feature] update flash-attn
    
    * [test] add flash attn test
    
    * [test] update flash attn test
    
    * [shardformer] update modeling to fit colo attention (#5465)
    
    * [misc] refactor folder structure
    
    * [shardformer] update llama flash-attn
    
    * [shardformer] fix llama policy
    
    * [devops] update tensornvme install
    
    * [test] update llama test
    
    * [shardformer] update colo attn kernel dispatch
    
    * [shardformer] update blip2
    
    * [shardformer] update chatglm
    
    * [shardformer] update gpt2
    
    * [shardformer] update gptj
    
    * [shardformer] update opt
    
    * [shardformer] update vit
    
    * [shardformer] update colo attention mask prep
    
    * [shardformer] update whisper
    
    * [test] fix shardformer tests (#5514)
    
    * [test] fix shardformer tests
    
    * [test] fix shardformer tests
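The sub-commits above route model-specific attention masks through a unified attention kernel. As a minimal sketch of the underlying mechanism (not ColossalAI's actual API), PyTorch's `scaled_dot_product_attention` accepts an arbitrary boolean `attn_mask`, which lets a caller combine a causal mask with, say, padding, instead of being limited to the built-in `is_causal` flag; all tensor shapes here are illustrative:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: (batch, heads, seq_len, head_dim)
batch, heads, seq, dim = 2, 4, 8, 16
q = torch.randn(batch, heads, seq, dim)
k = torch.randn(batch, heads, seq, dim)
v = torch.randn(batch, heads, seq, dim)

# Custom boolean mask: True = attend, False = masked out.
# Combine a causal mask with padding that masks the last two key positions.
causal = torch.tril(torch.ones(seq, seq, dtype=torch.bool))
padding = torch.ones(seq, dtype=torch.bool)
padding[-2:] = False
mask = causal & padding  # (seq, seq), broadcasts over batch and heads

# SDPA dispatches to an efficient fused kernel when one is available
# for the given mask/dtype/device combination.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
print(out.shape)  # torch.Size([2, 4, 8, 16])
```

Note that every query row must keep at least one unmasked key (here, position 0), otherwise the softmax over a fully masked row produces NaNs.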
Changed file: flash_attention_npu.py (1.88 KB)