Elsa Granger authored · b2ad0d9e
    
[pipeline,shardformer] Fix p2p efficiency in pipeline, allow skipping loading weights not in weight_map when `strict=False`, fix llama flash attention forward, add flop estimation by megatron in llama benchmark (#5017)
    
    * Use p2p
    
* Cannot do bidirectional send with p2p
    
* Refactor tensor creation and serialization in P2P communication
    
    * Fix llama forward args in flash attention
    
    * Add flop estimate from megatron
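
  For reference, a minimal sketch of the Megatron-style estimate (Narayanan et al., 2021) that the llama benchmark's performance_evaluator.py could apply; the function name and the example shapes are illustrative, not the actual ColossalAI code:

  ```python
  def megatron_flops_per_iteration(batch_size, seq_len, num_layers,
                                   hidden_size, vocab_size,
                                   activation_checkpointing=False):
      """FLOPs per training iteration for a GPT-style model, per Megatron-LM:
          F = 72 * B * s * l * h^2 * (1 + s / (6h) + V / (16 * l * h))
      The 72 covers forward (24) plus backward (48); activation recomputation
      replays the forward pass, raising the coefficient to 96."""
      coeff = 96 if activation_checkpointing else 72
      return (coeff * batch_size * seq_len * num_layers * hidden_size ** 2
              * (1 + seq_len / (6 * hidden_size)
                 + vocab_size / (16 * num_layers * hidden_size)))

  # Illustrative LLaMA-7B-like shapes
  print(megatron_flops_per_iteration(32, 2048, 32, 4096, 32000) / 1e12, "TFLOPs")
  ```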
    
* Support loading weights not in weight_map when strict=False in hybrid_parallel
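
  A minimal sketch of the skip-on-missing behavior, assuming a HuggingFace-style sharded checkpoint index whose weight_map maps parameter names to shard files; the helper names are hypothetical, not the actual hybrid_parallel_checkpoint_io.py API:

  ```python
  def resolve_shard_files(param_names, weight_map, strict=True):
      """Map each expected parameter name to its shard file.
      With strict=True, a name missing from weight_map is an error; with
      strict=False it is skipped (e.g. non-persistent buffers)."""
      resolved, missing = {}, []
      for name in param_names:
          if name in weight_map:
              resolved[name] = weight_map[name]
          elif strict:
              missing.append(name)
      if missing:
          raise RuntimeError(f"Keys not found in weight_map: {missing}")
      return resolved

  # Usage sketch (hypothetical key and file names)
  weight_map = {"model.embed_tokens.weight": "model-00001.safetensors"}
  files = resolve_shard_files(
      ["model.embed_tokens.weight", "rotary_emb.inv_freq"], weight_map, strict=False
  )
  ```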
    
* Use send_forward_recv_backward, etc. in 1f1b
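
  A minimal sketch of fusing the two transfers into one batched P2P call with torch.distributed.batch_isend_irecv, which also sidesteps the bidirectional-send deadlock noted above; the function signature is illustrative, not ColossalAI's actual p2p.py:

  ```python
  import torch
  import torch.distributed as dist

  def send_forward_recv_backward(output_tensor, grad_shape, next_rank):
      """Send this stage's forward output to the next stage while receiving
      the gradient coming back from it, in a single batched P2P call.
      Batching both ops avoids two ranks blocking on their sends.
      Assumes an initialized process group."""
      grad_tensor = torch.empty(grad_shape, device=output_tensor.device,
                                dtype=output_tensor.dtype)
      ops = [
          dist.P2POp(dist.isend, output_tensor, next_rank),
          dist.P2POp(dist.irecv, grad_tensor, next_rank),
      ]
      for req in dist.batch_isend_irecv(ops):
          req.wait()
      return grad_tensor
  ```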
    
* Use dataclass for metadata
  Remove torch.cuda.synchronize() as suggested
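
  A minimal sketch of what a metadata dataclass for the refactored P2P serialization could look like: shape and dtype travel ahead of the payload so the receiver can allocate a matching buffer. All names here are hypothetical, not the actual ColossalAI types:

  ```python
  from dataclasses import dataclass
  import pickle

  import torch

  @dataclass
  class P2PMetadata:
      """Descriptor sent ahead of the tensor payload so the receiver
      can allocate a matching buffer before the actual recv."""
      shape: tuple
      dtype: torch.dtype
      requires_grad: bool = False

  def serialize_metadata(t: torch.Tensor) -> bytes:
      return pickle.dumps(P2PMetadata(tuple(t.shape), t.dtype, t.requires_grad))

  def allocate_from_metadata(raw: bytes, device: str = "cuda") -> torch.Tensor:
      md = pickle.loads(raw)
      return torch.empty(md.shape, dtype=md.dtype, device=device,
                         requires_grad=md.requires_grad)
  ```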
    
    * Add comment about the torch.cuda.synchronize for potential error
    
    * Typo
    
    * Update hybrid_parallel_checkpoint_io.py
    
    * Update p2p.py
    
    * Update one_f_one_b.py
    
    * Update p2p.py
    
    ---------
Co-authored-by: flybird11111 <1829166702@qq.com>