gradient accumulation fusion
remove redundant linear layer class definition
add fuse_gradient_accumulation attribute to weights for simple targeting
reflect feedback and clean up the code
arg change
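A minimal, framework-free sketch of what tagging weights with `fuse_gradient_accumulation` could look like, so that fused weights are selected by attribute rather than by a dedicated linear layer class. Only the attribute name comes from the commit messages; the `Param` class, the `main_grad` buffer, and the `select_fused` and `accumulate` helpers are hypothetical and are not taken from this change.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Param:
    """Hypothetical stand-in for a framework weight tensor."""
    name: str
    size: int
    # Attribute named in the commit message; marks weights to target.
    fuse_gradient_accumulation: bool = False
    # Hypothetical persistent accumulation buffer (an assumption, not from the source).
    main_grad: List[float] = field(default_factory=list)

    def __post_init__(self):
        # Start with a zeroed buffer of the right length if none was given.
        if not self.main_grad:
            self.main_grad = [0.0] * self.size


def select_fused(params):
    """Pick weights by attribute instead of by layer class."""
    return [p for p in params if getattr(p, "fuse_gradient_accumulation", False)]


def accumulate(param, micro_batch_grad):
    """Add one micro-batch gradient directly into the persistent buffer."""
    for i, g in enumerate(micro_batch_grad):
        param.main_grad[i] += g


params = [
    Param("linear.weight", 2, fuse_gradient_accumulation=True),
    Param("layernorm.weight", 2),
]
fused = select_fused(params)

# Two micro-batches: gradients are summed in place on tagged weights only.
for grad in ([1.0, 2.0], [0.5, 0.5]):
    for p in fused:
        accumulate(p, grad)
```

In this sketch only `linear.weight` is selected, and its buffer holds the sum of both micro-batch gradients; the untagged weight is left untouched. A real implementation would presumably perform the accumulation inside the backward kernel rather than in a Python loop.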