    Mamba `slow_forward` gradient fix (#29563) · cefb819f
    Anton Vlasjuk authored
    * FIX: Cached slow forward in mamba
    - additionally added mamba cached test
    - added unused test (mamba causal lm forward and backward)
    - fixed typo: "causl" --> "causal"
    
    * formatting
    
    * fix: use real `slow_forward` call instead of torch module's
    
    * add shape assertion for mixer block test
    
    * adjust shape assertion
test_modeling_mamba.py 21.7 KB