    Add codestral mamba2 (#32080) · 80b90e7b
    Pablo Montalvo authored
    * add new model like
    
    * draft cuda forward - mismatched keys (sharding on conv1)
    
    * match keys successfully
    
    * fix split
    
    * get generation/forward running (wrong gens, norm?)
    
    * update
    
    * some refactoring
    
    * fixes
    
    * works up until copy to cache
    
    * fix
    
    * update
    
    * NON WORKING VERSION
    
    * version that works?
    
    * nit
    
    * fix config
    
    * fix conversion script
    
    * working cuda forward
    
    * nit
    
    * update
    
    * simplification
    
    * make mamba slow simple work
    
    * no einops
    
    * todo
    
    * fix style
    
    * no einops
    
    * update fix no einsum
    
    * nit
    
    * remove einops
    
    * bug: scan_output differs strongly
    
    * add rms norm option
    
    * fix fast + slow generation with and w/o cache
    
    * draft integration tests
    
    * remove a big chunk of the einsum
    
    * fix slow, fast generations, without any einsum
    
    * fix copies
    
    * fix structure
    
    * fix up modeling and tests
    
    * fix tests
    
    * clamping is indeed worse
    
    * recover mamba2 cache test
    
    * fix copies
    
    * no cache position (yet)
    
    * fix tf tests
    
    * fix matmul for generate
    
    * fixup
    
    * skip cache tests for now
    
    * [run-slow]mamba2
    
    * tune out hidden states for padding
    
    * test batched generation
    
    * propagate attention mask changes
    
    * fix past length
    
    * fix integration test
    
    * style
    
    * address comments
    
    * update readme
    
    * add mamba2 version check
    
    * fix tests
    
    * [run-slow]mamba2
    
    * skip edge tests
    
    * [run-slow]mamba2
    
    * last fixup
    
    * [run-slow]mamba2
    
    * update README
    
    ---------
    Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
__init__.py 0 Bytes