  1. 06 Aug, 2024 1 commit
    • Add codestral mamba2 (#32080) · 80b90e7b
      Pablo Montalvo authored
      * add new model like
      
      * draft cuda forward - mismatched keys (sharding on conv1)
      
      * match keys successfully
      
      * fix split
      
      * get generation/forward running (wrong gens, norm?)
      
      * update
      
      * some refactoring
      
      * fixes
      
      * works up until copy to cache
      
      * fix
      
      * update
      
      * NON WORKING VERSION
      
      * version that works?
      
      * nit
      
      * fix config
      
      * fix conversion script
      
      * working cuda forward
      
      * nit
      
      * update
      
      * simplification
      
      * make mamba slow simple work
      
      * no einops
      
      * todo
      
      * fix style
      
      * no einops
      
      * update fix no einsum
      
      * nit
      
      * remove einops
      
      * bug: scan_output differs strongly
      
      * add rms norm option
      
      * fix fast + slow generation with and w/o cache
      
      * draft integration tests
      
      * remove a big chunk of the einsum
      
      * fix slow, fast generations, without any einsum
      
      * fix copies
      
      * fix structure
      
      * fix up modeling and tests
      
      * fix tests
      
      * clamping is indeed worse
      
      * recover mamba2 cache test
      
      * fix copies
      
      * no cache position (yet)
      
      * fix tf tests
      
      * fix matmul for generate
      
      * fixup
      
      * skip cache tests for now
      
      * [run-slow]mamba2
      
      * tune out hidden states for padding
      
      * test batched generation
      
      * propagate attention mask changes
      
      * fix past length
      
      * fix integration test
      
      * style
      
      * address comments
      
      * update readme
      
      * add mamba2 version check
      
      * fix tests
      
      * [run-slow]mamba2
      
      * skip edge tests
      
      * [run-slow]mamba2
      
      * last fixup
      
      * [run-slow]mamba2
      
      * update README
      
      ---------
      Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>