    [Port] TensorFlow implementation of Mistral (#29708) · 965e98dc
    Aritra Roy Gosthipaty authored
    
    
    * chore: initial commit
    
    * chore: adding imports and inits
    
    * chore: adding the causal and classification code
    
    * chore: adding names to the layers
    
    * chore: using single self attn layer
    
    * chore: built the model and layers
    
    * chore: start with testing
    
    * chore: docstring change, transpose fix
    
    * fix: rotary embedding
    
    * chore: adding cache implementation
    
    * remove unused torch
    
    * chore: fixing the indexing issue
    
    * make fix-copies
    
    * Use modeling_tf_utils.keras
    
    * make fixup
    
    * chore: fixing tests
    
    * chore: adding past key value logic
    
    * chore: adding multi label classification test
    
    * fix: switching on the built parameters in the layers
    
    * fixing repo consistency
    
    * ruff formats
    
    * style changes
    
    * fix: tf and pt equivalence
    
    * removing returns from docstrings
    
    * fix docstrings
    
    * fix docstrings
    
    * removing todos
    
    * fix copies
    
    * fix docstring
    
    * fix docstring
    
    * chore: using easier rotate_half
    
    * adding integration tests
    
    * chore: addressing review related to rotary embedding layer
    
    * review changes
    
    * [run-slow] mistral
    
    * skip: test save load after resize token embedding
    
    * style
    
    ---------
    Co-authored-by: Matt <rocketknight1@gmail.com>