• Grace/deepseek v3 migration (#12385) · fbd82ba5
    Grace authored
    
    
    * init deepseek model file
    
    * temp removal of flash attention implementation
    
    * shapes are proper, can make a pass
    
    * query, key, value have good cosine similarity, but the max diff is a bit high
    
    * Attention block is working! ** with eager for now, have not added the mask line
    
    * Attention block is working! ** with eager for now, have not added the mask line
    
    * working MoE at around 0.95 cosine sim
    
    * added cosine similarity function
    
    * Starting end to end structure
    
    * Trying (and failing) to get rope to work, going to test full thing on tater
    
    * running on tater36... just not the right outputs
    
    * we have the right values for rope... but it's still not working?
    
    * change Extrapolation Factor to 1
    
    * removed adding residuals twice, removed normalization from shared expert, refactored Norms (Attention, MLP) to be outside the (Attention, MLP) blocks and in the Transformer block instead, add cache setLayer
    
    * Temporary modelfiles for cpu
    
    * change kpass intermediate step to kv, two layer outputs [0,1] look fine
    
    * this calls for 16 chicken nuggets
    
    * whoops
    
    * cleaning up code
    
    * delete stuff we don't need
    
    * getting rid of debug statements for llama.cpp
    
    * working with long contexts
    
    * fix long context view error
    
    * reverting some changes I made to files that are not part of the PR
    
    * Added proper tokenizer for deepseek3
    
    * clean up model and go test
    
    * remove Modelfile
    
    * not passing the tests
    
    * whoops
    
    * how to pass the ci tests
    
    * resolving some of the comments
    
    * rename
    
    * linted and renamed deepseek3 -> deepseek2
    
    * remove name go
    
    * addressed changes - main change was adopting qwen3 naming scheme
    
    * I cannot with linters
    
    * clean up logs
    
    * clean up logs
    
    ---------
    Co-authored-by: Grace Guo <graceguo@Graces-MBP.localdomain>
    Co-authored-by: Grace Guo <graceguo@Graces-MacBook-Pro.local>
    Co-authored-by: graceguo <graceguo@tater36.localdomain>
models.go 680 Bytes