1. 16 Dec, 2025 2 commits
  2. 15 Dec, 2025 1 commit
  3. 13 Dec, 2025 2 commits
  4. 12 Dec, 2025 2 commits
  5. 11 Dec, 2025 1 commit
  6. 09 Dec, 2025 2 commits
  7. 08 Dec, 2025 1 commit
    • refactor rope · 603ceefa
      Michael Yang authored
      change to a flatter directory structure and group the options with the function
      
      update models to call rope in one place
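      The commit message describes an API reshape rather than new math. A
      minimal Go sketch of the resulting shape, assuming hypothetical names
      (Options, Apply) that are not ollama's actual API: the options struct
      lives in the same package as the single rope entry point, so each
      model configures and calls rope in one place.

          package rope

          import "math"

          // Options groups the rope parameters with the function that
          // uses them. All names here are illustrative assumptions.
          type Options struct {
              Dim  int     // rotary dimension (must be even)
              Base float64 // frequency base, commonly 10000
          }

          // Apply rotates x in place. x is a flattened
          // [len(positions)][Dim] buffer; row r is rotated by angle
          // positions[r] * Base^(-i/Dim) at each even index i. Models
          // call this single entry point instead of carrying their own
          // rope variants.
          func Apply(x []float32, positions []int32, opts Options) {
              d := opts.Dim
              for r, p := range positions {
                  row := x[r*d : (r+1)*d]
                  for i := 0; i < d; i += 2 {
                      theta := float64(p) * math.Pow(opts.Base, -float64(i)/float64(d))
                      sin, cos := math.Sincos(theta)
                      x0, x1 := float64(row[i]), float64(row[i+1])
                      row[i] = float32(x0*cos - x1*sin)
                      row[i+1] = float32(x0*sin + x1*cos)
                  }
              }
          }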
  8. 02 Dec, 2025 1 commit
  9. 20 Nov, 2025 1 commit
  10. 19 Nov, 2025 3 commits
  11. 18 Nov, 2025 1 commit
  12. 13 Nov, 2025 1 commit
  13. 06 Nov, 2025 1 commit
  14. 03 Nov, 2025 1 commit
  15. 30 Oct, 2025 2 commits
  16. 29 Oct, 2025 1 commit
  17. 28 Oct, 2025 2 commits
  18. 18 Oct, 2025 1 commit
  19. 13 Oct, 2025 1 commit
  20. 09 Oct, 2025 2 commits
  21. 03 Oct, 2025 1 commit
  22. 24 Sep, 2025 1 commit
    • Grace/deepseek v3 migration (#12385) · fbd82ba5
      Grace authored

      * init deepseek model file
      
      * temp removal of flash attention implementation
      
      * shapes are proper, can make a pass
      
      * query, key, value have good cosine similarity, but the max diff is a bit high
      
      * Attention block is working! (eager for now, have not added the mask line; see the mask sketch after this log)
      
      * working MoE at around 0.95 cosine sim
      
      * added cosine similarity function (see the cosine sketch after this log)
      
      * Starting end to end structure
      
      * Trying (and failing) to get rope to work, going to test full thing on tater
      
      * running on tater36... just not the right outputs
      
      * we have the right values for rope... but it's still not working?
      
      * change Extrapolation Factor to 1
      
      * removed adding residuals twice; removed normalization from the shared expert; refactored the norms (Attention, MLP) to sit outside the Attention/MLP blocks, in the Transformer block instead; added cache setLayer
      
      * Temporary Modelfiles for CPU
      
      * change kpass intermediate step to kv, two layer outputs [0,1] look fine
      
      * this calls for 16 chicken nuggets
      
      * whoops
      
      * cleaning up code
      
      * delete stuff we don't need
      
      * getting rid of debug statements for llama.cpp
      
      * working with long contexts
      
      * fix long context view error
      
      * reverting some changes I made to files that are not a part of the PR
      
      * Added proper tokenizer for deepseek3
      
      * clean up model and go test
      
      * remove Modelfile
      
      * not passing the tests
      
      * whoops
      
      * how to pass the CI tests
      
      * resolving some of the comments
      
      * rename
      
      * linted and renamed deepseek3 -> deepseek2
      
      * remove name go
      
      * addressed changes - main change was adopting qwen3 naming scheme
      
      * I cannot with linters
      
      * clean up logs
      
      * clean up logs
      
      ---------
      Co-authored-by: Grace Guo <graceguo@Graces-MBP.localdomain>
      Co-authored-by: Grace Guo <graceguo@Graces-MacBook-Pro.local>
      Co-authored-by: graceguo <graceguo@tater36.localdomain>
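      The "mask line" noted as missing above is the causal mask applied to
      the attention scores before softmax. A minimal Go sketch under that
      assumption; the function name and [seq][seq] score layout are
      hypothetical, not the code in this PR:

          package attention

          import "math"

          // applyCausalMask sets scores above the diagonal to -Inf so
          // that position i cannot attend to any position j > i; softmax
          // then assigns those entries zero weight.
          func applyCausalMask(scores [][]float32) {
              negInf := float32(math.Inf(-1))
              for i := range scores {
                  for j := i + 1; j < len(scores[i]); j++ {
                      scores[i][j] = negInf
                  }
              }
          }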
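      The cosine-similarity helper used throughout the log (the 0.95 MoE
      figure, the query/key/value checks) compares a ported layer's output
      against a reference implementation. A minimal Go sketch; the helper's
      actual name and signature in the PR are assumptions here:

          package debugutil

          import "math"

          // Cosine returns the cosine similarity of two equal-length
          // tensors flattened to float32 slices. 1.0 means identical
          // direction, so values near 1.0 say the port tracks the
          // reference even if absolute diffs remain high.
          func Cosine(a, b []float32) float64 {
              var dot, na, nb float64
              for i := range a {
                  dot += float64(a[i]) * float64(b[i])
                  na += float64(a[i]) * float64(a[i])
                  nb += float64(b[i]) * float64(b[i])
              }
              return dot / (math.Sqrt(na) * math.Sqrt(nb))
          }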
  23. 23 Sep, 2025 2 commits
  24. 19 Sep, 2025 1 commit
  25. 18 Sep, 2025 1 commit
  26. 17 Sep, 2025 1 commit
  27. 16 Sep, 2025 2 commits
  28. 15 Sep, 2025 2 commits