    [`StableLm`] Add QK normalization and Parallel Residual Support (#29745) · 2f12e408
    Jonathan Tow authored
    * init: add StableLm 2 support
    
    * add integration test for parallel residual and qk layernorm
    
    * update(modeling): match qk norm naming for consistency with phi/persimmon
    
    * fix(tests): run fwd/bwd on random init test model to jitter norm weights off identity
    
    * `use_parallel_residual`: add copy pointer to `GPTNeoXLayer.forward`
    
    * refactor: rename head states var in `StableLmLayerNormPerHead`
    
    * tests: update test model and add generate check
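The `use_parallel_residual` bullet points at `GPTNeoXLayer.forward`. In the GPT-NeoX formulation, attention and the MLP both read the layer input, and their outputs are added to the residual stream in one step. In the sequential (standard pre-LN) layout, the MLP instead sees the stream after attention has written to it. Below is a minimal pure-Python sketch of that difference. It is not the transformers code: `decoder_layer`, `layer_norm`, and `add` are hypothetical helpers. The norms are assumed to have no learnable weights, so one normalization function stands in for both `ln1` and `ln2`.

```python
from typing import Callable, List

Vector = List[float]
Branch = Callable[[Vector], Vector]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """LayerNorm with identity affine weights (no learnable scale or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / (var + eps) ** 0.5 for v in x]


def add(*vectors: Vector) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(vals) for vals in zip(*vectors)]


def decoder_layer(x: Vector, attn: Branch, mlp: Branch, use_parallel_residual: bool) -> Vector:
    """Hypothetical single decoder layer over one token's hidden vector."""
    if use_parallel_residual:
        # Parallel: both branches see the same normalized layer input and
        # their outputs are summed into the residual in a single step:
        #   out = x + attn(ln(x)) + mlp(ln(x))
        h = layer_norm(x)
        return add(x, attn(h), mlp(h))
    # Sequential pre-LN: the MLP sees the residual stream *after* attention:
    #   x = x + attn(ln1(x)); out = x + mlp(ln2(x))
    # (ln1 and ln2 are the same function here because the norms have no weights.)
    x = add(x, attn(layer_norm(x)))
    return add(x, mlp(layer_norm(x)))
```

When both branches output zeros, the two modes reduce to the identity. They diverge once the MLP's input depends on what attention added to the stream. A commonly cited motivation for the parallel form is that the two branches no longer depend on each other, so they can be computed concurrently.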
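The QK normalization in this commit is implemented by a per-head module, `StableLmLayerNormPerHead`. In this scheme, each attention head gets its own LayerNorm over that head's slice of the query or key projection. The following is a minimal pure-Python sketch of the idea, not the transformers implementation. The class name `LayerNormPerHead`, the flat head-by-head layout of one token's features, and the bias-free, scale-only norm are all assumptions made for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def layer_norm(x: Sequence[float], weight: Sequence[float], eps: float = 1e-5) -> Vector:
    """LayerNorm over one head's features with a learnable scale (no bias, by assumption)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / (var + eps) ** 0.5
    return [(v - mean) * inv_std * w for v, w in zip(x, weight)]


class LayerNormPerHead:
    """Hypothetical per-head QK norm: one independent weight vector per head."""

    def __init__(self, head_dim: int, num_heads: int, eps: float = 1e-5) -> None:
        self.head_dim = head_dim
        self.num_heads = num_heads
        self.eps = eps
        # Identity initialization: every head starts with all weights == 1.0.
        self.weights = [[1.0] * head_dim for _ in range(num_heads)]

    def __call__(self, states: Sequence[float]) -> Vector:
        # `states` holds one token's query (or key) projection, laid out head by head.
        if len(states) != self.num_heads * self.head_dim:
            raise ValueError("expected num_heads * head_dim features")
        out: Vector = []
        for h in range(self.num_heads):
            # Normalize each head's slice independently, with that head's own weights.
            head = states[h * self.head_dim:(h + 1) * self.head_dim]
            out.extend(layer_norm(head, self.weights[h], self.eps))
        return out
```

For example, `LayerNormPerHead(head_dim=2, num_heads=2)` maps `[1.0, 3.0, 10.0, 20.0]` to roughly `[-1, 1, -1, 1]`, because each head is centered and scaled by its own statistics rather than those of the whole vector. While every head's weights remain at the identity initialization, all heads apply the same function. A test therefore cannot tell separate per-head weights apart from shared ones. That is plausibly why the commit's test change jitters the norm weights off identity before checking outputs.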
configuration_stablelm.py