    fix cuda graphs for qwen2-vl (#2708) · 01dacf8e
    drbh authored
    
    
    * feat: support multidimensional position ids on batch to enable cuda graphs on qwen2-vl
    
    * fix: only check model type if config exists
    
    * fix: adjust sharding and lm head logic
    
    * fix qwen2 failure on Intel CPU
    Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
    
    * fix: return correct shape logits and add streaming test
    
    * fix: remove unused import and refactor test
    
    ---------
    Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
flash_causal_lm.py 93.4 KB