    Add Phi-3 medium support (#2039) · 85dfc392
    Daniël de Kok authored
    Add support for Phi-3-medium
    
    The main difference between the medium and mini models is that medium
    uses grouped-query attention (GQA) with a packed QKV matrix. This change
    adds support for GQA with packed matrices to
    `Weights.get_weights_col_packed` and uses it for Phi-3. It also lets us
    remove the custom GQA implementation from dbrx attention loading.
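
With GQA, the Q block of a packed QKV matrix is larger than the K and V blocks, so the packed matrix cannot be split into three equal parts before sharding. The following is a minimal sketch of that idea in plain Python, not the code in `weights.py`: the function names `packed_qkv_block_sizes` and `shard_packed_rows` are hypothetical, and it assumes the blocks are stacked along the row dimension in Q, K, V order and that each block divides evenly across ranks.

```python
from typing import List, Sequence


def packed_qkv_block_sizes(num_heads: int, num_kv_heads: int, head_size: int) -> List[int]:
    """Row counts of the Q, K and V blocks in a packed QKV matrix (hypothetical helper)."""
    if num_heads % num_kv_heads != 0:
        raise ValueError("num_heads must be a multiple of num_kv_heads")
    # Q has one block of rows per query head; K and V have one per key/value head.
    return [num_heads * head_size, num_kv_heads * head_size, num_kv_heads * head_size]


def shard_packed_rows(rows: Sequence, block_sizes: Sequence[int], rank: int, world_size: int) -> list:
    """Return the rows of a packed matrix that belong to one tensor-parallel rank.

    Each block is split evenly across ranks, and the rank's slices are
    concatenated in the original block order (Q, then K, then V).
    """
    if sum(block_sizes) != len(rows):
        raise ValueError("block sizes do not add up to the packed dimension")
    shard = []
    offset = 0  # start of the current block within the packed matrix
    for size in block_sizes:
        if size % world_size != 0:
            raise ValueError("block size must be divisible by world_size")
        per_rank = size // world_size
        start = offset + rank * per_rank
        shard.extend(rows[start:start + per_rank])
        offset += size
    return shard


# Toy example: 4 query heads, 2 key/value heads, head size 2, two ranks.
sizes = packed_qkv_block_sizes(num_heads=4, num_kv_heads=2, head_size=2)
rows = list(range(sum(sizes)))  # row indices stand in for weight rows
rank0 = shard_packed_rows(rows, sizes, rank=0, world_size=2)
rank1 = shard_packed_rows(rows, sizes, rank=1, world_size=2)
```

In this toy case the block sizes are 8, 4 and 4 rows, and each rank receives half of every block, so its shard is again a valid packed QKV matrix with the same Q-to-KV ratio.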
weights.py 24.3 KB