Add support for scalar FP8 weight scales (#2550) · c29dc89c
Daniël de Kok authored
* Add support for scalar FP8 weight scales (see the first sketch below)
    
    * Support LLM compressor FP8 checkpoints on H100
    
On H100, we use fbgemm-gpu, which requires bfloat16 as the input dtype.
However, FP8 quantization was not being picked up for models quantized
with LLM Compressor. This change adds enough config parsing to detect
whether a model has FP8-quantized weights (see the second sketch below).
    
    * Remove stray debug print
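
The first change normalizes weight scales at load time. Here is a minimal
sketch of the idea, assuming downstream FP8 kernels want one scale per
output row; `_expand_scalar_scale` is a hypothetical helper, not the
actual function from fp8.py:

```python
import torch


def _expand_scalar_scale(weight: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: downstream FP8 kernels expect one scale per
    # output row (per-channel). Some checkpoints store a single scalar
    # scale for the whole weight tensor instead; broadcast it so both
    # layouts look identical to the rest of the loading code.
    if scale.numel() == 1:
        return scale.reshape(1).expand(weight.shape[0]).contiguous()
    # Already per-channel: flatten away any extra dimensions.
    return scale.reshape(-1)
```

Normalizing here keeps the scalar and per-channel cases on a single code
path for everything that runs after weight loading.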
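For the H100 path, the loader needs to recognize FP8 checkpoints from the
model config alone. A minimal sketch of such detection, assuming the
checkpoint's config.json carries a `quantization_config` section in the
compressed-tensors style; the helper name and the exact keys checked are
assumptions, not the code from this commit:

```python
import json


def _has_fp8_weights(config_path: str) -> bool:
    # Hypothetical sketch: LLM Compressor checkpoints describe their
    # scheme in `quantization_config` inside config.json. Treat any
    # 8-bit float weight scheme as FP8.
    with open(config_path) as f:
        config = json.load(f)
    quant = config.get("quantization_config", {})
    if quant.get("quant_method") == "fbgemm_fp8":
        return True
    for group in quant.get("config_groups", {}).values():
        weights = group.get("weights", {})
        if weights.get("type") == "float" and weights.get("num_bits") == 8:
            return True
    return False
```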
fp8.py