Add H20 dtype fp8_w8a8 shared experts fused MoE kernel tuning configs for DeepSeek V3/R1 (#5291)
Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>