- To enable fp8 quantization, you can add `--quantization fp8` when serving an fp16 checkpoint, or directly load an fp8 checkpoint without specifying any extra arguments.
- To enable experimental torch.compile support, you can add `--enable-torch-compile`. It accelerates small models at small batch sizes. Both flags are combined in the sketch below.
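
As an illustration, the two flags above can be passed to the server launcher together. This is a minimal sketch, assuming `meta-llama/Meta-Llama-3.1-8B-Instruct` as an example model path (not prescribed here):

```bash
# Sketch: serve an fp16 checkpoint with online fp8 quantization and
# experimental torch.compile enabled. The model path is an assumed example.
python3 -m sglang.launch_server \
    --model-path meta-llama/Meta-Llama-3.1-8B-Instruct \
    --quantization fp8 \
    --enable-torch-compile
```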
### Run Llama 3.1 405B
```bash
# Run 405B (fp16) across 2 nodes
# replace `172.16.4.52:20000` with the IP address and port of your own first node, and temporarily disable CUDA Graph