Commit 27ddce40 authored by wenjh

Merge branch 'nv_main'

parents d262ef4c 5b3092a0
@@ -55,6 +55,9 @@ jobs:
 || github.actor == 'pstjohn'
 || github.actor == 'vcherepanov-nv'
 || github.actor == 'tdophung'
+|| github.actor == 'vthumbe1503'
+|| github.actor == 'janekb04'
+|| github.actor == 'shengfangd'
 )
 steps:
 - name: Check if comment is issued by authorized person
......
[submodule "3rdparty/googletest"]
path = 3rdparty/googletest
url = https://github.com/google/googletest.git
[submodule "3rdparty/cudnn-frontend"]
path = 3rdparty/cudnn-frontend
url = https://github.com/NVIDIA/cudnn-frontend.git
[submodule "3rdparty/hipify_torch"]
path = 3rdparty/hipify_torch
url = https://github.com/ROCm/hipify_torch.git
cutlass @ 57e3cfb4
Subproject commit 57e3cfb47a2d9e0d46eb6335c3dc411498efa198
@@ -176,15 +176,15 @@ For example to use the NGC PyTorch container interactively,
 .. code-block:: bash
-docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:25.04-py3
+docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:25.08-py3
 For example to use the NGC JAX container interactively,
 .. code-block:: bash
-docker run --gpus all -it --rm nvcr.io/nvidia/jax:25.04-py3
+docker run --gpus all -it --rm nvcr.io/nvidia/jax:25.08-py3
-Where 25.04 (corresponding to April 2025 release) is the container version.
+Where 25.08 (corresponding to August 2025 release) is the container version.
 **Benefits of using NGC containers:**
@@ -13,7 +13,7 @@ import shutil
 import subprocess
 import sys
 from pathlib import Path
-from importlib.metadata import version
+from importlib.metadata import version as get_version
 from subprocess import CalledProcessError
 from typing import List, Optional, Tuple, Union
@@ -307,7 +307,7 @@ def cuda_version() -> Tuple[int, ...]:
 return tuple(int(v) for v in version)
 try:
-version_str = version("nvidia-cuda-runtime-cu12")
+version_str = get_version("nvidia-cuda-runtime-cu12")
 version_tuple = tuple(int(part) for part in version_str.split(".") if part.isdigit())
 return version_tuple
 except importlib.metadata.PackageNotFoundError:
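Review note: the diff does not say why the import was aliased. One plausible reason is name shadowing: the context line `return tuple(int(v) for v in version)` suggests `cuda_version()` also assigns a local variable named `version`. Python then treats `version` as local for the whole function body, so a later call to the imported `version(...)` cannot reach the module-level function. The sketch below reproduces that failure mode and the aliased fix in isolation. The function names `shadowed` and `aliased` and the `override` parameter are hypothetical and are not taken from the repository.

```python
from importlib.metadata import PackageNotFoundError
from importlib.metadata import version  # un-aliased import, as before the change
from importlib.metadata import version as get_version  # aliased import, as after


def shadowed(dist_name: str, override: str = ""):
    # Assigning to `version` anywhere in the body makes it a local name for the
    # whole function, so the call below no longer reaches the imported function.
    if override:
        version = override.split(".")
        return tuple(int(v) for v in version)
    return version(dist_name)  # raises UnboundLocalError when override is empty


def aliased(dist_name: str, override: str = ""):
    # Same structure, but the metadata lookup uses the alias, which the local
    # variable `version` cannot shadow.
    if override:
        version = override.split(".")
        return tuple(int(v) for v in version)
    try:
        version_str = get_version(dist_name)
    except PackageNotFoundError:
        return ()
    # Keep only purely numeric components, as the hunk above does.
    return tuple(int(part) for part in version_str.split(".") if part.isdigit())
```

Calling `shadowed("pip")` fails before any metadata lookup happens, while `aliased` falls through to `get_version` and returns an empty tuple for a distribution that is not installed.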
@@ -49,7 +49,7 @@ pyTorch
 .. autoapifunction:: transformer_engine.pytorch.moe_permute
 .. autoapifunction:: transformer_engine.pytorch.moe_permute_with_probs
 .. autoapifunction:: transformer_engine.pytorch.moe_unpermute
@@ -62,3 +62,6 @@ pyTorch
 .. autoapifunction:: transformer_engine.pytorch.initialize_ub
 .. autoapifunction:: transformer_engine.pytorch.destroy_ub
+.. autoapiclass:: transformer_engine.pytorch.UserBufferQuantizationMode
+:members: FP8, NONE
\ No newline at end of file
@@ -390,7 +390,7 @@
 "| Attention Backend | Precision | Architecture | Sliding Window Attention | MQA/GQA | Multi-Latent Attention | Context Parallelism | Determinism Possible |\n",
 "| :---------------- | :-------- | :----------- | :----------------------- | :------ | :--------------------- | :------------------ | :------------ |\n",
 "| cuDNN attention (all frameworks) | BF16, FP16, FP8 (PyTorch only) | sm80+ | No | Yes | Yes | Yes (`bshd`,`sbhd`, `thd`) | Yes |\n",
-"| flash-attention (PyTorch) | BF16, FP16 | sm80+ | Yes | Yes | No | Yes (`bshd`,`thd`) | Yes |\n",
+"| flash-attention (PyTorch) | BF16, FP16 | sm80+ | Yes | Yes | Yes | Yes (`bshd`,`thd`) | Yes |\n",
 "| Framework-native attention | BF16, FP16, FP32 | Any | No, unless used as a mask | Yes | Yes (PyTorch only) | No | Yes |\n",
 "\n",
 "Some unit tests are provided to serve as a starting point for integrating such features into users' models. For example,\n",
@@ -10,7 +10,7 @@
 "\n",
 "<b>Note:</b>\n",
 "\n",
-"Currently, export to ONNX is supported only for high precision, FP8 delayed scaling and MXFP8.\n",
+"Currently, export to ONNX is supported only for high precision, FP8 delayed scaling, FP8 current scaling and MXFP8.\n",
 "\n",
 "</div>\n",
 "\n",
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<svg
width="1280"
height="379.66562"
overflow="hidden"
version="1.1"
id="svg31"
sodipodi:docname="fp8_model_init.svg"
inkscape:version="1.4.2 (f4327f4, 2025-05-13)"
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
xmlns="http://www.w3.org/2000/svg"
xmlns:svg="http://www.w3.org/2000/svg">
<sodipodi:namedview
id="namedview1"
pagecolor="#ffffff"
bordercolor="#000000"
borderopacity="0.25"
inkscape:showpageshadow="2"
inkscape:pageopacity="0.0"
inkscape:pagecheckerboard="0"
inkscape:deskcolor="#d1d1d1"
inkscape:zoom="1.8208"
inkscape:cx="685.41302"
inkscape:cy="184.80888"
inkscape:window-width="3440"
inkscape:window-height="1369"
inkscape:window-x="-8"
inkscape:window-y="-8"
inkscape:window-maximized="1"
inkscape:current-layer="g31" />
<defs
id="defs31">
<clipPath
clipPathUnits="userSpaceOnUse"
id="clipPath31">
<rect
style="fill:none"
id="rect32"
width="1390.9491"
height="379.66562"
x="-54.734409"
y="146.82722"
ry="36.489601" />
</clipPath>
</defs>
<g
id="g31"
clip-path="url(#clipPath31)"
transform="translate(0,-146.82722)">
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="700"
font-size="24px"
id="text1"
x="153.29384"
y="195.21265">FP32/BF16</text>
<path
d="M 821,170 V 513.312"
stroke="#000000"
stroke-width="2"
stroke-miterlimit="8"
fill="none"
fill-rule="evenodd"
id="path1" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="700"
font-size="24px"
id="text2"
x="616.69165"
y="194.66344">FP8</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="700"
font-size="24px"
id="text3"
x="908.73199"
y="193.56503">FP8 with fp8_model_init()</text>
<rect
x="868"
y="326"
width="129"
height="164"
stroke="#042433"
stroke-width="2"
stroke-miterlimit="8"
fill="#e8e8e8"
id="rect3" />
<rect
x="882.45081"
y="381.1239"
width="101"
height="45"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#d9f2d0"
id="rect4" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text4"
x="920.40778"
y="400.1239">FP8</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text5"
x="911.3208"
y="416.1239">weight</text>
<rect
x="1078.4508"
y="381.1239"
width="82"
height="45"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#c1e5f5"
id="rect5" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text6"
x="1107.5007"
y="400.1239">FP8</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text7"
x="1098.8308"
y="416.1239">GEMM</text>
<path
d="m 983.45079,403.1239 h 89.04001 v 2 h -89.04001 z m 87.71001,-3 8,4 -8,4 z"
id="path7" />
<path
d="M 422,170 V 513.312"
stroke="#000000"
stroke-width="2"
stroke-miterlimit="8"
fill="none"
fill-rule="evenodd"
id="path9" />
<rect
x="54"
y="326"
width="129"
height="164"
stroke="#042433"
stroke-width="2"
stroke-miterlimit="8"
fill="#e8e8e8"
id="rect9" />
<rect
x="67.45079"
y="367.47629"
width="103"
height="71"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#d9f2d0"
id="rect10" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text10"
x="104.84079"
y="390.47629">High</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text11"
x="91.087494"
y="406.47629">precision</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text12"
x="98.087494"
y="422.47629">weight</text>
<rect
x="270.45081"
y="240.47627"
width="103"
height="71"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#d9f2d0"
id="rect12" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text13"
x="307.6308"
y="263.47629">High</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text14"
x="293.87778"
y="279.47629">precision</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text15"
x="305.79779"
y="295.47629">input</text>
<rect
x="270.45081"
y="367.47629"
width="103"
height="70"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#c1e5f5"
id="rect15" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text16"
x="307.6308"
y="390.47629">High</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text17"
x="293.87778"
y="406.47629">precision</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text18"
x="301.29779"
y="422.47629">GEMM</text>
<path
d="m 170.4572,404.11625 93.11279,-0.59724 -0.0128,-1.99996 -93.11281,0.59725 z m 91.79869,2.41125 7.9742,-4.05123 -8.0255,-3.9486 z"
id="path18" />
<path
d="m 323.45079,311.47627 v 49.395 h -2 v -49.395 z m 3,48.061 -4,8 -4,-8 z"
id="path19" />
<rect
x="447"
y="326"
width="129"
height="164"
stroke="#042433"
stroke-width="2"
stroke-miterlimit="8"
fill="#e8e8e8"
id="rect19" />
<rect
x="460.90158"
y="368.57471"
width="103"
height="71"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#d9f2d0"
id="rect20" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text20"
x="497.76358"
y="392.57471">High</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text21"
x="484.01059"
y="408.57471">precision</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text22"
x="491.01059"
y="424.57471">weight</text>
<rect
x="604.90161"
y="381.57471"
width="81"
height="44"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#fbe3d6"
id="rect22" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text23"
x="633.21356"
y="399.57471">FP8</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text24"
x="622.71356"
y="415.57471">Weight</text>
<g
id="g33"
transform="translate(70.847981,7.139719)">
<rect
x="638.21271"
y="302.41418"
width="81"
height="44"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#fbe3d6"
id="rect22-2" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text23-7"
x="666.52472"
y="320.41418">FP8</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text24-6"
x="662.06604"
y="336.96341">Input</text>
</g>
<rect
x="708.90161"
y="381.57471"
width="82"
height="44"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#c1e5f5"
id="rect26" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text27"
x="737.56158"
y="399.57471">FP8</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text28"
x="728.89557"
y="415.57471">GEMM</text>
<path
d="m 563.91732,405.21457 34.00266,-0.5351 -0.0314,-1.99976 -34.00273,0.53511 z m 32.71676,2.4855 7.9361,-4.12538 -8.062,-3.87362 z"
id="path28" />
<path
d="m 685.90158,402.57469 h 15.791 v 2 h -15.791 z m 14.458,-3 8,4 -8,4 z"
id="path29" />
<path
d="m 750.90158,284.49209 v 21.98469 h -2 v -21.98469 z m 3,21.60945 -4,2.25033 -4,-2.25033 z"
id="path30"
style="stroke-width:0.53037" />
<path
d="m 751.17135,355.90367 v 21.98469 h -2 v -21.98469 z m 3,21.60945 -4,2.25033 -4,-2.25033 z"
id="path30-2"
style="stroke-width:0.53037" />
<rect
x="701.05359"
y="215.25253"
width="103"
height="71"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#d9f2d0"
id="rect23" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text29"
x="738.23358"
y="238.25253">High</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text32"
x="724.48059"
y="254.25255">precision</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text33"
x="736.40057"
y="270.25253">input</text>
<g
id="g33-9"
transform="translate(441.10986,7.0509646)">
<rect
x="638.21271"
y="302.41418"
width="81"
height="44"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#fbe3d6"
id="rect22-2-5" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text23-7-4"
x="666.52472"
y="320.41418">FP8</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text24-6-3"
x="662.06604"
y="336.96341">Input</text>
</g>
<path
d="m 1121.1635,284.40334 v 21.98469 h -2 v -21.98469 z m 3,21.60945 -4,2.25033 -4,-2.25033 z"
id="path30-1"
style="stroke-width:0.53037" />
<path
d="m 1121.4332,355.81492 v 21.98469 h -2 v -21.98469 z m 3,21.60945 -4,2.25033 -4,-2.25033 z"
id="path30-2-2"
style="stroke-width:0.53037" />
<rect
x="1071.3154"
y="215.16379"
width="103"
height="71"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#d9f2d0"
id="rect23-3" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text29-3"
x="1108.4955"
y="238.16379">High</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text32-4"
x="1094.7424"
y="254.1638">precision</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text33-1"
x="1106.6625"
y="270.16376">input</text>
</g>
</svg>
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<svg
width="960"
height="373.58408"
overflow="hidden"
version="1.1"
id="svg23"
sodipodi:docname="fp8_model_init_1_half.svg"
inkscape:version="1.4.2 (f4327f4, 2025-05-13)"
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
xmlns="http://www.w3.org/2000/svg"
xmlns:svg="http://www.w3.org/2000/svg">
<sodipodi:namedview
id="namedview1"
pagecolor="#ffffff"
bordercolor="#000000"
borderopacity="0.25"
inkscape:showpageshadow="2"
inkscape:pageopacity="0.0"
inkscape:pagecheckerboard="0"
inkscape:deskcolor="#d1d1d1"
inkscape:zoom="3.1237948"
inkscape:cx="479.86506"
inkscape:cy="186.79204"
inkscape:window-width="3440"
inkscape:window-height="1369"
inkscape:window-x="-8"
inkscape:window-y="-8"
inkscape:window-maximized="1"
inkscape:current-layer="g23" />
<defs
id="defs23">
<clipPath
clipPathUnits="userSpaceOnUse"
id="clipPath23">
<rect
style="fill:none"
id="rect24"
width="997.38257"
height="373.58408"
x="-11.584002"
y="41.702408"
ry="36.489601" />
</clipPath>
</defs>
<g
id="g23"
clip-path="url(#clipPath23)"
transform="translate(0,-41.702408)">
<rect
x="0"
y="0"
width="960"
height="480"
fill="#ffffff"
id="rect1" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="700"
font-size="22px"
transform="translate(195.4,93)"
id="text1">FP32/BF16</text>
<path
d="M 461,61 V 404.312"
stroke="#000000"
stroke-width="2"
stroke-miterlimit="8"
fill="none"
fill-rule="evenodd"
id="path1" />
<rect
x="92"
y="217"
width="129"
height="164"
stroke="#042433"
stroke-width="2"
stroke-miterlimit="8"
fill="#e8e8e8"
id="rect2" />
<rect
x="105.07926"
y="266.32938"
width="103"
height="71"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#d9f2d0"
id="rect3" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text3"
x="142.27226"
y="289.32938">High</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text4"
x="128.51926"
y="305.32938">precision</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text5"
x="135.51926"
y="321.32938">weight</text>
<rect
x="308.07925"
y="138.32938"
width="103"
height="72"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#d9f2d0"
id="rect5" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text6"
x="345.06326"
y="162.32938">High</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text7"
x="331.31027"
y="178.32938">precision</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text8"
x="343.23026"
y="194.32938">input</text>
<rect
x="308.07925"
y="266.32938"
width="103"
height="70"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#c1e5f5"
id="rect8" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text9"
x="345.06326"
y="289.32938">High</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text10"
x="331.30927"
y="305.32938">precision</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text11"
x="338.72925"
y="321.32938">GEMM</text>
<path
d="m 208.08567,302.96936 93.11279,-0.59724 -0.0128,-1.99996 -93.11281,0.59724 z m 91.79869,2.41125 7.9742,-4.05123 -8.0255,-3.9486 z"
id="path11" />
<path
d="m 360.07926,210.32938 v 49.395 h -2 v -49.395 z m 3,48.061 -4,8 -4,-8 z"
id="path12" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="700"
font-size="22px"
transform="translate(645.181,91)"
id="text23">FP8</text>
<rect
x="495.63504"
y="222.57803"
width="129"
height="164"
stroke="#042433"
stroke-width="2"
stroke-miterlimit="8"
fill="#e8e8e8"
id="rect19" />
<rect
x="509.53662"
y="265.15271"
width="103"
height="71"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#d9f2d0"
id="rect20" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text20"
x="546.39862"
y="289.15274">High</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text21"
x="532.64563"
y="305.15274">precision</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text22"
x="539.64563"
y="321.15274">weight</text>
<rect
x="653.53668"
y="278.15274"
width="81"
height="44"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#fbe3d6"
id="rect22" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text23-3"
x="681.84863"
y="296.15274">FP8</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text24"
x="671.34863"
y="312.15274">Weight</text>
<g
id="g33"
transform="translate(119.48305,-96.282252)">
<rect
x="638.21271"
y="302.41418"
width="81"
height="44"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#fbe3d6"
id="rect22-2" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text23-7"
x="666.52472"
y="320.41418">FP8</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text24-6"
x="662.06604"
y="336.96341">Input</text>
</g>
<rect
x="757.53668"
y="278.15274"
width="82"
height="44"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#c1e5f5"
id="rect26" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text27"
x="786.19666"
y="296.15274">FP8</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text28"
x="777.53064"
y="312.15274">GEMM</text>
<path
d="m 612.55239,301.7926 34.00266,-0.5351 -0.0314,-1.99976 -34.00273,0.53511 z m 32.71676,2.4855 7.9361,-4.12538 -8.062,-3.87362 z"
id="path28" />
<path
d="m 734.53665,299.15272 h 15.791 v 2 h -15.791 z m 14.458,-3 8,4 -8,4 z"
id="path29" />
<path
d="m 799.53665,181.07012 v 21.98469 h -2 v -21.98469 z m 3,21.60945 -4,2.25033 -4,-2.25033 z"
id="path30"
style="stroke-width:0.53037" />
<path
d="m 799.80642,252.4817 v 21.98469 h -2 V 252.4817 Z m 3,21.60945 -4,2.25033 -4,-2.25033 z"
id="path30-2"
style="stroke-width:0.53037" />
<rect
x="749.68866"
y="111.83057"
width="103"
height="71"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#d9f2d0"
id="rect23" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text29"
x="786.86865"
y="134.83058">High</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text32"
x="773.11566"
y="150.83058">precision</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text33"
x="785.03564"
y="166.83057">input</text>
</g>
</svg>
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<svg
width="960"
height="379.95526"
overflow="hidden"
version="1.1"
id="svg19"
sodipodi:docname="fp8_model_init_2_half.svg"
inkscape:version="1.4.2 (f4327f4, 2025-05-13)"
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
xmlns="http://www.w3.org/2000/svg"
xmlns:svg="http://www.w3.org/2000/svg">
<sodipodi:namedview
id="namedview1"
pagecolor="#ffffff"
bordercolor="#000000"
borderopacity="0.25"
inkscape:showpageshadow="2"
inkscape:pageopacity="0.0"
inkscape:pagecheckerboard="0"
inkscape:deskcolor="#d1d1d1"
inkscape:zoom="2.1718178"
inkscape:cx="502.34416"
inkscape:cy="194.07705"
inkscape:window-width="3440"
inkscape:window-height="1369"
inkscape:window-x="-8"
inkscape:window-y="-8"
inkscape:window-maximized="1"
inkscape:current-layer="svg19" />
<defs
id="defs19">
<clipPath
clipPathUnits="userSpaceOnUse"
id="clipPath20">
<rect
style="fill:none"
id="rect21"
width="1014.7587"
height="379.95526"
x="-21.430403"
y="44.598408" />
</clipPath>
</defs>
<g
id="g19"
clip-path="url(#clipPath20)"
transform="translate(-76.837568,-52.086815)" />
<path
d="M 434.81331,26.957307 V 370.26931"
stroke="#000000"
stroke-width="2"
stroke-miterlimit="8"
fill="none"
fill-rule="evenodd"
id="path1" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="700"
font-size="24px"
id="text2"
x="216.69165"
y="33.663437">FP8</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="700"
font-size="24px"
id="text3"
x="508.73199"
y="32.565033">FP8 with fp8_model_init()</text>
<rect
x="481.81332"
y="182.95731"
width="129"
height="164"
stroke="#042433"
stroke-width="2"
stroke-miterlimit="8"
fill="#e8e8e8"
id="rect3" />
<rect
x="496.26413"
y="238.08121"
width="101"
height="45"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#d9f2d0"
id="rect4" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text4"
x="534.22107"
y="257.08121">FP8</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text5"
x="525.13409"
y="273.08121">weight</text>
<rect
x="692.2641"
y="238.08121"
width="82"
height="45"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#c1e5f5"
id="rect5" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text6"
x="721.31403"
y="257.08121">FP8</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text7"
x="712.6441"
y="273.08121">GEMM</text>
<path
d="m 597.2641,260.08121 h 89.04001 v 2 H 597.2641 Z m 87.71001,-3 8,4 -8,4 z"
id="path7" />
<rect
x="60.813313"
y="182.95731"
width="129"
height="164"
stroke="#042433"
stroke-width="2"
stroke-miterlimit="8"
fill="#e8e8e8"
id="rect19" />
<rect
x="74.714897"
y="225.53201"
width="103"
height="71"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#d9f2d0"
id="rect20" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text20"
x="111.5769"
y="249.53201">High</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text21"
x="97.823906"
y="265.53201">precision</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text22"
x="104.82391"
y="281.53201">weight</text>
<rect
x="218.71492"
y="238.53201"
width="81"
height="44"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#fbe3d6"
id="rect22" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text23"
x="247.02687"
y="256.53201">FP8</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text24"
x="236.52687"
y="272.53201">Weight</text>
<g
id="g33"
transform="translate(-315.33871,-135.90297)">
<rect
x="638.21271"
y="302.41418"
width="81"
height="44"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#fbe3d6"
id="rect22-2" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text23-7"
x="666.52472"
y="320.41418">FP8</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text24-6"
x="662.06604"
y="336.96341">Input</text>
</g>
<rect
x="322.71494"
y="238.53201"
width="82"
height="44"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#c1e5f5"
id="rect26" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text27"
x="351.37491"
y="256.53201">FP8</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text28"
x="342.70889"
y="272.53201">GEMM</text>
<path
d="m 177.73063,262.17188 34.00266,-0.5351 -0.0314,-1.99976 -34.00273,0.53511 z m 32.71676,2.4855 7.9361,-4.12538 -8.062,-3.87362 z"
id="path28" />
<path
d="m 299.71489,259.532 h 15.791 v 2 h -15.791 z m 14.458,-3 8,4 -8,4 z"
id="path29" />
<path
d="m 364.71489,141.4494 v 21.98469 h -2 V 141.4494 Z m 3,21.60945 -4,2.25033 -4,-2.25033 z"
id="path30"
style="stroke-width:0.53037" />
<path
d="m 364.98466,212.86098 v 21.98469 h -2 v -21.98469 z m 3,21.60945 -4,2.25033 -4,-2.25033 z"
id="path30-2"
style="stroke-width:0.53037" />
<rect
x="314.86691"
y="72.209839"
width="103"
height="71"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#d9f2d0"
id="rect23" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text29"
x="352.04691"
y="95.209839">High</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text32"
x="338.29391"
y="111.20985">precision</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text33"
x="350.2139"
y="127.20984">input</text>
<g
id="g33-9"
transform="translate(54.923173,-135.99173)">
<rect
x="638.21271"
y="302.41418"
width="81"
height="44"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#fbe3d6"
id="rect22-2-5" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text23-7-4"
x="666.52472"
y="320.41418">FP8</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text24-6-3"
x="662.06604"
y="336.96341">Input</text>
</g>
<path
d="m 734.97681,141.36065 v 21.98469 h -2 v -21.98469 z m 3,21.60945 -4,2.25033 -4,-2.25033 z"
id="path30-1"
style="stroke-width:0.53037" />
<path
d="m 735.24651,212.77223 v 21.98469 h -2 v -21.98469 z m 3,21.60945 -4,2.25033 -4,-2.25033 z"
id="path30-2-2"
style="stroke-width:0.53037" />
<rect
x="685.12872"
y="72.121094"
width="103"
height="71"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#d9f2d0"
id="rect23-3" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text29-3"
x="722.30878"
y="95.121094">High</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text32-4"
x="708.55573"
y="111.12111">precision</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="13px"
id="text33-1"
x="720.47577"
y="127.12106">input</text>
</svg>
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<svg
width="1280"
height="303.21127"
overflow="hidden"
version="1.1"
id="svg12"
xmlns="http://www.w3.org/2000/svg"
xmlns:svg="http://www.w3.org/2000/svg">
<defs
id="defs12">
<clipPath
clipPathUnits="userSpaceOnUse"
id="clipPath16">
<rect
style="fill:none;stroke-width:0.96471"
id="rect16"
width="1344.0338"
height="303.21124"
x="-32.356411"
y="174.8833" />
</clipPath>
</defs>
<g
id="g12"
transform="translate(1.1556091e-7,-174.8833)"
clip-path="url(#clipPath16)">
<rect
x="0"
y="0"
width="1280"
height="720"
fill="#ffffff"
id="rect1" />
<path
d="M 645,209 V 446.818"
stroke="#000000"
stroke-width="2"
stroke-miterlimit="8"
fill="none"
fill-rule="evenodd"
id="path1" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="700"
font-size="24px"
transform="translate(201.111,246)"
id="text1">Without CUDA Graphs</text>
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="700"
font-size="24px"
transform="translate(855.749,246)"
id="text2">With CUDA Graphs</text>
<rect
x="64"
y="319"
width="91"
height="49"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#f2f2f2"
id="rect2" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="16px"
transform="translate(75.6135,349)"
id="text3">Launch 1</text>
<rect
x="155"
y="371"
width="90"
height="49"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#d9f2d0"
id="rect3" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="16px"
transform="translate(169.288,401)"
id="text4">Kernel 1</text>
<rect
x="245"
y="319"
width="91"
height="49"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#f2f2f2"
id="rect4" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="16px"
transform="translate(256.462,349)"
id="text5">Launch 2</text>
<rect
x="336"
y="371"
width="90"
height="49"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#d9f2d0"
id="rect5" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="16px"
transform="translate(350.136,401)"
id="text6">Kernel 2</text>
<rect
x="426"
y="319"
width="91"
height="49"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#f2f2f2"
id="rect6" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="16px"
transform="translate(437.31,349)"
id="text7">Launch 3</text>
<rect
x="517"
y="371"
width="90"
height="49"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#d9f2d0"
id="rect7" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="16px"
transform="translate(530.984,401)"
id="text8">Kernel 3</text>
<path
d="m 47,368 h 574.291 v 4 H 47 Z m 572.291,-4 12,6 -12,6 z"
id="path8" />
<rect
x="680"
y="319"
width="145"
height="49"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#f2f2f2"
id="rect8" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="16px"
transform="translate(694.058,349)"
id="text9">Launch Graph 1</text>
<rect
x="830"
y="370"
width="91"
height="49"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#d9f2d0"
id="rect9" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="16px"
transform="translate(844.463,400)"
id="text10">Kernel 1</text>
<rect
x="924"
y="370"
width="90"
height="49"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#d9f2d0"
id="rect10" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="16px"
transform="translate(938.451,400)"
id="text11">Kernel 2</text>
<rect
x="1018"
y="370"
width="90"
height="49"
stroke="#000000"
stroke-width="2"
stroke-linejoin="round"
stroke-miterlimit="10"
fill="#d9f2d0"
id="rect11" />
<text
font-family="'NVIDIA Sans', 'NVIDIA Sans_MSFontService', sans-serif"
font-weight="400"
font-size="16px"
transform="translate(1032.44,400)"
id="text12">Kernel 3</text>
<path
d="m 663,368 h 574.29 v 4 H 663 Z m 572.29,-4 12,6 -12,6 z"
id="path12" />
</g>
</svg>
transformers==4.55.0
accelerate==1.10.0
datasets==4.0.0
sentencepiece==0.2.1