OpenDAS / vllm_cscc

Commit 2a602b05 (Unverified)
Authored Mar 13, 2025 by Jeff Daily
Committed by GitHub, Mar 13, 2025

forward fix PR 14245, restore build on ROCm 6.2 (#14709)

Signed-off-by: Jeff Daily <jeff.daily@amd.com>

parent 7888e1d0

Showing 1 changed file with 12 additions and 0 deletions

csrc/quantization/fp8/amd/quant_utils.cuh (+12, -0)
@@ -19,12 +19,24 @@ __device__ __forceinline__ fp8_type cvt_c10(float const r) {
   return {};
 }
 
+// __hip_fp8_e4m3 only exists starting in ROCm 6.3. The macro
+// HIP_FP8_TYPE_OCP comes from the hip_fp8.h header and also makes
+// its first appearance in ROCm 6.3. Since VLLM_DISPATCH_FP8_TYPES
+// on ROCm instantiates both OCP and FNUZ kernels, we need to replace
+// the new HW cvt with something reasonable that doesn't rely on the
+// ROCm 6.3 feature. This allows compiling on ROCm 6.2 or newer.
 template <>
 __device__ __forceinline__ c10::Float8_e4m3fn cvt_c10(float const r) {
+#if HIP_FP8_TYPE_OCP
   return c10::Float8_e4m3fn(
       __hip_cvt_float_to_fp8(r, __hip_fp8_e4m3::__default_saturation,
                              __hip_fp8_e4m3::__default_interpret),
       c10::Float8_e4m3fn::from_bits());
+#else
+  // Cast implemented by pytorch. Uses bit manipulation instead of HW cvt.
+  // HW cvt above is faster when it is available (ROCm 6.3 or newer).
+  return static_cast<c10::Float8_e4m3fn>(r);
+#endif
 }
 
 template <>
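
The guard in this change relies on a standard preprocessor rule: an identifier that is not defined as a macro evaluates to 0 inside `#if`. On a toolchain whose headers never define `HIP_FP8_TYPE_OCP`, the `#else` branch is selected and the hardware-conversion branch is skipped before the compiler sees it. The following host-only C++ sketch illustrates the same pattern under stated assumptions. `DEMO_HAS_HW_CVT`, `DemoFp8`, `demo_cvt`, `demo_hw_convert`, and `demo_uses_hw_path` are hypothetical names invented for illustration, and the clamp-and-truncate fallback is a placeholder, not an fp8 encoding and not the PyTorch cast.

```cpp
#include <cassert>
#include <cstdint>

// DEMO_HAS_HW_CVT is a hypothetical stand-in for HIP_FP8_TYPE_OCP. It is
// deliberately never defined here, mimicking headers that predate the macro.

struct DemoFp8 {  // hypothetical 8-bit storage type, not a real fp8 encoding
  std::uint8_t bits;
};

// Primary template returns a value-initialized result, like cvt_c10 in the diff.
template <typename T>
inline T demo_cvt(float) {
  return {};
}

// Full specialization with a guarded fast path and a portable fallback.
template <>
inline DemoFp8 demo_cvt<DemoFp8>(float r) {
#if DEMO_HAS_HW_CVT
  // Skipped by the preprocessor when the macro is undefined, so the
  // undeclared demo_hw_convert() never reaches the compiler.
  return DemoFp8{demo_hw_convert(r)};
#else
  // Placeholder fallback: clamp to [0, 255], then truncate toward zero.
  if (!(r > 0.0f)) return DemoFp8{0};  // also catches NaN
  if (r >= 255.0f) return DemoFp8{255};
  return DemoFp8{static_cast<std::uint8_t>(r)};
#endif
}

// Reports which branch the preprocessor selected.
inline bool demo_uses_hw_path() {
#if DEMO_HAS_HW_CVT
  return true;
#else
  return false;
#endif
}

// Small self-check exercising the fallback path.
inline void demo_self_check() {
  assert(!demo_uses_hw_path());
  assert(demo_cvt<DemoFp8>(3.7f).bits == 3);
}
```

Because the skipped branch is never compiled, it can name symbols that do not exist on the older toolchain. That is the property that lets the specialization in the diff refer to `__hip_fp8_e4m3` while still compiling on ROCm 6.2.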