.github/workflows/amd_ci.yml · 7ffc5b4418ad297fe05ddfe8007db38b3eb54d8b · OpenDAS / tilelang

[Cache] Introduce detailed target information for the disk kernel cache (#780) · 7ffc5b44

Lei Wang authored Sep 02, 2025

* Fix type hint for target_host parameter in compile function to allow None value

* Refactor target handling in compile function to utilize determine_target for improved clarity and consistency

* Update PrintConst in codegen_cuda.cc to emit bfloat16 and float8/float4 constants in hexfloat format, with scientific-notation comments added for readability. Hexfloat represents the constant exactly in the generated code.

* Refactor PrintType in codegen_cuda.cc to remove unnecessary failure conditions for floating-point types with lane counts greater than 4, which simplifies the logic.

* Print Reference TFlops in benchmark_matmul.py only when ref_latency is not None. Convert target to a string in param.py for consistency. Refactor tuner.py to use determine_target for target handling.

* Remove automatic commit and push step from AMD and NVIDIA CI workflows to streamline the process and avoid unnecessary commits.
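
The hexfloat item in the list above is easier to follow with a concrete value. The following is a minimal Python sketch, not the PrintConst implementation from codegen_cuda.cc: the helper names `round_to_bfloat16` and `format_const` are hypothetical, and the exact layout of the emitted constant and its comment is an assumption. It only illustrates why a hexfloat literal is exact while a decimal or scientific rendering is a readable approximation.

```python
import struct


def round_to_bfloat16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value (ties to even).

    Hypothetical helper for illustration; NaN and overflow are not handled.
    """
    # Reinterpret the value as float32 bits.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps the upper 16 bits of a float32; round the lower 16 away.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def format_const(x: float) -> str:
    """Render a constant as an exact hexfloat plus a scientific-notation comment.

    The output layout is an assumption, not the format used by the code generator.
    """
    v = round_to_bfloat16(x)
    return f"{v.hex()} /*{v:e}*/"


if __name__ == "__main__":
    for value in (1.0, 0.1, 3.14159):
        print(format_const(value))
```

Parsing the hexfloat part back with `float.fromhex` returns the same bfloat16 value bit for bit, whereas the scientific-notation comment is rounded to a fixed number of digits and is there only for the reader.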

amd_ci.yml 4.2 KB