- 12 Jun, 2025 1 commit
-
wenjh authored
Same intention as commit 3e38a2ea; this commit improves accuracy.
Signed-off-by: wenjh <wenjh@sugon.com>
-
- 11 Jun, 2025 1 commit
-
yuguo authored
-
- 10 Jun, 2025 1 commit
-
yuguo authored
-
- 09 Jun, 2025 4 commits
-
wenjh authored
Signed-off-by: wenjh <wenjh@sugon.com>
-
yuguo authored
-
yuguo authored
-
wenjh authored
Signed-off-by: wenjh <wenjh@sugon.com>
-
- 06 Jun, 2025 1 commit
-
wenjh authored
The quantize_transpose_vector_blockwise function needs more than 64 KB of LDS when the input type is fp32, but the maximum LDS size on DCU is 64 KB, so as a workaround we store the data in LDS as bf16.
Signed-off-by: wenjh <wenjh@sugon.com>
-
- 05 Jun, 2025 2 commits
-
yuguo authored
-
- 04 Jun, 2025 4 commits
-
yuguo authored
-
yuguo authored
Merge branch 'develop_v2.3' of http://10.16.6.30/dcutoolkit/deeplearing/TransformerEngine into develop_v2.3
-
yuguo authored
-
- 28 May, 2025 2 commits
- 27 May, 2025 7 commits
- 26 May, 2025 3 commits
-
wenjh authored
-
wenjh authored
Signed-off-by: wenjh <wenjh@sugon.com>
-
wenjh authored
Use OCP FP8. Workaround: test_cast_float8blockwise.cu links the wrong std::max.
Signed-off-by: wenjh <wenjh@sugon.com>
-
- 23 May, 2025 2 commits
-
yuguo authored
-
- 22 May, 2025 4 commits
-
wenjh authored
-
wenjh authored
Signed-off-by: wenjh <wenjh@sugon.com>
-
wenjh authored
Signed-off-by: wenjh <wenjh@sugon.com>
-
wenjh authored
Signed-off-by: wenjh <wenjh@sugon.com>
-
- 21 May, 2025 4 commits
- 20 May, 2025 4 commits