- 13 Jun, 2025 2 commits
- 12 Jun, 2025 4 commits
- wenjh authored
- wenjh authored
  Signed-off-by: wenjh <wenjh@sugon.com>
- wenjh authored
- wenjh authored
  Same intention as commit 3e38a2ea; this commit improves accuracy.
  Signed-off-by: wenjh <wenjh@sugon.com>
- 11 Jun, 2025 2 commits
- 10 Jun, 2025 2 commits
- 09 Jun, 2025 7 commits
- wenjh authored
- wenjh authored
  Signed-off-by: wenjh <wenjh@sugon.com>
- yuguo authored
  [DCU] fix. See merge request dcutoolkit/deeplearing/TransformerEngine!24
- yuguo authored
- yuguo authored
  [DCU] support casting the master weight to int8. See merge request dcutoolkit/deeplearing/TransformerEngine!23
- yuguo authored
- wenjh authored
  Signed-off-by: wenjh <wenjh@sugon.com>
- 06 Jun, 2025 1 commit
- wenjh authored
  The quantize_transpose_vector_blockwise function uses more than 64 KB of LDS when the input type is fp32, but the maximum LDS size on DCU is 64 KB, so as a workaround we stage data in LDS as bf16.
  Signed-off-by: wenjh <wenjh@sugon.com>
- 05 Jun, 2025 2 commits
- yuguo authored
- 04 Jun, 2025 4 commits
- yuguo authored
- yuguo authored
  Merge branch 'develop_v2.3' of http://10.16.6.30/dcutoolkit/deeplearing/TransformerEngine into develop_v2.3
- yuguo authored
- 28 May, 2025 2 commits
- 27 May, 2025 7 commits
- 26 May, 2025 3 commits
- wenjh authored
- wenjh authored
  Signed-off-by: wenjh <wenjh@sugon.com>
- wenjh authored
  Use OCP fp8. Workaround: test_cast_float8blockwise.cu links the wrong std::max.
  Signed-off-by: wenjh <wenjh@sugon.com>
- 23 May, 2025 2 commits
- yuguo authored
- 22 May, 2025 2 commits
- wenjh authored
- wenjh authored
  Signed-off-by: wenjh <wenjh@sugon.com>