- 17 Jul, 2025 2 commits
  - wenjh authored
    Signed-off-by: wenjh <wenjh@sugon.com>
  - yuguo authored
- 16 Jul, 2025 2 commits
  - yuguo authored
  - wenjh authored
    Signed-off-by: wenjh <wenjh@sugon.com>
- 15 Jul, 2025 3 commits
  - wenjh authored
    Signed-off-by: wenjh <wenjh@sugon.com>
  - yuguo authored
  - wenjh authored
    Signed-off-by: wenjh <wenjh@sugon.com>
- 11 Jul, 2025 2 commits
  - wenjh authored
  - wenjh authored
    Signed-off-by: wenjh <wenjh@sugon.com>
- 09 Jul, 2025 2 commits
  - yuguo authored
  - wenjh authored
    Signed-off-by: wenjh <wenjh@sugon.com>
- 08 Jul, 2025 1 commit
  - yuguo authored
- 03 Jul, 2025 1 commit
  - wenjh authored
    Signed-off-by: wenjh <wenjh@sugon.com>
- 01 Jul, 2025 1 commit
  - wenjh authored
    Add env to choose blocklen of blockwise quantize.
    Signed-off-by: wenjh <wenjh@sugon.com>
    Fix blockwise pytest error.
    Signed-off-by: wenjh <wenjh@sugon.com>
    Resolve new API in int8 GEMM test.
    Signed-off-by: wenjh <wenjh@sugon.com>
    Fix incorrect launch param.
    Signed-off-by: wenjh <wenjh@sugon.com>
    Fix 1D blockwise (64) accuracy error.
    Signed-off-by: wenjh <wenjh@sugon.com>
- 20 Jun, 2025 2 commits
- 19 Jun, 2025 2 commits
  - yuguo authored
  - wenjh authored
    Signed-off-by: wenjh <wenjh@sugon.com>
- 18 Jun, 2025 2 commits
  - wenjh authored
    Signed-off-by: wenjh <wenjh@sugon.com>
  - wenjh authored
    Signed-off-by: wenjh <wenjh@sugon.com>
- 16 Jun, 2025 1 commit
  - yuguo authored
- 13 Jun, 2025 1 commit
  - yuguo authored
- 12 Jun, 2025 2 commits
  - wenjh authored
    Signed-off-by: wenjh <wenjh@sugon.com>
  - wenjh authored
    Same intention as commit 3e38a2ea. This commit improves accuracy.
    Signed-off-by: wenjh <wenjh@sugon.com>
- 11 Jun, 2025 1 commit
  - yuguo authored
- 10 Jun, 2025 1 commit
  - yuguo authored
- 09 Jun, 2025 4 commits
  - wenjh authored
    Signed-off-by: wenjh <wenjh@sugon.com>
  - yuguo authored
  - yuguo authored
  - wenjh authored
    Signed-off-by: wenjh <wenjh@sugon.com>
- 06 Jun, 2025 1 commit
  - wenjh authored
    The quantize_transpose_vector_blockwise function uses more than 64 KB of LDS when the input type is fp32, but the maximum LDS size on DCU is 64 KB, so as a workaround we store the data in LDS as bfp16.
    Signed-off-by: wenjh <wenjh@sugon.com>
- 05 Jun, 2025 2 commits
  - yuguo authored
- 04 Jun, 2025 4 commits
  - yuguo authored
  - yuguo authored
    Merge branch 'develop_v2.3' of http://10.16.6.30/dcutoolkit/deeplearing/TransformerEngine into develop_v2.3
  - yuguo authored
- 28 May, 2025 2 commits
- 27 May, 2025 1 commit
  - wenjh authored