    Merge origin dev (#2) · cad3212d
    Haocong WANG authored
    
    
    * [Navi3x] Fix Gridwise_multiple_d operation (#649)
    
    * Add CMake Option "USE_OPT_NAVI3X"
    
    * fix bug
    
    * standardize docs (#655)
    
    * Separate bibtex requirement from rocm-docs-core (#656)
    
    * separate bibtex requirement from rocm-docs-core
    
    * point requirements to source rocm-docs-core repo
    
    * Add CMake Option "USE_OPT_NAVI3X" (#647)
    
    * Add CMake Option "USE_OPT_NAVI3X"
    
    * remove navi3x opt compile option from cmake script
    
    * Conv + quantization + tanh (#645)
    
    * Rename file. Prepare to support another activation
    
    * Add comment for quantization
    
    * Extract out_elementop
    
    * Add tanh example
    
    * Add conv + bias + tanh quantization instance
    
    * Add missing parameter
    
    * Refine cmake
    
    * Add external api and client example
    
    * Extract variable in example
    
    * Fix the comment
    
    ---------
    Co-authored-by: zjing14 <zhangjing14@gmail.com>
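
The #645 entries describe a conv epilogue that applies bias, tanh and quantization through an extracted out_elementop. The sketch below only illustrates that kind of element-wise op under assumed semantics: an int32 accumulator plus an int32 bias, a float input scale, tanh, then a float output scale with rounding and saturation to int8. The struct name BiasTanhRequant and its fields are hypothetical and are not composable_kernel's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical requantizing output element-op: int32 accumulator -> int8 via bias + tanh.
struct BiasTanhRequant
{
    float in_scale;  // assumed: dequantization scale of the int32 accumulator
    float out_scale; // assumed: maps the tanh output range [-1, 1] onto int8

    std::int8_t operator()(std::int32_t acc, std::int32_t bias) const
    {
        // Dequantize (acc + bias) to float and apply the activation.
        const float x = in_scale * static_cast<float>(acc + bias);
        const float y = std::tanh(x) * out_scale;
        // Round to nearest, then saturate to the int8 range before narrowing.
        const float r = std::nearbyint(y);
        return static_cast<std::int8_t>(std::clamp(r, -128.0f, 127.0f));
    }
};
```

With in_scale = 0.01 and out_scale = 127, large positive or negative accumulators saturate to +127 or -127, since tanh never leaves [-1, 1].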
    
    * Add a denorm test fix (#603)
    
    * Add type_convert implementations for bf16
    
    * Add the fix for conv_fwd
    
    * Add the fix for conv_bwd_data
    
    * Add the fix for conv_bwd_weight
    
    * Format
    
    * Format
    
    * Another format
    
    * Add a macro to use workaround on MI200 only
    
    * Format
    
    ---------
    Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
    Co-authored-by: zjing14 <zhangjing14@gmail.com>
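
The #603 entries add bf16 type_convert implementations and a denorm fix, but the fix itself is not described here. As background only: a software float-to-bf16 conversion that works on the IEEE-754 bit pattern with round-to-nearest-even keeps subnormal inputs that bf16 can represent instead of flushing them to zero. The names bf16_bits, float_to_bf16_rne and bf16_to_float are hypothetical; this is a sketch, not the repository's implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical bf16 storage type: the upper 16 bits of an IEEE-754 binary32 value.
using bf16_bits = std::uint16_t;

// float -> bf16 with round-to-nearest-even on the raw bits (subnormals are not flushed).
inline bf16_bits float_to_bf16_rne(float f)
{
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof(u));
    if(std::isnan(f))
    {
        // Keep the sign and set the quiet bit so truncation cannot turn NaN into Inf.
        return static_cast<bf16_bits>((u >> 16) | 0x0040u);
    }
    // Adding 0x7FFF plus the LSB of the kept half rounds ties to the even neighbour.
    const std::uint32_t lsb = (u >> 16) & 1u;
    u += 0x7FFFu + lsb;
    return static_cast<bf16_bits>(u >> 16);
}

// bf16 -> float is exact: place the 16 bits in the high half of a binary32.
inline float bf16_to_float(bf16_bits b)
{
    const std::uint32_t u = static_cast<std::uint32_t>(b) << 16;
    float f;
    std::memcpy(&f, &u, sizeof(f));
    return f;
}
```

For example, 1.00390625f lies exactly halfway between two bf16 values and rounds down to 1.0f, because the lower neighbour has an even last bit.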
    
    * simplify karg in device/grid of split-k op (#644)
    
    * simplify karg in device/grid split-k op
    
    * fix mk_kn_mn instances
    
    * add more instances
    
    * use name from tensor layout
    
    * fix 3rd dword of buffer source descriptor (#659)
    
    * add fp64 instances (#658)
    Co-authored-by: root <root@ctr-ubbsmc15.amd.com>
    
    * Issue #666: Revert "simplify karg in device/grid of split-k op (#644)" (#665)
    
    This reverts commit bb5530af.
    
    * Groupnorm + swish external api (#668)
    
    * Rename to use proper naming
    
    * Add example of groupnorm + swish
    
    * Extract duplicate code in example
    
    * Add groupnorm + swish instances
    
    * Refactor instance generation, split into multiple cpp files
    
    * Add external api and client example
    
    * Refine profiler message
    
    * Use ck math version of exp
    
    * Refine problem size in example
    
    * Add host version of exp
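
For readers unfamiliar with the fused op added in #668, a host-side reference of GroupNorm followed by swish (x * sigmoid(x)) might look like the sketch below. It assumes a single sample stored as [C][HW], biased variance, and per-channel gamma/beta. The function names are hypothetical, and std::exp stands in for the ck math exp mentioned above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Swish with beta = 1: x * sigmoid(x) = x / (1 + exp(-x)).
inline float swish(float x) { return x / (1.0f + std::exp(-x)); }

// Hypothetical host reference: GroupNorm over G channel groups, per-channel affine, then swish.
std::vector<float> groupnorm_swish(const std::vector<float>& x, std::size_t C, std::size_t HW,
                                   std::size_t G, const std::vector<float>& gamma,
                                   const std::vector<float>& beta, float eps = 1e-5f)
{
    assert(G != 0 && C % G == 0 && x.size() == C * HW && gamma.size() == C && beta.size() == C);
    std::vector<float> y(x.size());
    const std::size_t cpg = C / G; // channels per group
    for(std::size_t g = 0; g < G; ++g)
    {
        // Each group is a contiguous block of cpg * HW elements in the [C][HW] layout.
        const std::size_t begin = g * cpg * HW, end = begin + cpg * HW;
        double sum = 0.0, sq = 0.0;
        for(std::size_t i = begin; i < end; ++i)
        {
            sum += x[i];
            sq += double(x[i]) * x[i];
        }
        const double n    = double(end - begin);
        const double mean = sum / n;
        const double var  = sq / n - mean * mean; // biased variance
        const float inv_std = static_cast<float>(1.0 / std::sqrt(var + eps));
        for(std::size_t i = begin; i < end; ++i)
        {
            const std::size_t c = i / HW; // channel index selects gamma/beta
            const float norm = (x[i] - static_cast<float>(mean)) * inv_std;
            y[i] = swish(gamma[c] * norm + beta[c]);
        }
    }
    return y;
}
```

Computing the statistics in double keeps the reference stable for large groups; a device kernel would typically use a different (e.g. Welford-style) reduction.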
    
    * add a macro to turn on/off denorm fix (off by default) (#673)
    
    * add a macro to turn off denorm fix by default
    
    * expose the macro
    
    ---------
    Co-authored-by: root <root@ctr-ubbsmc15.amd.com>
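
#673 exposes a compile-time macro that turns the denorm fix on or off, with the fix off by default. A common C++ pattern for such an overridable default is an #ifndef guard, sketched below. HYPOTHETICAL_DENORM_FIX is a placeholder, not the macro name used in the repository.

```cpp
#include <cassert>

// Placeholder switch: off by default, overridable with -DHYPOTHETICAL_DENORM_FIX=1.
#ifndef HYPOTHETICAL_DENORM_FIX
#define HYPOTHETICAL_DENORM_FIX 0
#endif

// Expose the switch as a constexpr value so ordinary C++ code can branch on it too.
constexpr bool denorm_fix_enabled() { return HYPOTHETICAL_DENORM_FIX != 0; }

// Select a code path at preprocessing time depending on the switch.
inline const char* conversion_path()
{
#if HYPOTHETICAL_DENORM_FIX
    return "workaround";
#else
    return "default";
#endif
}
```

Because the guard only defines the macro when it is absent, a build system can enable the workaround for a specific target (for example, only on MI200, as #603 describes) without editing the header.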
    
    * fixed quant example (#672)
    Co-authored-by: root <root@ctr-ubbsmc15.amd.com>
    
    * Add dependabot config and pin rocm-docs-core (#663)
    
    * [gtest] suppress unsafe buffer warn (#670)
    
    ref: https://github.com/ROCmSoftwarePlatform/MIOpen/pull/1912
    
    * Add memory index guard in wmma device ops (#667)
    
    * Add more macros to turn on/off denorm fix (#678)
    Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
    
    * Fix a typo (#676)
    
    * Add (#677)
    
    * Allow using ROCm release candidate compilers. (#679)
    
    * enable use of ROCm 5.5 release candidate 4
    
    * upgrade to ROCm 5.5 RC5
    
    * try to fix the PUB_KEY error; remove the cmake-data package
    
    * upgrade to the latest CMake version
    
    * use private Docker Hub repo for ROCm 5.5 RC5
    
    * add missing bracket
    
    * add vector load check
    
    * solve conflicts
    
    ---------
    Co-authored-by: Sam Wu <sjwu@ualberta.ca>
    Co-authored-by: Sam Wu <sam.wu2@amd.com>
    Co-authored-by: rocking5566 <ChunYu.Lai@amd.com>
    Co-authored-by: zjing14 <zhangjing14@gmail.com>
    Co-authored-by: Rostyslav Geyyer <46627076+geyyer@users.noreply.github.com>
    Co-authored-by: Rosty Geyyer <rosty.geyyer@amd.com>
    Co-authored-by: carlushuang <carlus.huang@amd.com>
    Co-authored-by: root <root@ctr-ubbsmc15.amd.com>
    Co-authored-by: Jun Liu <Liu.Jun@amd.com>
    Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>