    Transpose 3d (#984) · 3af8c81a
    arai713 authored
    
    
    * added working example for 5D input using 1D kernel
    
    * example with 5D input tensor and 2d kernel - not working: issues with arguments
    
    * added updated version of 3d device op - changed descriptors/dims
    
    * added example file to check kernel
    
    * fixed descriptor and isSupportedArgument stride problem
    
    * added and modified kernel for 3d - updated tids/loop
    
    * adding some more 5d example files
    
    * fixed some issues
    
    * changes made for testing
    
    * working version: fixed error in stride for A, still a bit inefficient
    
    * cleaned up formatting/comments
    
    * updating formatting
    
    * more formatting fixes
    
    * fixing cmake, adding back gpu targets in cmake script
    
    * adding client example
    
    * added instances for client example
    
    * fixed errors in client example
    
    * implemented client ex with device_elementwise.hpp and device_elementwise_3d_impl.hpp
    
    * removed extra files
    
    * minor formatting and naming fixes
    
    * adding test files and profiler
    
    * fixing minor error
    
    * minor fix
    
    * removed unnecessary comments, renamed files
    
    * updated instance list for client example, added different layout example
    
    * removing instances
    
    * fixed error in instance generation
    
    * remove comments
    
    * update profiler and client example tensor layouts
    
    * fixed errors in test/profiler
    
    * updated vector dim access to enable vector load
    
    * updated test/profiler files
    
    * updated example with 1d kernel
    
    * updating profiler
    
    * renamed files
    
    ---------
    Co-authored-by: Jing Zhang <jizha@amd.com>
profile_transpose.cpp 2.54 KB