  1. 31 Jan, 2025 1 commit
    • Codegen hipRTC compilation (#1579) · 2e3183af
      arai713 authored
      
      
      * updating codegen build for MIOpen access: adding .cmake for codegen component
      
      * updating CMake
      
      * adding in header guards for some headers due to issues with hiprtc compilation in MIOpen
      
      * some more header guards
      
      * putting env file in header guard
      
      * cleaning up some includes
      
      * updated types file for hiprtc purposes
      
      * fixed types file: bit-wise/memcpy issue
      
      * updating multiple utility files to deal with standard header inclusion for hiprtc
      
      * added some more header guards in the utility files, replacing some standard header functionality
      
      * added some more header guards
      
      * fixing some conflicts in utility files, another round of header guards
      
      * fixing errors in data type file
      
      * resolved conflict errors in a few utility files
      
      * added header guards/replicated functionality in device files
      
      * resolved issues with standard headers in device files: device_base and device_grouped_conv_fwd_multiple_abd
      
      * resolved issues with standard headers in device files: device_base.hpp, device_grouped_conv_fwd_multiple_abd.hpp, device_grouped_conv_fwd_multiple_abd_xdl_cshuffle.hpp
      
      * added header guards for gridwise gemm files: gridwise_gemm_multiple_abd_xdl_cshuffle.hpp and gridwise_gemm_multiple_d_xdl_cshuffle.hpp
      
      * fixed issue with numerics header, removed from transform_conv_fwd_to_gemm and added to device_column_to_image_impl, device_grouped_conv_fwd_multiple_abd_xdl_cshuffle, device_grouped_conv_fwd_multiple_abd_xdl_cshuffle_v3, device_image_to_column_impl
      
      * replaced standard header usage and added header guards in block to ctile map and gridwise_gemm_pipeline_selector
      
      * resolved errors in device_gemm_xdl_splitk_c_shuffle files in regards to replacement of standard headers in previous commit
      
      * added replicated functionality for standard header methods in utility files
      
      * replaced standard header functionality in threadwise tensor slice transfer files and added header guards in element_wise_operation.hpp
      
      * temp fix for namespace error in MIOpen
      
      * remove standard header usage in codegen device op
      
      * removed standard header usage in elementwise files, resolved namespace errors
      
      * formatting fix
      
      * changed codegen argument to ON for testing
      
      * temporarily removing codegen compiler flag for testing purposes
      
      * added codegen flag again, set default to ON
      
      * set codegen flag default back to OFF
      
      * replaced enable_if_t standard header usage in data_type.hpp
      
      * added some debug prints to pinpoint issues in MIOpen
      
      * added print outs to debug in MIOpen
      
      * removed debug print outs from device op
      
      * resolved stdexcept include error
      
      * formatting fix
      
      * adding includes to new fp8 file to resolve ck::enable_if_t errors
      
      * made changes to amd_wave_read_first_lane
      
      * updated functionality in type utility file
      
      * fixed end of file issue
      
      * resolved errors in type utility file, added functionality to array utility file
      
      * fixed standard header usage replication in data_type file, resolves error with failing examples on navi3x
      
      * formatting fix
      
      * replaced standard header usage in amd_ck_fp8 file
      
      * added include to random_gen file
      
      * removed and replicated standard header usage from data_type and type_convert files for fp8 changes
      
      * replicated standard unsigned integer types in random_gen
      
      * resolved comments from review: put calls to reinterpret_cast for size_t in header guards
      
      * updated/added copyright headers
      
      * removed duplicate header
      
      * fixed typo in header guard
      
      * updated copyright headers
      
      ---------
      Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
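The recurring pattern in this commit, guarding standard-library includes and replicating the small parts of `<type_traits>` and `<cstdint>` that the code actually uses, can be sketched as follows. This is a minimal illustration under stated assumptions, not code taken from the repository: the guard macro name `CK_CODE_GEN_RTC`, the `rtc_sketch` namespace and the `low_byte` helper are assumptions made for the example, so check the composable_kernel sources for the real guard name and replacements.

```cpp
// Minimal sketch of a hipRTC-friendly header.
// Assumption: CK_CODE_GEN_RTC (name assumed for illustration) is defined when
// compiling through hipRTC, where standard library headers may be unavailable.
#ifndef CK_CODE_GEN_RTC
#include <cstdint> // regular builds can still use the standard header
#endif

namespace rtc_sketch { // hypothetical namespace

// Replacement for std::enable_if / std::enable_if_t.
template <bool B, typename T = void>
struct enable_if
{
};

template <typename T>
struct enable_if<true, T>
{
    using type = T;
};

template <bool B, typename T = void>
using enable_if_t = typename enable_if<B, T>::type;

// Replacement for std::is_same.
template <typename A, typename B>
struct is_same
{
    static constexpr bool value = false;
};

template <typename A>
struct is_same<A, A>
{
    static constexpr bool value = true;
};

// Fixed-width unsigned types: standard aliases when available, otherwise
// builtin types (assumes a 32-bit unsigned int, as on AMD GPU targets).
#ifdef CK_CODE_GEN_RTC
using uint8_t  = unsigned char;
using uint32_t = unsigned int;
#else
using uint8_t  = std::uint8_t;
using uint32_t = std::uint32_t;
#endif

// Hypothetical consumer: only participates in overload resolution
// for 32-bit unsigned arguments.
template <typename T, enable_if_t<is_same<T, uint32_t>::value, bool> = true>
constexpr T low_byte(T x)
{
    return x & T{0xFFu};
}

} // namespace rtc_sketch
```

The same source stays valid in both compilation modes: a normal build uses the standard headers, while an RTC build sees only the replicated definitions. The trade-off is that each replicated utility has to be kept consistent with its standard counterpart by hand.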
  2. 09 Aug, 2024 1 commit
  3. 31 Jul, 2024 1 commit
  4. 25 Jun, 2024 1 commit
    • CK Instance Gen (#1145) · 3e9711f0
      arai713 authored
      
      
      * Format
      
      * Format
      
      * Format
      
      * Remove const
      
      * Use the right template
      
      * Format
      
      * Format
      
      * add row/col instances
      
      * Add missing file
      
      * fixed
      
      * fixing block to etile error
      
      * Format
      
      * Updates
      
      * Format
      
      * fixed rrr layout
      
      * generating a sample JSON file: currently contains includes, prologue/epilogue and instances
      
      * version where the json is passed into the instances to generate a key
      
      * updated run function to just launch kernel
      
      * updated run function: only contains kernel object, json file is updated but still needs to be cleaned up, added front-end API to parse JSON into character buffer
      
      * adding in testing files
      
      * cleaned up comments, still need to work on including header files
      
      * removed unneeded files
      
      * removed/commented out JSON implementation
      
      * added fusion(prologue/epilogue) into instance generation
      
      * working on instance selection
      
      * added instance selection, need to fix instance validation
      
      * removed block2etile map validity check for testing purposes
      
      * test running: failing due to incorrect files/input
      
      * all grid descs/ptrs completed, but device file not found
      
      * Update test and embed modules
      
      * Restore older version
      
      * added convolution operation, written test, debugging generated code for compilation
      
      * attempting to include CK in host directory: _Float16 error
      
      * CK header file issues
      
      * slight fix
      
      * don't crash when hip can't report total memory
      
      * dump generated code to a file
      
      * changing sizes
      
      * creating tensor descriptors using CK methods: set up grid desc manually, also trying to set up an argument pointer - this needs to be fixed
      
      * some fixes to call the device code
      
      * separating test files for conv and gemm
      
      * completed arg ptr, now have linking errors
      
      * clang format fix
      
      * resolved linker issues in conv test
      
      * remove dependency on libutility from ck
      
      * resolved num dim error
      
      * properly passing arg ptr, errors with passing typenames: redefinition/redeclaration
      
      * undo the commenting of device function
      
      * hand created kernel code to find rtc issues
      
      * dump the full src to file
      
      * resolved redeclaration errors, cleaned up errors for Amber's kernel code
      
      * debugging purposes: redeclaration error
      
      * config files
      
      * resolved errors for NumTensor and redeclaration, formatted version.h
      
      * resolved most errors in manually added kernel and my own. error with calling kernel object: overloaded function type
      
      * WIP: close to getting kernel compiled
      
      * WIP: fixing rtc errors
      
      * fixed sequence errors, formatting, still one error with run fcn
      
      * yay: kernel compiles and runs
      
      * updated templated/generated version to run and compile
      
      * minor fixes
      
      * working generated example, resolved memory access error due to padding
      
      * adding in reference kernel, validation failing against reference
      
      * debugging: printing kernel args
      
      * reduced error in results
      
      * debugged reference kernel and output errors, added to generated version, currently debugging prologue function issues
      
      * working validation (using reference convolution) with prologue function for both hard-coded and generated version
      
      * WIP: create an alt version that creates Argument on the device
      
      * wip: added new duplicate files, fixed fusion templating errors from working example, setting up kernel arguments
      
      * wip: making necessary methods device code
      
      * added grid descs, working on grid pointers, errors with stl numerics
      
      * wip: updating kernel args - issue, replacing some std functions
      
      * replaced std::accumulate call with temp hardcoded version
      
      * wip: args causing memory issue
      
      * Construct Argument object inside the kernel and use it to call convolution device function. Code runs and verification passes
      
      * adding object file dump
      
      * temporary hardcoding of grid size, can remove device op inst + arg ptr
      
      * minor fix for grid size
      
      * added modified example where arg ptr is created on the device for generated version as well
      
      * removed device op instance and arg ptr from modified examples
      
      * moving device op file for testing purposes and to properly build CK
      
      * commenting out print-outs
      
      * adjust compiler args to produce a valid ELF file
      
      * temporary removal of validation
      
      * reverting compiler args back for working example
      
      * retrieve necessary arguments from generated template parameters in correct format
      
      * calculating grid size on host-side, still need to clean up process, pass parameters to host functions properly
      
      * scaled up factory functions/wrapper structs to implement host-side launch parameter calculations using CK host side functions - in hard-coded example
      
      * temporary change to generate ELF format binary object file
      
      * removed unnecessary code, added comments
      
      * formatting fix
      
      * cleaned up code, added new tests, restructured library: move helper into CK
      
      * refactored launch parameter calculation to be more concise
      
      * renamed files and variables for more clarity/uniformity
      
      * more code cleaning, removed debug statements
      
      * moved majority of my files into codegen directory, running properly
      
      * updated Embed.cmake(string_view) in codegen directory
      
      * updated host directory to match Embed.cmake as well
      
      * added old tests in
      
      * updated instance generation methods to be more concise
      
      * removed layout from launch parameter calculation
      
      * working test
      
      * fixed issue with verification, all instances working
      
      * updated verification in other tests
      
      * removed duplicate matrix padder file, removed code dumps
      
      * removed old hard-coded tests
      
      * removed old host directory, all files in codegen directory now
      
      * fixed copyright in files
      
      * commenting out validation
      
      * renamed files
      
      * made changes for review: fixed copyright, renamed files for clarity, removed comments, refactored code
      
      * updated headers
      
      * removing duplicate file for fwd conv to gemm, merging with original file
      
      * fix building codegen with clang++ directly
      
      * resolving build error from conv_fwd_to_gemm
      
      * fix for previous error
      
      * renaming tests
      
      * created common test file
      
      * cleaned up code, added comments
      
      * renamed device op
      
      * fixed typos in comments
      
      * removed extra space
      
      * code cleanup: resolving Amber's comments
      
      * removed wrapper struct for matrix padder, fixed template
      
      * cleaned up if statements for better readability
      
      ---------
      Co-authored-by: Paul <pfultz2@yahoo.com>
      Co-authored-by: Jing Zhang <jizha@amd.com>
      Co-authored-by: M. Amber Hassaan <amber_474@yahoo.com>
      Co-authored-by: illsilin <Illia.Silin@amd.com>
      Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
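Two steps in this commit, replacing the `std::accumulate` call with device-usable code and calculating the grid size on the host, can be illustrated with a small sketch. It assumes one workgroup per `MPerBlock` x `NPerBlock` tile of the GEMM output, a common tile-based layout that is not necessarily how CK's block-to-C-tile maps compute the grid; every name here (`launch_sketch`, `product`, `integer_divide_ceil`, `grid_size`) is hypothetical. For an implicit-GEMM forward convolution, the GEMM M dimension is typically the product N·Ho·Wo, which is where a product reduction appears.

```cpp
// Hypothetical helpers for illustration; not CK's actual API.
namespace launch_sketch {

// Device-friendly stand-in for
// std::accumulate(first, last, init, std::multiplies<>{}): no <numeric> needed.
template <typename T, int N>
constexpr T product(const T (&values)[N], T init = T{1})
{
    for(int i = 0; i < N; ++i)
    {
        init *= values[i];
    }
    return init;
}

// Integer division rounded up, so a partial tile still gets a workgroup.
constexpr long long integer_divide_ceil(long long a, long long b) { return (a + b - 1) / b; }

// Assumed layout: one workgroup per m_per_block x n_per_block output tile,
// repeated for each batch/group.
constexpr long long grid_size(long long M,
                              long long N,
                              long long m_per_block,
                              long long n_per_block,
                              long long batch = 1)
{
    return integer_divide_ceil(M, m_per_block) * integer_divide_ceil(N, n_per_block) * batch;
}

} // namespace launch_sketch
```

Because both helpers are `constexpr` and header-only, the same arithmetic can run on the host to size the launch and, if needed, inside device code without pulling in standard headers.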
  5. 06 Mar, 2024 1 commit
    • Add host lib (#1134) · 8eff4d62
      Paul Fultz II authored
      
      
      * Format
      
      * Format
      
      * Format
      
      * Remove const
      
      * Use the right template
      
      * Format
      
      * Format
      
      * add row/col instances
      
      * Add missing file
      
      * fixed
      
      * Format
      
      * Updates
      
      * Format
      
      * fixed rrr layout
      
      * Format
      
      * Update test and embed modules
      
      * Restore older version
      
      * Update year
      
      * Set -fPIC
      
      * Format
      
      * Use double for isnan
      
      * rename host folder to codegen + minor fix
      
      * add codegen CI test
      
      * add option to build components without building CK
      
      * fix the groovy syntax
      
      * fix typo
      
      * use the correct function for the codegen stage
      
      ---------
      Co-authored-by: Jing Zhang <jizha@amd.com>
      Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
      Co-authored-by: illsilin <Illia.Silin@amd.com>