<!DOCTYPE html>
<html class="writer-html5" lang="en" >
<head>
  <meta charset="utf-8" /><meta name="generator" content="Docutils 0.18.1: http://docutils.sourceforge.net/" />

  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  <title>5. API Reference Guide &mdash; Composable Kernel (CK)  documentation</title>
      <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
      <link rel="stylesheet" href="_static/css/theme.css" type="text/css" />
  <!--[if lt IE 9]>
    <script src="_static/js/html5shiv.min.js"></script>
  <![endif]-->
  
        <script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
        <script src="_static/doctools.js"></script>
        <script src="_static/sphinx_highlight.js"></script>
    <script src="_static/js/theme.js"></script>
    <link rel="index" title="Index" href="genindex.html" />
    <link rel="search" title="Search" href="search.html" />
    <link rel="next" title="6. Contributor’s Guide" href="Contributors_Guide.html" />
    <link rel="prev" title="4. Supported Primitives Guide" href="Supported_Primitives_Guide.html" /> 
</head>

<body class="wy-body-for-nav"> 
  <div class="wy-grid-for-nav">
    <nav data-toggle="wy-nav-shift" class="wy-nav-side">
      <div class="wy-side-scroll">
        <div class="wy-side-nav-search" >

          
          
          <a href="index.html">
            
              <img src="_static/rocm_logo.png" class="logo" alt="Logo"/>
          </a>
<div role="search">
  <form id="rtd-search-form" class="wy-form" action="search.html" method="get">
    <input type="text" name="q" placeholder="Search docs" aria-label="Search docs" />
    <input type="hidden" name="check_keywords" value="yes" />
    <input type="hidden" name="area" value="default" />
  </form>
</div>
        </div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
              <p class="caption" role="heading"><span class="caption-text">Contents:</span></p>
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="Linux_Install_Guide.html">1. Getting Started Guide</a></li>
<li class="toctree-l1"><a class="reference internal" href="tutorial_hello_world.html">2. CK Hello world</a></li>
<li class="toctree-l1"><a class="reference internal" href="dockerhub.html">3. CK docker hub</a></li>
<li class="toctree-l1"><a class="reference internal" href="Supported_Primitives_Guide.html">4. Supported Primitives Guide</a></li>
<li class="toctree-l1 current"><a class="current reference internal" href="#">5. API Reference Guide</a><ul>
<li class="toctree-l2"><a class="reference internal" href="#introduction">5.1. Introduction</a></li>
<li class="toctree-l2"><a class="reference internal" href="#using-ck-api">5.2. Using CK API</a></li>
<li class="toctree-l2"><a class="reference internal" href="#ck-datatypes">5.3. CK Datatypes</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#devicemem">5.3.1. DeviceMem</a></li>
<li class="toctree-l3"><a class="reference internal" href="#kernels-for-flashattention">5.3.2. Kernels For Flashattention</a></li>
</ul>
</li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="Contributors_Guide.html">6. Contributor’s Guide</a></li>
<li class="toctree-l1"><a class="reference internal" href="Disclaimer.html">7. Disclaimer</a></li>
</ul>

        </div>
      </div>
    </nav>

    <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
          <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
          <a href="index.html">Composable Kernel (CK)</a>
      </nav>

      <div class="wy-nav-content">
        <div class="rst-content">
          <div role="navigation" aria-label="Page navigation">
  <ul class="wy-breadcrumbs">
      <li><a href="index.html" class="icon icon-home" aria-label="Home"></a></li>
      <li class="breadcrumb-item active"><span class="section-number">5. </span>API Reference Guide</li>
      <li class="wy-breadcrumbs-aside">
            <a href="_sources/API_Reference_Guide.rst.txt" rel="nofollow"> View page source</a>
      </li>
  </ul>
  <hr/>
</div>
          <div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
           <div itemprop="articleBody">
             
  <section id="api-reference-guide">
<h1><span class="section-number">5. </span>API Reference Guide<a class="headerlink" href="#api-reference-guide" title="Permalink to this heading"></a></h1>
<section id="introduction">
<h2><span class="section-number">5.1. </span>Introduction<a class="headerlink" href="#introduction" title="Permalink to this heading"></a></h2>
<p>This document contains details of the APIs for the Composable Kernel (CK) library and introduces some of the key design
principles that are used to write new classes that extend CK functionality.</p>
</section>
<section id="using-ck-api">
<h2><span class="section-number">5.2. </span>Using CK API<a class="headerlink" href="#using-ck-api" title="Permalink to this heading"></a></h2>
<p>This section describes how to use the CK library API.</p>
</section>
<section id="ck-datatypes">
<h2><span class="section-number">5.3. </span>CK Datatypes<a class="headerlink" href="#ck-datatypes" title="Permalink to this heading"></a></h2>
<section id="devicemem">
<h3><span class="section-number">5.3.1. </span>DeviceMem<a class="headerlink" href="#devicemem" title="Permalink to this heading"></a></h3>
<dl class="cpp struct">
<dt class="sig sig-object cpp" id="_CPPv49DeviceMem">
<span id="_CPPv39DeviceMem"></span><span id="_CPPv29DeviceMem"></span><span id="DeviceMem"></span><span class="target" id="struct_device_mem"></span><span class="k"><span class="pre">struct</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">DeviceMem</span></span></span><a class="headerlink" href="#_CPPv49DeviceMem" title="Permalink to this definition"></a><br /></dt>
<dd><p>Container for storing data in GPU device memory. </p>
</dd></dl>

</section>
<section id="kernels-for-flashattention">
<h3><span class="section-number">5.3.2. </span>Kernels For Flashattention<a class="headerlink" href="#kernels-for-flashattention" title="Permalink to this heading"></a></h3>
<p>The Flashattention algorithm is defined in <span id="id1">Dao <em>et al.</em> [<a class="reference internal" href="#id3" title="Tri Dao, Daniel Y Fu, Stefano Ermon, Atri Rudra, and Christopher Ré. Flashattention: fast and memory-efficient exact attention with io-awareness. arXiv preprint arXiv:2205.14135, 2022.">DFE+22</a>]</span>.  This section lists the classes that are
used in the CK GPU implementation of Flashattention.</p>
<p><strong>Gridwise classes</strong></p>
<dl class="cpp struct">
<dt class="sig sig-object cpp" id="_CPPv4I000000000_25InMemoryDataOperationEnum0000_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t000_7index_t_7index_t_7index_t_b_7index_t000_7index_t_7index_t_7index_t_b_7index_t000_7index_t_7index_t_7index_t_b_7index_t_7index_t_7index_t0_7index_t_13LoopScheduler_b_b_15PipelineVersionEN2ck43GridwiseBatchedGemmSoftmaxGemm_Xdl_CShuffleE">
<span id="_CPPv3I000000000_25InMemoryDataOperationEnum0000_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t000_7index_t_7index_t_7index_t_b_7index_t000_7index_t_7index_t_7index_t_b_7index_t000_7index_t_7index_t_7index_t_b_7index_t_7index_t_7index_t0_7index_t_13LoopScheduler_b_b_15PipelineVersionEN2ck43GridwiseBatchedGemmSoftmaxGemm_Xdl_CShuffleE"></span><span id="_CPPv2I000000000_25InMemoryDataOperationEnum0000_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t000_7index_t_7index_t_7index_t_b_7index_t000_7index_t_7index_t_7index_t_b_7index_t000_7index_t_7index_t_7index_t_b_7index_t_7index_t_7index_t0_7index_t_13LoopScheduler_b_b_15PipelineVersionEN2ck43GridwiseBatchedGemmSoftmaxGemm_Xdl_CShuffleE"></span><span class="k"><span class="pre">template</span></span><span class="p"><span class="pre">&lt;</span></span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">FloatAB</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">FloatGemmAcc</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">FloatCShuffle</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">FloatC</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span 
class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">AElementwiseOperation</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">BElementwiseOperation</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">AccElementwiseOperation</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">B1ElementwiseOperation</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">CElementwiseOperation</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">InMemoryDataOperationEnum</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">CGlobalMemoryDataOperation</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">AGridDesc_AK0_M_AK1</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span 
class="pre">BGridDesc_BK0_N_BK1</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">B1GridDesc_BK0_N_BK1</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">CGridDesc_M_N</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">NumGemmKPrefetchStage</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">BlockSize</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">MPerBlock</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">NPerBlock</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">KPerBlock</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name 
descname"><span class="n"><span class="pre">Gemm1NPerBlock</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">Gemm1KPerBlock</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">AK1Value</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">BK1Value</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">B1K1Value</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">MPerXdl</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">NPerXdl</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">MXdlPerWave</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name 
descname"><span class="n"><span class="pre">NXdlPerWave</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">Gemm1NXdlPerWave</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">ABlockTransferThreadClusterLengths_AK0_M_AK1</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">ABlockTransferThreadClusterArrangeOrder</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">ABlockTransferSrcAccessOrder</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">ABlockTransferSrcVectorDim</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">ABlockTransferSrcScalarPerVector</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">ABlockTransferDstScalarPerVector_AK1</span></span></span><span class="p"><span 
class="pre">,</span></span><span class="w"> </span><span class="kt"><span class="pre">bool</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">AThreadTransferSrcResetCoordinateAfterRun</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">ABlockLdsExtraM</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">BBlockTransferThreadClusterLengths_BK0_N_BK1</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">BBlockTransferThreadClusterArrangeOrder</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">BBlockTransferSrcAccessOrder</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">BBlockTransferSrcVectorDim</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">BBlockTransferSrcScalarPerVector</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span 
class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">BBlockTransferDstScalarPerVector_BK1</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="kt"><span class="pre">bool</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">BThreadTransferSrcResetCoordinateAfterRun</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">BBlockLdsExtraN</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">B1BlockTransferThreadClusterLengths_BK0_N_BK1</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">B1BlockTransferThreadClusterArrangeOrder</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">B1BlockTransferSrcAccessOrder</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">B1BlockTransferSrcVectorDim</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name 
descname"><span class="n"><span class="pre">B1BlockTransferSrcScalarPerVector</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">B1BlockTransferDstScalarPerVector_BK1</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="kt"><span class="pre">bool</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">B1ThreadTransferSrcResetCoordinateAfterRun</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">B1BlockLdsExtraN</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">CShuffleMXdlPerWavePerShuffle</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">CShuffleNXdlPerWavePerShuffle</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">CShuffleBlockTransferClusterLengths_MBlock_MPerBlock_NBlock_NPerBlock</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span 
class="pre">CShuffleBlockTransferScalarPerVector_NPerBlock</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">LoopScheduler</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">LoopSched</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="kt"><span class="pre">bool</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">PadN</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="kt"><span class="pre">bool</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">MaskOutUpperTriangle</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">PipelineVersion</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">PipelineVer</span></span></span><span class="w"> </span><span class="p"><span class="pre">=</span></span><span class="w"> </span><span class="n"><span class="pre">PipelineVersion</span></span><span class="p"><span class="pre">::</span></span><span class="n"><span class="pre">v1</span></span><span class="p"><span class="pre">&gt;</span></span><br /><span class="target" id="structck_1_1_gridwise_batched_gemm_softmax_gemm___xdl___c_shuffle"></span><span class="k"><span class="pre">struct</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">GridwiseBatchedGemmSoftmaxGemm_Xdl_CShuffle</span></span></span><a class="headerlink" 
href="#_CPPv4I000000000_25InMemoryDataOperationEnum0000_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t000_7index_t_7index_t_7index_t_b_7index_t000_7index_t_7index_t_7index_t_b_7index_t000_7index_t_7index_t_7index_t_b_7index_t_7index_t_7index_t0_7index_t_13LoopScheduler_b_b_15PipelineVersionEN2ck43GridwiseBatchedGemmSoftmaxGemm_Xdl_CShuffleE" title="Permalink to this definition"></a><br /></dt>
<dd><p>Gridwise gemm + softmax + gemm fusion. </p>
</dd></dl>
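<p>As a rough illustration of what a gemm + softmax + gemm fusion computes, the sketch below is a plain CPU reference for C = softmax(A * B) * B1 with a row-wise softmax. It is not the CK kernel and makes no claim about how CK tiles or schedules the work; the <code>Matrix</code> helper and the function name <code>reference_gemm_softmax_gemm</code> are hypothetical names introduced only for this example.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of CK): a dense row-major float matrix.
struct Matrix
{
    std::size_t rows;
    std::size_t cols;
    std::vector<float> data;

    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}

    float& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    float operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Hypothetical CPU reference (not a CK API):
//   C = softmax_rowwise(A * B) * B1
// with A: M x K, B: K x N, B1: N x O, C: M x O.
Matrix reference_gemm_softmax_gemm(const Matrix& a, const Matrix& b, const Matrix& b1)
{
    assert(a.cols == b.rows && b.cols == b1.rows && b.cols > 0);

    const std::size_t M = a.rows, K = a.cols, N = b.cols, O = b1.cols;
    Matrix c(M, O);
    std::vector<float> s(N); // one row of the intermediate S = A * B

    for(std::size_t m = 0; m < M; ++m)
    {
        // First gemm: row m of S.
        for(std::size_t n = 0; n < N; ++n)
        {
            float acc = 0.0f;
            for(std::size_t k = 0; k < K; ++k)
                acc += a(m, k) * b(k, n);
            s[n] = acc;
        }

        // Row-wise softmax, shifted by the row maximum for numerical stability.
        const float row_max = *std::max_element(s.begin(), s.end());
        float row_sum       = 0.0f;
        for(std::size_t n = 0; n < N; ++n)
        {
            s[n] = std::exp(s[n] - row_max);
            row_sum += s[n];
        }

        // Second gemm: row m of C, normalised by the softmax denominator.
        for(std::size_t o = 0; o < O; ++o)
        {
            float acc = 0.0f;
            for(std::size_t n = 0; n < N; ++n)
                acc += s[n] * b1(n, o);
            c(m, o) = acc / row_sum;
        }
    }
    return c;
}
```

<p>The intermediate row <code>s</code> is consumed immediately by the second product, so the full M x N matrix is never stored. A reference of this kind can be handy for checking the output of a fused kernel on small inputs.</p>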

<p><strong>Blockwise classes</strong></p>
<dl class="cpp struct">
<dt class="sig sig-object cpp" id="_CPPv4I000_25InMemoryDataOperationEnum000000000_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_b_b_7index_tEN2ck35ThreadGroupTensorSliceTransfer_v4r1E">
<span id="_CPPv3I000_25InMemoryDataOperationEnum000000000_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_b_b_7index_tEN2ck35ThreadGroupTensorSliceTransfer_v4r1E"></span><span id="_CPPv2I000_25InMemoryDataOperationEnum000000000_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_b_b_7index_tEN2ck35ThreadGroupTensorSliceTransfer_v4r1E"></span><span class="k"><span class="pre">template</span></span><span class="p"><span class="pre">&lt;</span></span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">ThreadGroup</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">SrcElementwiseOperation</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">DstElementwiseOperation</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">InMemoryDataOperationEnum</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">DstInMemOp</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">BlockSliceLengths</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">ThreadClusterLengths</span></span></span><span class="p"><span 
class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">ThreadClusterArrangeOrder</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">SrcData</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">DstData</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">SrcDesc</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">DstDesc</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">SrcDimAccessOrder</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">DstDimAccessOrder</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span 
class="pre">SrcVectorDim</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">DstVectorDim</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">SrcScalarPerVector</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">DstScalarPerVector</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">SrcScalarStrideInVector</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">DstScalarStrideInVector</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="kt"><span class="pre">bool</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">ThreadTransferSrcResetCoordinateAfterRun</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="kt"><span class="pre">bool</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">ThreadTransferDstResetCoordinateAfterRun</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span 
class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">NumThreadScratch</span></span></span><span class="w"> </span><span class="p"><span class="pre">=</span></span><span class="w"> </span><span class="m"><span class="pre">1</span></span><span class="p"><span class="pre">&gt;</span></span><br /><span class="target" id="structck_1_1_thread_group_tensor_slice_transfer__v4r1"></span><span class="k"><span class="pre">struct</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">ThreadGroupTensorSliceTransfer_v4r1</span></span></span><a class="headerlink" href="#_CPPv4I000_25InMemoryDataOperationEnum000000000_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_b_b_7index_tEN2ck35ThreadGroupTensorSliceTransfer_v4r1E" title="Permalink to this definition"></a><br /></dt>
<dd><p>Blockwise data transfer. </p>
<p>This version does the following to avoid scratch memory issues:<ol class="loweralpha simple">
<li><p>Uses StaticallyIndexedArray instead of a C array for the thread buffer</p></li>
<li><p>ThreadwiseTensorSliceTransfer_v3 does not keep a reference to the tensor descriptor</p></li>
<li><p>ThreadwiseTensorSliceTransfer_v3::Run() does not construct a new tensor coordinate</p></li>
</ol>
</p>
</dd></dl>
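<p>The idea behind a statically indexed thread buffer can be illustrated with a small host-side sketch. It is not CK code: <code>StaticBufferSketch</code>, <code>static_for_sketch</code> and <code>scaled_copy</code> are hypothetical names introduced only for this example. The assumption is that a per-thread C array indexed by a runtime value may end up in scratch memory, whereas accesses whose indices are compile-time constants can typically be kept in registers.</p>

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a statically indexed thread buffer (not CK's
// StaticallyIndexedArray): every access takes its index as a template
// argument, so no access depends on a runtime value.
template <typename T, std::size_t N>
struct StaticBufferSketch
{
    std::array<T, N> data{};

    template <std::size_t I>
    T& at()
    {
        static_assert(I < N, "index out of range");
        return std::get<I>(data);
    }

    template <std::size_t I>
    const T& at() const
    {
        static_assert(I < N, "index out of range");
        return std::get<I>(data);
    }
};

// Calls f(std::integral_constant<std::size_t, I>{}) once for each I in the
// index sequence; the fold expression expands at compile time.
template <typename F, std::size_t... Is>
void static_for_impl(F&& f, std::index_sequence<Is...>)
{
    (f(std::integral_constant<std::size_t, Is>{}), ...);
}

template <std::size_t N, typename F>
void static_for_sketch(F&& f)
{
    static_for_impl(std::forward<F>(f), std::make_index_sequence<N>{});
}

// Copies src into a new buffer, multiplying each element by scale.
// All element indices are compile-time constants.
template <typename T, std::size_t N>
StaticBufferSketch<T, N> scaled_copy(const StaticBufferSketch<T, N>& src, T scale)
{
    StaticBufferSketch<T, N> dst{};
    static_for_sketch<N>([&](auto i) {
        constexpr std::size_t I = decltype(i)::value;
        dst.template at<I>() = src.template at<I>() * scale;
    });
    return dst;
}
```

<p>The sketch requires C++17 for the fold expression. Whether a given compiler actually keeps such a buffer in registers depends on the target and optimization level.</p>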

<dl class="cpp struct">
<dt class="sig sig-object cpp" id="_CPPv4I_7index_t000000_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_b_7index_t_7index_tEN2ck22BlockwiseGemmXdlops_v2E">
<span id="_CPPv3I_7index_t000000_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_b_7index_t_7index_tEN2ck22BlockwiseGemmXdlops_v2E"></span><span id="_CPPv2I_7index_t000000_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_b_7index_t_7index_tEN2ck22BlockwiseGemmXdlops_v2E"></span><span class="k"><span class="pre">template</span></span><span class="p"><span class="pre">&lt;</span></span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">BlockSize</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">FloatAB</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">FloatAcc</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">ATileDesc</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">BTileDesc</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">AMmaTileDesc</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> 
</span><span class="sig-name descname"><span class="n"><span class="pre">BMmaTileDesc</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">MPerBlock</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">NPerBlock</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">KPerBlock</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">MPerXDL</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">NPerXDL</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">MRepeat</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">NRepeat</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span 
class="sig-name descname"><span class="n"><span class="pre">KPack</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="kt"><span class="pre">bool</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">TransposeC</span></span></span><span class="w"> </span><span class="p"><span class="pre">=</span></span><span class="w"> </span><span class="k"><span class="pre">false</span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">AMmaKStride</span></span></span><span class="w"> </span><span class="p"><span class="pre">=</span></span><span class="w"> </span><a class="reference internal" href="#_CPPv4I_7index_t000000_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_b_7index_t_7index_tEN2ck22BlockwiseGemmXdlops_v2E" title="ck::BlockwiseGemmXdlops_v2::KPack"><span class="n"><span class="pre">KPack</span></span></a><span class="w"> </span><span class="o"><span class="pre">*</span></span><span class="w"> </span><span class="n"><span class="pre">XdlopsGemm</span></span><span class="p"><span class="pre">&lt;</span></span><a class="reference internal" href="#_CPPv4I_7index_t000000_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_b_7index_t_7index_tEN2ck22BlockwiseGemmXdlops_v2E" title="ck::BlockwiseGemmXdlops_v2::FloatAB"><span class="n"><span class="pre">FloatAB</span></span></a><span class="p"><span class="pre">,</span></span><span class="w"> </span><a class="reference internal" href="#_CPPv4I_7index_t000000_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_b_7index_t_7index_tEN2ck22BlockwiseGemmXdlops_v2E" title="ck::BlockwiseGemmXdlops_v2::MPerXDL"><span class="n"><span class="pre">MPerXDL</span></span></a><span class="p"><span 
class="pre">,</span></span><span class="w"> </span><a class="reference internal" href="#_CPPv4I_7index_t000000_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_b_7index_t_7index_tEN2ck22BlockwiseGemmXdlops_v2E" title="ck::BlockwiseGemmXdlops_v2::NPerXDL"><span class="n"><span class="pre">NPerXDL</span></span></a><span class="p"><span class="pre">,</span></span><span class="w"> </span><a class="reference internal" href="#_CPPv4I_7index_t000000_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_b_7index_t_7index_tEN2ck22BlockwiseGemmXdlops_v2E" title="ck::BlockwiseGemmXdlops_v2::KPack"><span class="n"><span class="pre">KPack</span></span></a><span class="p"><span class="pre">,</span></span><span class="w"> </span><a class="reference internal" href="#_CPPv4I_7index_t000000_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_b_7index_t_7index_tEN2ck22BlockwiseGemmXdlops_v2E" title="ck::BlockwiseGemmXdlops_v2::TransposeC"><span class="n"><span class="pre">TransposeC</span></span></a><span class="p"><span class="pre">&gt;</span></span><span class="p"><span class="pre">{</span></span><span class="p"><span class="pre">}</span></span><span class="p"><span class="pre">.</span></span><span class="n"><span class="pre">K0PerXdlops</span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">BMmaKStride</span></span></span><span class="w"> </span><span class="p"><span class="pre">=</span></span><span class="w"> </span><a class="reference internal" href="#_CPPv4I_7index_t000000_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_b_7index_t_7index_tEN2ck22BlockwiseGemmXdlops_v2E" title="ck::BlockwiseGemmXdlops_v2::KPack"><span class="n"><span class="pre">KPack</span></span></a><span class="w"> </span><span class="o"><span 
class="pre">*</span></span><span class="w"> </span><span class="n"><span class="pre">XdlopsGemm</span></span><span class="p"><span class="pre">&lt;</span></span><a class="reference internal" href="#_CPPv4I_7index_t000000_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_b_7index_t_7index_tEN2ck22BlockwiseGemmXdlops_v2E" title="ck::BlockwiseGemmXdlops_v2::FloatAB"><span class="n"><span class="pre">FloatAB</span></span></a><span class="p"><span class="pre">,</span></span><span class="w"> </span><a class="reference internal" href="#_CPPv4I_7index_t000000_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_b_7index_t_7index_tEN2ck22BlockwiseGemmXdlops_v2E" title="ck::BlockwiseGemmXdlops_v2::MPerXDL"><span class="n"><span class="pre">MPerXDL</span></span></a><span class="p"><span class="pre">,</span></span><span class="w"> </span><a class="reference internal" href="#_CPPv4I_7index_t000000_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_b_7index_t_7index_tEN2ck22BlockwiseGemmXdlops_v2E" title="ck::BlockwiseGemmXdlops_v2::NPerXDL"><span class="n"><span class="pre">NPerXDL</span></span></a><span class="p"><span class="pre">,</span></span><span class="w"> </span><a class="reference internal" href="#_CPPv4I_7index_t000000_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_b_7index_t_7index_tEN2ck22BlockwiseGemmXdlops_v2E" title="ck::BlockwiseGemmXdlops_v2::KPack"><span class="n"><span class="pre">KPack</span></span></a><span class="p"><span class="pre">,</span></span><span class="w"> </span><a class="reference internal" href="#_CPPv4I_7index_t000000_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_b_7index_t_7index_tEN2ck22BlockwiseGemmXdlops_v2E" title="ck::BlockwiseGemmXdlops_v2::TransposeC"><span class="n"><span class="pre">TransposeC</span></span></a><span class="p"><span class="pre">&gt;</span></span><span class="p"><span class="pre">{</span></span><span 
class="p"><span class="pre">}</span></span><span class="p"><span class="pre">.</span></span><span class="n"><span class="pre">K0PerXdlops</span></span><span class="p"><span class="pre">&gt;</span></span><br /><span class="target" id="structck_1_1_blockwise_gemm_xdlops__v2"></span><span class="k"><span class="pre">struct</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">BlockwiseGemmXdlops_v2</span></span></span><a class="headerlink" href="#_CPPv4I_7index_t000000_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_7index_t_b_7index_t_7index_tEN2ck22BlockwiseGemmXdlops_v2E" title="Permalink to this definition"></a><br /></dt>
<dd><p>Blockwise gemm. </p>
<p>Supports:<ol class="loweralpha simple">
<li><p>Regular XDL output M2_M3_M4_M2 and transposed XDL output M2_N2_N3_N4</p></li>
<li><p>Decoupled input tile descriptor and MMA tile descriptor, so that both VGPR and LDS source buffers are supported</p></li>
<li><p>Configurable k index starting position and step size after each FMA/XDL instruction</p></li>
</ol>
</p>
</dd></dl>
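<p>As a rough host-side reference for what a block-level GEMM computes, the following sketch multiplies one <code>MPerBlock x KPerBlock</code> tile of A by one <code>KPerBlock x NPerBlock</code> tile of B, advancing along k in steps of <code>KPack</code>. The function name <code>block_gemm_sketch</code>, the row-major layouts and the meaning given to <code>TransposeC</code> here (storing the output tile transposed) are assumptions made for illustration. The sketch does not model XDL instructions, the MMA tile descriptors or the <code>AMmaKStride</code>/<code>BMmaKStride</code> parameters.</p>

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical host-side reference, not CK code.
// a: MPerBlock x KPerBlock, row-major. b: KPerBlock x NPerBlock, row-major.
// Result: MPerBlock x NPerBlock row-major, or its transpose if TransposeC.
template <std::size_t MPerBlock,
          std::size_t NPerBlock,
          std::size_t KPerBlock,
          std::size_t KPack,
          bool TransposeC = false>
std::array<float, MPerBlock * NPerBlock>
block_gemm_sketch(const std::array<float, MPerBlock * KPerBlock>& a,
                  const std::array<float, KPerBlock * NPerBlock>& b)
{
    static_assert(KPerBlock % KPack == 0, "KPerBlock must be a multiple of KPack");

    std::array<float, MPerBlock * NPerBlock> c{};

    // The outer k loop advances by KPack; each iteration stands in for one
    // multiply-accumulate step that consumes KPack values of k.
    for(std::size_t k0 = 0; k0 < KPerBlock; k0 += KPack)
    {
        for(std::size_t m = 0; m < MPerBlock; ++m)
        {
            for(std::size_t n = 0; n < NPerBlock; ++n)
            {
                float acc = 0.0f;
                for(std::size_t k1 = 0; k1 < KPack; ++k1)
                {
                    const std::size_t k = k0 + k1;
                    acc += a[m * KPerBlock + k] * b[k * NPerBlock + n];
                }
                // Choose the output position according to TransposeC.
                const std::size_t idx = TransposeC ? n * MPerBlock + m : m * NPerBlock + n;
                c[idx] += acc;
            }
        }
    }
    return c;
}
```

<p>A call such as <code>block_gemm_sketch&lt;2, 2, 4, 2&gt;(a, b)</code> supplies all tile sizes explicitly, since <code>KPack</code> cannot be deduced from the arguments.</p>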

<dl class="cpp struct">
<dt class="sig sig-object cpp" id="_CPPv4I_7index_t0000_bEN2ck16BlockwiseSoftmaxE">
<span id="_CPPv3I_7index_t0000_bEN2ck16BlockwiseSoftmaxE"></span><span id="_CPPv2I_7index_t0000_bEN2ck16BlockwiseSoftmaxE"></span><span class="k"><span class="pre">template</span></span><span class="p"><span class="pre">&lt;</span></span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">BlockSize</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">AccDataType</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">ThreadMap_M_K</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">ThreadClusterDesc_M_K</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">ThreadSliceDesc_M_K</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="kt"><span class="pre">bool</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">IgnoreNaN</span></span></span><span class="w"> </span><span class="p"><span class="pre">=</span></span><span class="w"> </span><span class="k"><span class="pre">false</span></span><span class="p"><span class="pre">&gt;</span></span><br /><span class="target" id="structck_1_1_blockwise_softmax"></span><span class="k"><span 
class="pre">struct</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">BlockwiseSoftmax</span></span></span><a class="headerlink" href="#_CPPv4I_7index_t0000_bEN2ck16BlockwiseSoftmaxE" title="Permalink to this definition"></a><br /></dt>
<dd><p>Blockwise softmax. </p>
<dl class="field-list simple">
<dt class="field-odd">Template Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><ul class="simple">
<li><p><strong>BlockSize</strong> – Block size </p></li>
<li><p><strong>AccDataType</strong> – Accumulator data type </p></li>
<li><p><strong>ThreadMap_M_K</strong> – Mapping from thread ID to M_K index </p></li>
<li><p><strong>ThreadClusterDesc_M_K</strong> – Threadwise cluster descriptor </p></li>
<li><p><strong>ThreadSliceDesc_M_K</strong> – Threadwise slice descriptor </p></li>
<li><p><strong>IgnoreNaN</strong> – Flag to ignore NaN, false by default </p></li>
</ul>
</dd>
</dl>
</dd></dl>
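<p>The computation can be sketched on the host as a numerically stable row-wise softmax over an <code>M x K</code> tile: subtract the row maximum, exponentiate, then divide by the row sum. The name <code>softmax_rows_sketch</code> is hypothetical, and the treatment of <code>IgnoreNaN</code> below (NaN inputs are skipped in the max and sum reductions, and stay NaN in the output) is an assumption made for this example, not a statement of what CK does.</p>

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>

// Hypothetical host-side reference, not CK code.
// x holds an M x K tile in row-major order and is normalized in place,
// one row at a time.
template <std::size_t M, std::size_t K, bool IgnoreNaN = false>
void softmax_rows_sketch(std::array<float, M * K>& x)
{
    for(std::size_t m = 0; m < M; ++m)
    {
        float* row = x.data() + m * K;

        // Step 1: row maximum, used to keep exp() from overflowing.
        float row_max = -std::numeric_limits<float>::infinity();
        for(std::size_t k = 0; k < K; ++k)
        {
            const float v = row[k];
            if(IgnoreNaN && std::isnan(v))
                continue; // assumed meaning of IgnoreNaN: skip NaN inputs
            if(v > row_max)
                row_max = v;
        }

        // Step 2: exponentiate shifted values and accumulate the row sum.
        float row_sum = 0.0f;
        for(std::size_t k = 0; k < K; ++k)
        {
            row[k] = std::exp(row[k] - row_max);
            if(IgnoreNaN && std::isnan(row[k]))
                continue; // NaN entries do not contribute to the sum
            row_sum += row[k];
        }

        // Step 3: normalize by the row sum.
        for(std::size_t k = 0; k < K; ++k)
        {
            row[k] /= row_sum;
        }
    }
}
```

<p>In this sketch, with <code>IgnoreNaN</code> set to <code>false</code>, a single NaN in a row makes the row sum NaN, so every output in that row becomes NaN.</p>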

<p><strong>Threadwise classes</strong></p>
<dl class="cpp struct">
<dt class="sig sig-object cpp" id="_CPPv4I0000000_7index_t_7index_t_N9enable_ifIXaaclN7SrcDesc20IsKnownAtCompileTimeEEclN7DstDesc20IsKnownAtCompileTimeEEEbE4typeEEN2ck44ThreadwiseTensorSliceTransfer_StaticToStaticE">
<span id="_CPPv3I0000000_7index_t_7index_t_N9enable_ifIXaaclN7SrcDesc20IsKnownAtCompileTimeEEclN7DstDesc20IsKnownAtCompileTimeEEEbE4typeEEN2ck44ThreadwiseTensorSliceTransfer_StaticToStaticE"></span><span id="_CPPv2I0000000_7index_t_7index_t_N9enable_ifIXSrcDesc::IsKnownAtCompileTime() && DstDesc::IsKnownAtCompileTime()EbE4typeEEN2ck44ThreadwiseTensorSliceTransfer_StaticToStaticE"></span><span class="k"><span class="pre">template</span></span><span class="p"><span class="pre">&lt;</span></span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">SrcData</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">DstData</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">SrcDesc</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">DstDesc</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">ElementwiseOperation</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">SliceLengths</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> 
</span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">DimAccessOrder</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">DstVectorDim</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="n"><span class="pre">index_t</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">DstScalarPerVector</span></span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="k"><span class="pre">typename</span></span><span class="w"> </span><span class="n"><span class="pre">enable_if</span></span><span class="p"><span class="pre">&lt;</span></span><a class="reference internal" href="#_CPPv4I0000000_7index_t_7index_t_N9enable_ifIXaaclN7SrcDesc20IsKnownAtCompileTimeEEclN7DstDesc20IsKnownAtCompileTimeEEEbE4typeEEN2ck44ThreadwiseTensorSliceTransfer_StaticToStaticE" title="ck::ThreadwiseTensorSliceTransfer_StaticToStatic::SrcDesc"><span class="n"><span class="pre">SrcDesc</span></span></a><span class="p"><span class="pre">::</span></span><span class="n"><span class="pre">IsKnownAtCompileTime</span></span><span class="p"><span class="pre">(</span></span><span class="p"><span class="pre">)</span></span><span class="w"> </span><span class="o"><span class="pre">&amp;&amp;</span></span><span class="w"> </span><a class="reference internal" href="#_CPPv4I0000000_7index_t_7index_t_N9enable_ifIXaaclN7SrcDesc20IsKnownAtCompileTimeEEclN7DstDesc20IsKnownAtCompileTimeEEEbE4typeEEN2ck44ThreadwiseTensorSliceTransfer_StaticToStaticE" title="ck::ThreadwiseTensorSliceTransfer_StaticToStatic::DstDesc"><span class="n"><span class="pre">DstDesc</span></span></a><span class="p"><span 
class="pre">::</span></span><span class="n"><span class="pre">IsKnownAtCompileTime</span></span><span class="p"><span class="pre">(</span></span><span class="p"><span class="pre">)</span></span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="kt"><span class="pre">bool</span></span><span class="p"><span class="pre">&gt;</span></span><span class="p"><span class="pre">::</span></span><span class="n"><span class="pre">type</span></span><span class="w"> </span><span class="p"><span class="pre">=</span></span><span class="w"> </span><span class="k"><span class="pre">false</span></span><span class="p"><span class="pre">&gt;</span></span><br /><span class="target" id="structck_1_1_threadwise_tensor_slice_transfer___static_to_static"></span><span class="k"><span class="pre">struct</span></span><span class="w"> </span><span class="sig-name descname"><span class="n"><span class="pre">ThreadwiseTensorSliceTransfer_StaticToStatic</span></span></span><a class="headerlink" href="#_CPPv4I0000000_7index_t_7index_t_N9enable_ifIXaaclN7SrcDesc20IsKnownAtCompileTimeEEclN7DstDesc20IsKnownAtCompileTimeEEEbE4typeEEN2ck44ThreadwiseTensorSliceTransfer_StaticToStaticE" title="Permalink to this definition"></a><br /></dt>
<dd><p>Threadwise data transfer. </p>
<p>Does NOT involve any tensor coordinates; intended for use with StaticBuffer. </p>
</dd></dl>
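<p>The <code>enable_if</code> constraint in the signature above restricts the struct to descriptors that report <code>IsKnownAtCompileTime()</code>. A minimal host-side sketch of that pattern follows. All names ending in <code>Sketch</code> are hypothetical, the descriptors are reduced to a single length, and the transfer is reduced to an unrolled element-wise copy. None of this reflects the actual CK implementation.</p>

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical descriptor whose length is a compile-time constant.
template <std::size_t N>
struct StaticDescSketch
{
    static constexpr bool IsKnownAtCompileTime() { return true; }
    static constexpr std::size_t Length() { return N; }
};

// Hypothetical descriptor whose length is only known at run time.
struct DynamicDescSketch
{
    static constexpr bool IsKnownAtCompileTime() { return false; }
    std::size_t length;
};

// Only usable when both descriptors are known at compile time.
template <typename SrcDesc,
          typename DstDesc,
          typename ElementwiseOperation,
          typename std::enable_if<SrcDesc::IsKnownAtCompileTime() &&
                                      DstDesc::IsKnownAtCompileTime(),
                                  bool>::type = false>
struct StaticToStaticCopySketch
{
    static_assert(SrcDesc::Length() == DstDesc::Length(), "lengths must match");

    // Applies op to every source element and stores the result in dst.
    template <typename SrcData, typename DstData>
    static void Run(const std::array<SrcData, SrcDesc::Length()>& src,
                    std::array<DstData, DstDesc::Length()>& dst,
                    const ElementwiseOperation& op)
    {
        RunImpl(src, dst, op, std::make_index_sequence<SrcDesc::Length()>{});
    }

    private:
    // Every index is a compile-time constant; no coordinate is computed.
    template <typename Src, typename Dst, std::size_t... Is>
    static void
    RunImpl(const Src& src, Dst& dst, const ElementwiseOperation& op, std::index_sequence<Is...>)
    {
        ((std::get<Is>(dst) = op(std::get<Is>(src))), ...);
    }
};

// Detects whether StaticToStaticCopySketch<Src, Dst, Op> is a valid type.
template <typename Src, typename Dst, typename Op, typename = void>
struct is_static_copyable_sketch : std::false_type
{
};

template <typename Src, typename Dst, typename Op>
struct is_static_copyable_sketch<Src, Dst, Op, std::void_t<StaticToStaticCopySketch<Src, Dst, Op>>>
    : std::true_type
{
};

// Example element-wise operation: doubles its input.
struct ScaleBy2Sketch
{
    float operator()(float v) const { return 2.0f * v; }
};
```

<p>With these definitions, naming <code>StaticToStaticCopySketch</code> with <code>DynamicDescSketch</code> as either descriptor fails substitution, which the detection trait reports as <code>false</code>.</p>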

<div class="docutils container" id="id2">
<div class="citation" id="id3" role="doc-biblioentry">
<span class="label"><span class="fn-bracket">[</span><a role="doc-backlink" href="#id1">DFE+22</a><span class="fn-bracket">]</span></span>
<p>Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, and Christopher Ré. FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. <em>arXiv preprint arXiv:2205.14135</em>, 2022.</p>
</div>
</div>
</div>
</section>
</section>
</section>


           </div>
          </div>
          <footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
    </div>

  <hr/>

  <div role="contentinfo">
    <p>&#169; Copyright 2018-2023, Advanced Micro Devices.</p>
  </div>

   

</footer>
        </div>
      </div>
    </section>
  </div>
  <script>
      jQuery(function () {
          SphinxRtdTheme.Navigation.enable(true);
      });
  </script> 

</body>
</html>