"vscode:/vscode.git/clone" did not exist on "b755db38667e529690c629a86926471c2e121455"
..
    Copyright (c) 2022-2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

    See LICENSE for license information.

Config File Structure
=====================

To enable debug features, create a YAML configuration file that specifies the desired behavior, for example which GEMMs (General Matrix Multiply operations) should run in higher precision instead of FP8 and which statistics to log.
The sections below describe how to structure this file.

General Format
--------------

A config file can have one or more sections, each containing settings for specific layers and features:

.. code-block:: yaml

    section_name_1:
      enabled: ...
      layers:
        # Specify layers here...
      transformer_engine:
        Feature1Name:
          enabled: ...
          # Feature details...
        Feature2Name:
          enabled: ...
          # Feature details...

    section_name_2:
      enabled: ...
      layers:
        # Specify layers here...
      Feature1Name: # A feature without a namespace belongs to the default namespace.
        enabled: ...
        # Feature details...

    section_name_3:
      enabled: ...
      layers:
        # Specify layers here...
      transformer_engine:
        Feature1Name:
          enabled: ...
          # Feature details...
        Feature2Name:
          enabled: ...
          # Feature details...

Sections may have any name and must contain:

1. An ``enabled`` field that specifies whether the features in that section will be active.
2. A ``layers`` field specifying which layers the section applies to. Each layer can belong to only one section.
3. Additional fields describing features for those layers.
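
The three requirements above can be checked mechanically. The following is a minimal sketch, not part of Transformer Engine: ``check_sections`` is a hypothetical helper, and the config is written as a Python dict standing in for the parsed YAML file.

.. code-block:: python

    def check_sections(config):
        """Return a list of problems found in a parsed config mapping."""
        problems = []
        for section_name, section in config.items():
            # Every section needs both required fields.
            for required in ("enabled", "layers"):
                if required not in section:
                    problems.append(f"{section_name}: missing '{required}'")
            # Everything besides 'enabled' and 'layers' describes features.
            if not set(section) - {"enabled", "layers"}:
                problems.append(f"{section_name}: no feature fields")
        return problems


    config = {
        "section_name_1": {
            "enabled": True,
            "layers": {"layer_types": ["fc1"]},
            "transformer_engine": {"Feature1Name": {"enabled": True}},
        },
        # Deliberately incomplete section, used to trigger the checks.
        "section_name_2": {"enabled": True},
    }
    problems = check_sections(config)
    print(problems)

Here ``section_name_2`` is reported twice, because it has neither a ``layers`` field nor any feature fields.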

Layer Specification
-------------------

Debug layers can be identified by a ``name`` parameter:

.. code-block:: python

    linear = transformer_engine.debug.pytorch.Linear(in_features, out_features, name="linear1")

This name is used in the config file to identify the layer. To specify the ``layers`` field, you can use one of the following methods:

1. ``layer_name_regex_pattern``: Use a regular expression to match layer names. This expression must adhere to the Python ``re`` module syntax.
2. ``layer_types``: Provide a list of strings, where a layer will be selected if any string matches part of its name.

Examples:

.. code-block:: yaml

    # Example 1: Using regular expression to select layers
    my_section:
      enabled: ...
      layers:
        layer_name_regex_pattern: 'self_attn.*'
      transformer_engine:
        (...)

    # Example 2: Using layer type to select layers
    another_section:
      enabled: ...
      layers:
        layer_types: ['fc1', 'layernorm_linear']
      transformer_engine:
        (...)
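
To illustrate the two selection methods, the sketch below filters a list of layer names. ``select_layers`` is a hypothetical helper, not the library's implementation. It assumes that the regular expression may match anywhere in the name (``re.search`` semantics) and that ``layer_types`` entries are treated as plain substrings.

.. code-block:: python

    import re


    def select_layers(layer_names, layers_spec):
        """Return the names selected by a parsed ``layers`` field."""
        if "layer_name_regex_pattern" in layers_spec:
            # Assumption: the pattern may match anywhere in the name.
            pattern = re.compile(layers_spec["layer_name_regex_pattern"])
            return [name for name in layer_names if pattern.search(name)]
        # Assumption: each layer_types entry is a plain substring.
        fragments = layers_spec.get("layer_types", [])
        return [name for name in layer_names
                if any(fragment in name for fragment in fragments)]


    names = [
        "transformer_layer.self_attn.layernorm_linear_qkv",
        "transformer_layer.self_attn.proj",
        "transformer_layer.layernorm_mlp.fc1",
        "transformer_layer.layernorm_mlp.fc2",
    ]
    by_regex = select_layers(names, {"layer_name_regex_pattern": "self_attn.*"})
    by_type = select_layers(names, {"layer_types": ["fc1", "layernorm_linear"]})

Under these assumptions, ``by_regex`` contains the two ``self_attn`` layers, while ``by_type`` contains ``layernorm_linear_qkv`` and ``fc1``.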

Names in Transformer Layers
---------------------------

There are three ways to assign a name to a layer in Transformer Engine:

- Initialize the layer with the ``name=...`` argument.
- Use ``debug_api.infer_and_assign_layer_names(model)``, which assigns names based on class names.
- Rely on the default names assigned during module initialization, such as ``Layer_n``, where ``n`` represents the layer number.

The ``TransformerLayer`` in Transformer Engine is a composition of multiple sub-layers. We can modify some of these layers using precision debug tools, particularly those that contain exactly one linear layer. To see the names of all such layers, we can inspect log files. For instance, a ``TransformerLayer`` named ``transformer_layer`` might consist of:

- ``transformer_layer.self_attn.layernorm_linear_qkv`` / ``transformer_layer.self_attn.linear_qkv`` / ``transformer_layer.self_attn.layernorm_linear_q`` / ``transformer_layer.self_attn.linear_q`` / ``transformer_layer.self_attn.linear_kv``,
- ``transformer_layer.self_attn.proj``,
- ``transformer_layer.inter_attn.*`` for ``layer_type="decoder"``,
- ``transformer_layer.layernorm_mlp.fc1``,
- ``transformer_layer.layernorm_mlp.fc2``,

depending on the configuration. Some layers, like ``LayerNormLinear``, are fusions of two layers: ``LayerNorm`` and ``Linear``. When referring to such layers in precision debug tools, only the ``Linear`` part is affected.

For a ``GroupedLinear`` layer, the names of the underlying GEMMs are of the form ``layer_name.gemm_n``, where ``n`` is the index of the GEMM.

Below is an example ``TransformerLayer`` with four linear layers that can be influenced by the precision debug tools.

.. figure:: ./img/names.svg
   :align: center
   :width: 80%

   Fig 1: Names of layers in an example configuration of TransformerLayer. The most nested blocks represent the most basic layers, each containing one linear layer. Layers that do not contain linear layers, such as ``DotProductAttention``, are omitted.

**Configuration File Example**

.. code-block:: yaml

    # Disables FP8 for the wgrad GEMM in all 4 linear layers
    section1:
      enabled: True
      layers:
        layer_types: [transformer_layer]
      transformer_engine:
        DisableFP8GEMM:
          enabled: True
          gemms: [wgrad]

    # Disables FP8 for all GEMMs in the layernorm_mlp layer
    section2:
      enabled: True
      layers:
        layer_types: [layernorm_mlp]
      transformer_engine:
        DisableFP8Layer:
          enabled: True
      
    # Logs wgrad stats in fc1
    section3:
      enabled: True
      layers:
        layer_types: [fc1]
      transformer_engine:
        LogTensorStats:
          enabled: True
          stats: [min]
          tensors: [wgrad]
          freq: 1
          start_step: 0
          end_step: 50


Structured Configuration for GEMMs and Tensors
----------------------------------------------

Sometimes a feature is parameterized by a list of tensors or by a list of GEMMs.
There are multiple ways of describing this parameterization.

We can pass lists, as below.

.. code-block:: yaml

    Feature:
      enabled: ...
      gemms: [gemm1, gemm2]
      tensors: [tensor1, tensor2]
      ...

We can use a struct for tensors, which allows each tensor to carry its own parameters.

.. code-block:: yaml

    Feature:
      gemms: [gemm1, gemm2]
      tensors_struct:
      - tensor: tensor1
        feature_param1: value
      - tensor: tensor2
        feature_param1: value
      gemm_feature_param1: value

Similarly, we can use a struct for GEMMs.

.. code-block:: yaml

    Feature:
      enabled: ...
      tensors: [tensor1, tensor2]
      gemms_struct:
      - gemm: gemm1
        feature_param1: value
      - gemm: gemm2
        feature_param1: value
      gemm_feature_param1: value

We can also use structs for both tensors and GEMMs. In that case, ``tensors_struct`` should be nested inside ``gemms_struct``.

.. code-block:: yaml

    Feature:
      enabled: ...
      gemms_struct:
        - gemm: gemm1
          tensors: [tensor1, tensor2]
          tensor_feature_param1: value
          gemm_feature_param1: value
        - gemm: gemm2
          tensors_struct:
          - tensor: tensor1
            tensor_feature_param1: value
          - tensor: tensor2
            tensor_feature_param2: value
          gemm_feature_param1: value
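
One way to read the nested form is as a flat list of (GEMM, tensor, parameters) combinations. The sketch below expands a parsed feature mapping in that way. ``expand_feature`` and the inheritance rule it applies (values on an inner entry override values defined further out) are assumptions made for illustration, not the documented behavior of the debug tools.

.. code-block:: python

    RESERVED = {"enabled", "gemm", "gemms", "gemms_struct",
                "tensor", "tensors", "tensors_struct"}
    TENSOR_KEYS = {"tensors", "tensors_struct"}


    def params_of(node):
        """Keep only feature parameters, dropping structural keys."""
        return {k: v for k, v in node.items() if k not in RESERVED}


    def expand_feature(feature):
        """Flatten a feature mapping into (gemm, tensor, params) rows."""
        if "gemms_struct" in feature:
            gemm_entries = feature["gemms_struct"]
        else:
            gemm_entries = [{"gemm": g} for g in feature.get("gemms", [None])]
        base = {k: v for k, v in feature.items()
                if k not in ("gemms", "gemms_struct")}

        rows = []
        for entry in gemm_entries:
            outer = base
            if entry.keys() & TENSOR_KEYS:
                # The GEMM entry names its own tensors, so ignore outer ones.
                outer = {k: v for k, v in base.items() if k not in TENSOR_KEYS}
            merged = {**outer, **entry}
            if "tensors_struct" in merged:
                tensor_entries = merged["tensors_struct"]
            else:
                tensor_entries = [{"tensor": t}
                                  for t in merged.get("tensors", [None])]
            for tensor_entry in tensor_entries:
                # Assumption: inner (per-tensor) values override outer ones.
                params = {**params_of(merged), **params_of(tensor_entry)}
                rows.append((merged.get("gemm"), tensor_entry["tensor"], params))
        return rows


    feature = {
        "enabled": True,
        "gemms_struct": [
            {"gemm": "gemm1", "tensors": ["tensor1", "tensor2"],
             "tensor_feature_param1": "value", "gemm_feature_param1": "value"},
            {"gemm": "gemm2", "gemm_feature_param1": "value",
             "tensors_struct": [
                 {"tensor": "tensor1", "tensor_feature_param1": "value"},
                 {"tensor": "tensor2", "tensor_feature_param2": "value"},
             ]},
        ],
    }
    rows = expand_feature(feature)

For the example above this yields four rows, one per GEMM and tensor pair, each carrying the parameters that apply to it.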

Enabling or Disabling Sections and Features
-------------------------------------------

Debug features can be enabled or disabled with the ``enabled`` keyword:

.. code-block:: yaml

    section1:
      enabled: True
      layers:
        layer_types: [self_attention]
      transformer_engine:
        LogTensorStats:
          enabled: False # Disables the LogTensorStats feature
          stats: [max, min, mean, std, l1_norm]

    section2:
      enabled: False # Disables entire section2
      transformer_engine:
        LogFp8TensorStats:
          enabled: True # Does not enable the LogFp8TensorStats feature, because section2 is disabled
          stats: [underflows, overflows]
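
The rule illustrated above is that a feature is active only when both its section and the feature itself are enabled. A minimal sketch of that rule follows. ``active_features`` is a hypothetical helper operating on a Python dict that stands in for the parsed YAML file. Treating a missing ``enabled`` key as ``False`` is an assumption, and ``section3`` is added here only to show an active feature.

.. code-block:: python

    def active_features(config):
        """List (section, feature) pairs whose section and feature are enabled."""
        active = []
        for section_name, section in config.items():
            if not section.get("enabled", False):
                continue  # A disabled section switches off all of its features.
            for key, value in section.items():
                if key in ("enabled", "layers"):
                    continue
                # A mapping with its own 'enabled' key is read as a feature in
                # the default namespace; otherwise it is a namespace of features.
                features = {key: value} if "enabled" in value else value
                for feature_name, feature in features.items():
                    if feature.get("enabled", False):
                        active.append((section_name, feature_name))
        return active


    config = {
        "section1": {
            "enabled": True,
            "layers": {"layer_types": ["self_attention"]},
            "transformer_engine": {"LogTensorStats": {"enabled": False}},
        },
        "section2": {
            "enabled": False,
            "transformer_engine": {"LogFp8TensorStats": {"enabled": True}},
        },
        # Hypothetical extra section with everything switched on.
        "section3": {
            "enabled": True,
            "layers": {"layer_types": ["fc1"]},
            "transformer_engine": {"LogTensorStats": {"enabled": True}},
        },
    }
    active = active_features(config)

Only the feature in ``section3`` is reported: ``section1`` disables the feature itself, and ``section2`` is disabled as a whole.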

By organizing your ``config.yaml`` properly, you can easily manage debugging features, ensuring a more streamlined and customizable debugging experience.