If the tensor is not processed using *modify_tensor* and the fp8 recipe is enabled,
then the decision whether to cast it to fp8 is based on the value returned by the call *fp8_gemm_enabled*.
If the tensor is processed using *modify_tensor* or fp8 autocast is not enabled,
the result of this call does not matter.
This method may return a tuple (bool, Optional[int]), where the int indicates the next iteration at which the returned value will change.
It can return (bool, None) if the returned value will never change for that layer and gemm.
Returning this next iteration can help optimize CPU usage.
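As a sketch of the tuple contract — the class name and cutoff constant below are illustrative assumptions, not part of the real API — a feature that keeps fp8 GEMM enabled everywhere except for `wgrad` after a fixed iteration might look like:

```python
from typing import Optional, Tuple

# Hypothetical sketch of the (bool, Optional[int]) contract; the class
# and constant are illustrative only, not part of the real API.
class DisableWgradFp8:
    DISABLE_FROM_ITER = 100  # assumed cutoff, for illustration

    def fp8_gemm_enabled(
        self, config: dict, layer_name: str, gemm: str, iteration: int
    ) -> Tuple[bool, Optional[int]]:
        if gemm != "wgrad":
            return True, None  # the decision never changes for other GEMMs
        if iteration < self.DISABLE_FROM_ITER:
            # Enabled for now; report the iteration at which the answer
            # flips, so the caller can avoid re-querying every step.
            return True, self.DISABLE_FROM_ITER
        return False, None
```

Reporting the flip iteration up front is what lets the framework skip this check on all intermediate iterations.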
Parameters
----------
...
Returns
-------
Union[bool, Tuple[bool, Optional[int]]] - default is (True, None)
"""
return True, None  # if it is False, fp8_gemm will be turned off; otherwise nothing happens.
def modify_tensor_enabled(
self,
...
gemm: str,
tensor_name: str,
iteration: int,
) -> bool | Tuple[bool, Optional[int]]:
"""
It is used to determine whether *modify_tensor* will be run for a given GEMM and tensor name.
It has **higher priority** than fp8_gemm; if *modify_tensor_enabled* returns True or (True, next_enabled_iter),
then the *modify_tensor* call is invoked for the respective tensor regardless of other settings.
This method may return a tuple (bool, Optional[int]), where the int indicates the next iteration when the feature will be enabled.
It can return (bool, None) if the feature will never be enabled for that layer, GEMM, and tensor.
Returning the next enabled iteration can help optimize CPU usage, especially when the interval between *modify_tensor* calls is large.
Returning only a bool is deprecated.
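For instance, a hypothetical feature that processes the weight every 50 iterations could report the next enabled iteration like this (the class name and period are assumptions for illustration, not part of the real API):

```python
from typing import Optional, Tuple

# Illustrative sketch only: a feature that runs modify_tensor on the
# weight every INTERVAL iterations and reports the next enabled iteration.
class PeriodicModify:
    INTERVAL = 50  # assumed period, for illustration

    def modify_tensor_enabled(
        self, config: dict, layer_name: str, gemm: str,
        tensor_name: str, iteration: int,
    ) -> Tuple[bool, Optional[int]]:
        if tensor_name != "weight":
            return False, None  # never enabled for other tensors
        if iteration % self.INTERVAL == 0:
            # Enabled now; the caller may skip this check until the next period.
            return True, iteration + self.INTERVAL
        # Disabled now; report when the feature becomes enabled again.
        return False, iteration + self.INTERVAL - iteration % self.INTERVAL
```

With a large interval, the framework can avoid invoking this routing call on every intermediate iteration.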
Parameters
----------
...
Returns
-------
Union[bool, Tuple[bool, Optional[int]]] - default is (False, None)
"""
return False, None
def modify_tensor(
self,
...
default_quantizer: Quantizer,
iteration: int,
out: Union[torch.Tensor, QuantizedTensor],
) -> torch.Tensor | QuantizedTensor | None:
"""
It allows tensor modification.
For example, feature `FakeQuant` uses it to emulate casting to FP8.
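As a rough sketch of what such a hook does — with a plain list of floats standing in for a tensor and rounding standing in for FP8 emulation; both are simplifications, not the real `FakeQuant` logic:

```python
# Simplified stand-in for a modify_tensor-style hook: a plain list of
# floats replaces the tensor, and rounding emulates a lossy "fake
# quantization". Not the real FakeQuant implementation, only a sketch.
def fake_quant_modify_tensor(values: list[float], iteration: int) -> list[float]:
    # Return the processed tensor that is used in place of the original one.
    return [round(v, 2) for v in values]
```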
iteration number - equal to the number of times `debug_api.step()` was called.
tp_group: torch.distributed.ProcessGroup
...
config: Dict,
layer_name: str,
tensor_name: str,
gemm: str,
tensor: torch.Tensor,
iteration: int,
tp_group: torch.distributed.ProcessGroup,
rowwise: bool,
) -> None:
"""
This call is deprecated; we advise using *inspect_tensor* instead.
Similar to *inspect_tensor*, but run after the fp8 cast or *modify_tensor*, whichever is invoked. If neither the fp8 cast nor *modify_tensor* is invoked, then *inspect_tensor_postquantize* is not invoked either. The feature LogFp8Stats uses this call to collect FP8 statistics after quantization.
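The control flow can be sketched as follows, with plain Python floats standing in for tensors and rounding standing in for the fp8 cast (a simplification, not the actual implementation):

```python
# Sketch of the post-quantization inspection flow. Plain floats stand in
# for tensors; rounding stands in for the fp8 cast. The hook fires only
# when a quantization step actually produced a tensor.
def run_with_inspection(tensor, quantize, inspect_postquantize):
    quantized = quantize(tensor) if quantize else None
    if quantized is not None:
        inspect_postquantize(quantized)  # runs only after quantization
        return quantized
    return tensor                        # no quantization -> no hook call

logged = []
out = run_with_inspection(
    [1.26, -0.74],
    quantize=lambda t: [round(v, 1) for v in t],  # stand-in "fp8 cast"
    inspect_postquantize=lambda t: logged.append(max(abs(v) for v in t)),
)
```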
Parameters
...
one of [`activation`, `weight`, `gradient`, `output`, `wgrad`, `dgrad`],
tensor: torch.Tensor
tensor in fp8 or processed tensor after the modify_tensor call,
gemm: str
one of [`fprop`, `dgrad`, `wgrad`],
iteration: int
iteration number - equal to the number of times `debug_api.step()` was called.
tp_group: torch.distributed.ProcessGroup
...
layer_name: str,
tensor_name: str,
iteration: int,
) -> bool | Tuple[bool, Optional[int]]:
"""
It is a routing call, which is run at the initialization of the layer.
It determines whether *inspect_tensor* for a given GEMM and tensor will be invoked.
This method may return a tuple (bool, Optional[int]), where the int indicates the next iteration when the feature will be enabled.
It can return (bool, None) if the feature will never be enabled for that layer and tensor.
Returning the next enabled iteration can help optimize CPU usage, especially when the interval between *inspect_tensor* calls is large.
Returning only a bool is deprecated.
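A hedged sketch of this contract (the class name and starting iteration are assumptions, not part of the real API): a feature that only starts inspecting activations at a later iteration can tell the framework to skip the check until then:

```python
from typing import Optional, Tuple

# Illustrative sketch: inspection of activations starts at START_ITER.
# The class name and constant are assumptions, not part of the real API.
class LateInspection:
    START_ITER = 1000  # assumed starting iteration

    def inspect_tensor_enabled(
        self, config: dict, layer_name: str, tensor_name: str, iteration: int
    ) -> Tuple[bool, Optional[int]]:
        if tensor_name != "activation":
            return False, None  # never inspected for other tensors
        if iteration < self.START_ITER:
            return False, self.START_ITER  # skip re-checking until then
        return True, None
```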
Parameters
----------
...
Returns
-------
Union[bool, Tuple[bool, Optional[int]]] - default is (False, None)
"""
return False, None
def inspect_tensor_postquantize_enabled(
self,
...
gemm: str,
tensor_name: str,
iteration: int,
) -> bool | Tuple[bool, Optional[int]]:
"""
This call is deprecated; we advise using *inspect_tensor* and *inspect_tensor_enabled* instead.
It is a routing call, which is run at the initialization of the layer.
It determines whether *inspect_tensor_postquantize* for a given GEMM and tensor will be invoked.
This method may return a tuple (bool, Optional[int]), where the int indicates the next iteration when the feature will be enabled.
It can return (bool, None) if the feature will never be enabled for that layer, GEMM, and tensor.
Returning the next enabled iteration can help optimize CPU usage,
especially when the interval between *inspect_tensor_postquantize* calls is large.
Returning only a bool is deprecated.
Parameters
----------
...
Returns
-------
Union[bool, Tuple[bool, Optional[int]]] - default is (False, None)