This tool detects tensors on both CPU and GPU. Note that some tensors will always reside on the CPU, such as PyTorch's RNG state.
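The general technique behind such detectors is scanning the objects tracked by Python's garbage collector and filtering for tensor instances. The sketch below illustrates this idea with a placeholder class standing in for `torch.Tensor`, so it runs without PyTorch; the names `FakeTensor` and `find_tensors` are illustrative, not part of the tool's API.

```python
import gc

# Placeholder standing in for torch.Tensor so the sketch runs without PyTorch.
class FakeTensor:
    def __init__(self, device):
        self.device = device

def find_tensors():
    # Scan every object the garbage collector tracks and keep the "tensors".
    return [obj for obj in gc.get_objects() if isinstance(obj, FakeTensor)]

a = FakeTensor("cuda:0")
b = FakeTensor("cpu")
print(len(find_tensors()))  # → 2, both "tensors" are discovered wherever they live
```

Because the scan walks the whole interpreter heap, it finds tensors regardless of where they were created, which is why stray CPU tensors like the RNG state always show up.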
## Example
An example is worth a thousand words.
The code below defines a simple MLP module, with which we will show you how to use the tool.
```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(64, 8),
                                 nn.ReLU(),
                                 nn.Linear(8, 32))

    def forward(self, x):
        return self.mlp(x)
```
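Before running the detector, it helps to know what to expect: each `nn.Linear(i, o)` holds an `i*o` weight matrix plus an `o`-element bias, so the parameter count the detector should report for this MLP can be computed by hand. A quick check in plain Python, using the layer sizes from the model above:

```python
# (in_features, out_features) for each nn.Linear in the MLP above
layers = [(64, 8), (8, 32)]

# each Linear contributes i*o weights plus o biases
total_params = sum(i * o + o for i, o in layers)
print(total_params)  # → 808
```

If the detector reports a different parameter total for the module, something was created (or leaked) outside the model definition.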
And here is how to use the tool.
```python
from colossalai.utils import TensorDetector

# create random data
data = torch.rand(64, requires_grad=True).cuda()
data.retain_grad()
# create the module
model = MLP().cuda()
# create the detector
# by passing the model to the detector, it can distinguish module parameters from common tensors
detector = TensorDetector(module=model)
detector.detect()
```

Comments have been added on the right of the output to aid understanding.
Note that the total `Mem` of all the tensors and parameters will not equal `Total GPU Memory Allocated`. PyTorch's memory management is complex, and for large-scale models a complete accounting is not feasible.
**The print order does not exactly match the order in which the tensors were created, but it is close.**