In a typical PyTorch training loop, it might look like this:
```
ASP.prune_trained_model(model, optimizer)
...
ASP.compute_sparse_masks()
```
The `prune_trained_model` calculates the sparse mask and applies it to the weights.
A more thorough example can be found in `./test/toy_problem.py`.
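To make the idea behind the computed mask concrete, here is a minimal self-contained sketch of 2:4 pruning: in every contiguous group of four weights, keep the two with the largest magnitude and zero the rest. The helper name `make_2to4_mask` and the use of NumPy are illustrative assumptions for this sketch, not part of the ASP API.

```python
import numpy as np

def make_2to4_mask(weights):
    """Build a 2:4 sparsity mask: in every contiguous group of four
    values, keep the two with the largest magnitude and zero the rest.
    Illustrative helper, not part of the ASP API."""
    flat = weights.reshape(-1, 4)
    mask = np.zeros_like(flat, dtype=bool)
    # Indices of the two largest-magnitude entries in each group of four.
    top2 = np.argsort(np.abs(flat), axis=1)[:, -2:]
    np.put_along_axis(mask, top2, True, axis=1)
    return mask.reshape(weights.shape)

w = np.array([[0.1, -0.9, 0.5, 0.05],
              [2.0, -0.1, 0.2, -1.5]])
mask = make_2to4_mask(w)
pruned = w * mask  # each row keeps only its two largest-magnitude entries
```

Sparse Tensor Cores can exploit exactly this fixed 2-out-of-4 structure, which is why the mask is computed once and then held fixed.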
The following approach serves as a guiding example of how to generate a pruned model that can use Sparse Tensor Cores in the NVIDIA Ampere architecture. It generates a model for deployment, i.e., inference mode.
```
(1) Given a fully trained (dense) network, prune parameter values in a 2:4 sparse pattern.
(2) Fine-tune the pruned model with the same optimization method and hyper-parameters (learning rate, schedule, number of epochs, etc.) as those used to obtain the trained model.
(3) (If required) Quantize the model.
```
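Steps 1 and 2 amount to fixing a 2:4 mask once and re-applying it throughout fine-tuning. The following NumPy sketch is a toy stand-in for that flow, not the ASP implementation; the gradients are random placeholders for a real backward pass.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 4))  # toy "fully trained" dense weights

# Step 1: prune to 2:4 -- keep the two largest-magnitude weights in
# every group of four (here each row is one group of four).
order = np.argsort(np.abs(w), axis=1)
mask = np.ones_like(w, dtype=bool)
np.put_along_axis(mask, order[:, :2], False, axis=1)  # drop the two smallest
w *= mask

# Step 2: fine-tune with the same hyper-parameters as the dense run,
# re-applying the fixed mask after every update so pruned weights stay
# zero (ASP handles this masking for you via the patched optimizer).
lr = 0.1
for _ in range(5):
    grad = rng.normal(size=w.shape)  # placeholder for a real backward pass
    w -= lr * grad
    w *= mask  # keep the 2:4 pattern intact: at most 2 nonzeros per group
```

The key design point this illustrates is that the mask is computed once from the dense weights and then held fixed; fine-tuning only adjusts the surviving values.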
In code, below is a sketch of how to use ASP for this approach (steps 1 and 2 above).