Commit 82fd7a8b authored by yan.yan

v2.1.5: add profile tool and python 3.6 for linux

parent f31eee3a
...@@ -89,7 +89,7 @@ jobs: ...@@ -89,7 +89,7 @@ jobs:
runs-on: ubuntu-20.04 runs-on: ubuntu-20.04
strategy: strategy:
matrix: matrix:
python-version: ['3.7', '3.8', '3.9', '3.10'] # this version is only used for upload. python-version: ['3.6', '3.7', '3.8', '3.9', '3.10'] # this version is only used for upload.
cuda-version: ['102', '111', '113', '114', ''] cuda-version: ['102', '111', '113', '114', '']
steps: steps:
......
...@@ -14,5 +14,6 @@ jobs: ...@@ -14,5 +14,6 @@ jobs:
steps: steps:
- uses: actions/stale@v4 - uses: actions/stale@v4
with: with:
stale-issue-message: 'Close stale issues due to inactivity.' stale-issue-message: 'Mark stale issues due to inactivity.'
stale-pr-message: 'Close stale PRs due to inactivity.' stale-pr-message: 'Mark stale PRs due to inactivity.'
operations-per-run: 300
# Changelog # Changelog
## [2.1.5] - 2021-11-10
### Added
- Add CUDA profile tool
- Add Python 3.6 support
### Changed
- Format all code
### Removed
- Remove an unnecessary device sync, which slightly improves performance.
## [2.1.0] - 2021-10-31 ## [2.1.0] - 2021-10-31
### Added ### Added
* add implicit gemm algorithm for all kinds of convolution with kernel volume <= 32. This algorithm is very fast with float16. * add implicit gemm algorithm for all kinds of convolution with kernel volume <= 32. This algorithm is very fast with float16.
......
...@@ -13,16 +13,36 @@ ...@@ -13,16 +13,36 @@
See the License for the specific language governing permissions and See the License for the specific language governing permissions and
limitations under the License. limitations under the License.
--> -->
[pypi-ver-cpu]: https://img.shields.io/pypi/v/spconv
[pypi-download]: https://img.shields.io/pypi/dm/spconv-cu114 [pypi-ver-114]: https://img.shields.io/pypi/v/spconv-cu114
[pypi-url]: https://pypi.org/project/spconv-cu114/ [pypi-ver-111]: https://img.shields.io/pypi/v/spconv-cu111
[pypi-image]: https://badge.fury.io/py/spconv-cu114.svg [pypi-ver-113]: https://img.shields.io/pypi/v/spconv-cu113
[pypi-ver-102]: https://img.shields.io/pypi/v/spconv-cu102
[pypi-url-111]: https://pypi.org/project/spconv-cu111/
[pypi-download-111]: https://img.shields.io/pypi/dm/spconv-cu111
[pypi-url-113]: https://pypi.org/project/spconv-cu113/
[pypi-download-113]: https://img.shields.io/pypi/dm/spconv-cu113
[pypi-url-102]: https://pypi.org/project/spconv-cu102/
[pypi-download-102]: https://img.shields.io/pypi/dm/spconv-cu102
[pypi-url-114]: https://pypi.org/project/spconv-cu114/
[pypi-download-114]: https://img.shields.io/pypi/dm/spconv-cu114
[pypi-url-cpu]: https://pypi.org/project/spconv/
[pypi-download-cpu]: https://img.shields.io/pypi/dm/spconv
# SpConv: Spatially Sparse Convolution Library # SpConv: Spatially Sparse Convolution Library
[![Build Status](https://github.com/traveller59/spconv/workflows/build/badge.svg)](https://github.com/traveller59/spconv/actions?query=workflow%3Abuild) [![PyPI Version][pypi-image]][pypi-url] [![pypi monthly download][pypi-download]][pypi-url] [![Build Status](https://github.com/traveller59/spconv/workflows/build/badge.svg)](https://github.com/traveller59/spconv/actions?query=workflow%3Abuild)
| | PyPi Version | Downloads |
| -------------- |:---------------------:| ---------------------:|
| CPU (Linux Only) | [![PyPI Version][pypi-ver-cpu]][pypi-url-cpu] | [![pypi monthly download][pypi-download-cpu]][pypi-url-cpu] |
| CUDA 10.2 | [![PyPI Version][pypi-ver-102]][pypi-url-102] | [![pypi monthly download][pypi-download-102]][pypi-url-102] |
| CUDA 11.1 | [![PyPI Version][pypi-ver-111]][pypi-url-111] | [![pypi monthly download][pypi-download-111]][pypi-url-111]|
| CUDA 11.3 (Linux Only) | [![PyPI Version][pypi-ver-113]][pypi-url-113] |[![pypi monthly download][pypi-download-113]][pypi-url-113]|
| CUDA 11.4 | [![PyPI Version][pypi-ver-114]][pypi-url-114] | [![pypi monthly download][pypi-download-114]][pypi-url-114]|
```spconv``` is a project that provides a heavily optimized sparse convolution implementation with tensor core support. ```spconv``` is a project that provides a heavily optimized sparse convolution implementation with tensor core support. Check the [benchmark](docs/BENCHMARK.md) to see how fast spconv 2.x runs.
[Spconv 1.x code](https://github.com/traveller59/spconv/tree/v1.2.1). We won't provide any support for spconv 1.x since it's deprecated. use spconv 2.x if possible. <!--remove this message in spconv 2.2--> [Spconv 1.x code](https://github.com/traveller59/spconv/tree/v1.2.1). We won't provide any support for spconv 1.x since it's deprecated. use spconv 2.x if possible. <!--remove this message in spconv 2.2-->
...@@ -99,7 +119,10 @@ The c++ code will be built automatically when you change c++ code in project. ...@@ -99,7 +119,10 @@ The c++ code will be built automatically when you change c++ code in project.
For NVIDIA Embedded Platforms, you need to specify cuda arch before build: ```export CUMM_CUDA_ARCH_LIST="7.2"``` for xavier. For NVIDIA Embedded Platforms, you need to specify cuda arch before build: ```export CUMM_CUDA_ARCH_LIST="7.2"``` for xavier.
After installing ```cumm``` in editable mode and before installing spconv, you need to remove ```cumm``` from the ```requires``` section of pyproject.toml because of a pyproject limitation (the build can't find an editable-installed ```cumm```).
#### Linux #### Linux
0. uninstall spconv and cumm installed by pip 0. uninstall spconv and cumm installed by pip
1. install build-essential, install CUDA 1. install build-essential, install CUDA
2. ```git clone https://github.com/FindDefinition/cumm```, ```cd ./cumm```, ```pip install -e .``` 2. ```git clone https://github.com/FindDefinition/cumm```, ```cd ./cumm```, ```pip install -e .```
......
<!--
Copyright 2021 Yan Yan
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
## Simple Benchmark
### Network Benchmark without batchnorm (F32/F16) in RTX 3080 Laptop GPU
Network Code: test/benchmark.py
| F32/F16 | Spconv 1.x F32 (1080Ti) | Native| Implicit Gemm | Implicit Gemm Split Mask |
| -------------- |:---------------------:|---------------------:|---------------------:| ---------------------:|
| Forward | 43ms | 21.7ms/13.7ms | 23.5ms/11.2ms | 22ms/12.2ms |
| Backward | 80ms | 41.9ms/25.2ms | 51.0ms/13.8ms | 41.1ms/12.2ms |
### Network Gemm Kernel Benchmark FP16 in RTX 3080 Laptop GPU
Network Code: test/benchmark.py
The network, input, and profile code are the same as for the table above.
This table profiles only the **fp16 gemm kernels**, excluding output tensor create/clear overhead, so it shows the performance upper bound of our algorithm.
| F16 | Native| Implicit Gemm | Implicit Gemm Split Mask |
| -------------- |:---------------------:|---------------------:| ---------------------:|
| Forward | 8.0ms | 4.3ms | 4.0ms |
We can see that the implicit gemm kernels are very fast: gemm accounts for only 4.3ms of the 11.2ms network forward pass. We can achieve better performance in TensorRT + pure C++.
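As a rough illustration of the comparison above, the following sketch computes how much of the FP16 forward pass is spent in gemm kernels for each algorithm. The numbers are copied from the FP16 columns of the two tables; the dictionary and function names are hypothetical and are not part of spconv or test/benchmark.py.

```python
# FP16 forward times in milliseconds, copied from the two tables above.
# These names are illustrative only; they do not exist in spconv.
network_forward_ms = {
    "Native": 13.7,
    "Implicit Gemm": 11.2,
    "Implicit Gemm Split Mask": 12.2,
}
gemm_only_ms = {
    "Native": 8.0,
    "Implicit Gemm": 4.3,
    "Implicit Gemm Split Mask": 4.0,
}


def gemm_share(algo: str) -> float:
    """Return the fraction of the forward pass spent in gemm kernels."""
    return gemm_only_ms[algo] / network_forward_ms[algo]


for algo in network_forward_ms:
    # Everything that is not gemm: indice generation, tensor create/clear, etc.
    overhead_ms = network_forward_ms[algo] - gemm_only_ms[algo]
    print(f"{algo}: gemm share {gemm_share(algo):.0%}, "
          f"non-gemm overhead {overhead_ms:.1f} ms")
```

For the implicit gemm variants, more than half of the measured forward time is non-gemm overhead, so the gemm-only table is best read as an upper bound rather than as end-to-end performance.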
**NOTE**
When benchmarking a network on your laptop, don't forget to close all apps except terminals! Other apps consume GPU resources and make kernels run slower.
## Comparison with [MinkowskiEngine](https://github.com/NVIDIA/MinkowskiEngine) and [torchsparse](https://github.com/mit-han-lab/torchsparse)
TODO
\ No newline at end of file
...@@ -25,12 +25,7 @@ ...@@ -25,12 +25,7 @@
* make sure your channel size is multiple of 8 when using fp16. multiple of 32 is better. * make sure your channel size is multiple of 8 when using fp16. multiple of 32 is better.
* spconv 2.x in Windows 10 is 1.5x~2x slower than Linux. use Linux if possible. * spconv 2.x in Windows 10 is 1.5x~2x slower than Linux. use Linux if possible.
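One way to follow the channel-size tip is to round a layer's channel count up to the required multiple when designing a network. The helper below is a minimal sketch; ```round_up_channels``` is a hypothetical name, not an spconv API.

```python
def round_up_channels(channels: int, multiple: int = 8) -> int:
    """Round a channel count up to the nearest multiple (8 for fp16, 32 is better)."""
    if channels <= 0 or multiple <= 0:
        raise ValueError("channels and multiple must be positive")
    # Integer ceiling division, then scale back up to the multiple.
    return ((channels + multiple - 1) // multiple) * multiple


# A 20-channel layer becomes 24 channels (multiple of 8) or 32 (multiple of 32).
print(round_up_channels(20))
print(round_up_channels(20, multiple=32))
```

Padding channels this way changes the model size, so it is a design-time choice rather than something to apply to an already trained network.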
Network Benchmark without batchnorm (F32/F16) in RTX 3080 Laptop GPU See [benchmark](BENCHMARK.md) for more performance details of different algorithms.
| F32/F16 | Spconv 1.x | Native| Implicit Gemm | Implicit Gemm Split Mask |
| -------------- |:---------------------:|---------------------:|---------------------:| ---------------------:|
| Forward | 43ms | 29ms/23ms | 30ms/15ms | 30ms/19ms |
| Backward | 80ms | 47ms/32ms | 56ms/15ms | 45ms/14ms |
## Algorithm Overview ## Algorithm Overview
...@@ -57,4 +52,4 @@ In my test, ```Implicit Gemm``` is almost 2x faster than ```Native```. ...@@ -57,4 +52,4 @@ In my test, ```Implicit Gemm``` is almost 2x faster than ```Native```.
TODO TODO
In my test, ```Implicit Gemm Split Mask``` is slightly faster than ```Implicit Gemm```, but the indice generation is greatly slower, so currently we use ```Implicit Gemm``` by default. In my test, ```Implicit Gemm Split Mask``` is slightly faster than ```Implicit Gemm```, but the indice generation is slower, so currently we use ```Implicit Gemm``` by default.
\ No newline at end of file \ No newline at end of file
# Copyright 2021 Yan Yan # Copyright 2021 Yan Yan
# #
# Licensed under the Apache License, Version 2.0 (the "License"); # Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License. # you may not use this file except in compliance with the License.
# You may obtain a copy of the License at # You may obtain a copy of the License at
# #
# http://www.apache.org/licenses/LICENSE-2.0 # http://www.apache.org/licenses/LICENSE-2.0
# #
# Unless required by applicable law or agreed to in writing, software # Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, # distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
...@@ -22,11 +22,12 @@ import torch.optim as optim ...@@ -22,11 +22,12 @@ import torch.optim as optim
from torchvision import datasets, transforms from torchvision import datasets, transforms
from torch.optim.lr_scheduler import StepLR from torch.optim.lr_scheduler import StepLR
import contextlib import contextlib
import torch.cuda.amp import torch.cuda.amp
@contextlib.contextmanager @contextlib.contextmanager
def identity_ctx(): def identity_ctx():
yield yield
class Net(nn.Module): class Net(nn.Module):
...@@ -39,14 +40,13 @@ class Net(nn.Module): ...@@ -39,14 +40,13 @@ class Net(nn.Module):
spconv.SubMConv2d(32, 64, 3, 1), spconv.SubMConv2d(32, 64, 3, 1),
nn.ReLU(), nn.ReLU(),
spconv.SparseMaxPool2d(2, 2), spconv.SparseMaxPool2d(2, 2),
spconv.ToDense(), spconv.ToDense(),
) )
self.fc1 = nn.Linear(14 * 14 * 64, 128) self.fc1 = nn.Linear(14 * 14 * 64, 128)
self.fc2 = nn.Linear(128, 10) self.fc2 = nn.Linear(128, 10)
self.dropout1 = nn.Dropout2d(0.25) self.dropout1 = nn.Dropout2d(0.25)
self.dropout2 = nn.Dropout2d(0.5) self.dropout2 = nn.Dropout2d(0.5)
def forward(self, x: torch.Tensor): def forward(self, x: torch.Tensor):
# x: [N, 28, 28, 1], must be NHWC tensor # x: [N, 28, 28, 1], must be NHWC tensor
x_sp = spconv.SparseConvTensor.from_dense(x.reshape(-1, 28, 28, 1)) x_sp = spconv.SparseConvTensor.from_dense(x.reshape(-1, 28, 28, 1))
...@@ -116,40 +116,72 @@ def test(args, model, device, test_loader): ...@@ -116,40 +116,72 @@ def test(args, model, device, test_loader):
with amp_ctx: with amp_ctx:
output = model(data) output = model(data)
test_loss += F.nll_loss(output, target, reduction='sum').item() # sum up batch loss test_loss += F.nll_loss(
pred = output.argmax(dim=1, keepdim=True) # get the index of the max log-probability output, target, reduction='sum').item() # sum up batch loss
pred = output.argmax(
dim=1,
keepdim=True) # get the index of the max log-probability
correct += pred.eq(target.view_as(pred)).sum().item() correct += pred.eq(target.view_as(pred)).sum().item()
test_loss /= len(test_loader.dataset) test_loss /= len(test_loader.dataset)
print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format( print(
test_loss, correct, len(test_loader.dataset), '\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
100. * correct / len(test_loader.dataset))) test_loss, correct, len(test_loader.dataset),
100. * correct / len(test_loader.dataset)))
def main(): def main():
# Training settings # Training settings
parser = argparse.ArgumentParser(description='PyTorch MNIST Example') parser = argparse.ArgumentParser(description='PyTorch MNIST Example')
parser.add_argument('--batch-size', type=int, default=64, metavar='N', parser.add_argument('--batch-size',
type=int,
default=64,
metavar='N',
help='input batch size for training (default: 64)') help='input batch size for training (default: 64)')
parser.add_argument('--test-batch-size', type=int, default=1000, metavar='N', parser.add_argument('--test-batch-size',
type=int,
default=1000,
metavar='N',
help='input batch size for testing (default: 1000)') help='input batch size for testing (default: 1000)')
parser.add_argument('--epochs', type=int, default=14, metavar='N', parser.add_argument('--epochs',
type=int,
default=14,
metavar='N',
help='number of epochs to train (default: 14)') help='number of epochs to train (default: 14)')
parser.add_argument('--lr', type=float, default=1.0, metavar='LR', parser.add_argument('--lr',
type=float,
default=1.0,
metavar='LR',
help='learning rate (default: 1.0)') help='learning rate (default: 1.0)')
parser.add_argument('--gamma', type=float, default=0.7, metavar='M', parser.add_argument('--gamma',
type=float,
default=0.7,
metavar='M',
help='Learning rate step gamma (default: 0.7)') help='Learning rate step gamma (default: 0.7)')
parser.add_argument('--no-cuda', action='store_true', default=False, parser.add_argument('--no-cuda',
action='store_true',
default=False,
help='disables CUDA training') help='disables CUDA training')
parser.add_argument('--seed', type=int, default=1, metavar='S', parser.add_argument('--seed',
type=int,
default=1,
metavar='S',
help='random seed (default: 1)') help='random seed (default: 1)')
parser.add_argument('--log-interval', type=int, default=10, metavar='N', parser.add_argument(
help='how many batches to wait before logging training status') '--log-interval',
type=int,
parser.add_argument('--save-model', action='store_true', default=False, default=10,
metavar='N',
help='how many batches to wait before logging training status')
parser.add_argument('--save-model',
action='store_true',
default=False,
help='For Saving the current Model') help='For Saving the current Model')
parser.add_argument('--fp16', action='store_true', default=False, parser.add_argument('--fp16',
action='store_true',
default=False,
help='For mixed precision training') help='For mixed precision training')
args = parser.parse_args() args = parser.parse_args()
...@@ -161,20 +193,30 @@ def main(): ...@@ -161,20 +193,30 @@ def main():
kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {} kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}
train_loader = torch.utils.data.DataLoader( train_loader = torch.utils.data.DataLoader(
datasets.MNIST('../data', train=True, download=True, datasets.MNIST(
transform=transforms.Compose([ '../data',
transforms.ToTensor(), train=True,
# here we remove norm to get sparse tensor with lots of zeros download=True,
# transforms.Normalize((0.1307,), (0.3081,)) transform=transforms.Compose([
])), transforms.ToTensor(),
batch_size=args.batch_size, shuffle=True, **kwargs) # here we remove norm to get sparse tensor with lots of zeros
# transforms.Normalize((0.1307,), (0.3081,))
])),
batch_size=args.batch_size,
shuffle=True,
**kwargs)
test_loader = torch.utils.data.DataLoader( test_loader = torch.utils.data.DataLoader(
datasets.MNIST('../data', train=False, transform=transforms.Compose([ datasets.MNIST(
transforms.ToTensor(), '../data',
# here we remove norm to get sparse tensor with lots of zeros train=False,
# transforms.Normalize((0.1307,), (0.3081,)) transform=transforms.Compose([
])), transforms.ToTensor(),
batch_size=args.test_batch_size, shuffle=True, **kwargs) # here we remove norm to get sparse tensor with lots of zeros
# transforms.Normalize((0.1307,), (0.3081,))
])),
batch_size=args.test_batch_size,
shuffle=True,
**kwargs)
model = Net().to(device) model = Net().to(device)
optimizer = optim.Adadelta(model.parameters(), lr=args.lr) optimizer = optim.Adadelta(model.parameters(), lr=args.lr)
......
# Copyright 2021 Yan Yan # Copyright 2021 Yan Yan
# #
# Licensed under the Apache License, Version 2.0 (the "License"); # Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License. # you may not use this file except in compliance with the License.
# You may obtain a copy of the License at # You may obtain a copy of the License at
# #
# http://www.apache.org/licenses/LICENSE-2.0 # http://www.apache.org/licenses/LICENSE-2.0
# #
# Unless required by applicable law or agreed to in writing, software # Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, # distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
import numpy as np import numpy as np
from cumm import tensorview as tv from cumm import tensorview as tv
from spconv.utils import Point2VoxelCPU3d from spconv.utils import Point2VoxelCPU3d
from spconv.pytorch.utils import PointToVoxel from spconv.pytorch.utils import PointToVoxel
import torch import torch
def main(): def main():
# voxel gen source code: spconv/csrc/sparse/pointops.py # voxel gen source code: spconv/csrc/sparse/pointops.py
gen = Point2VoxelCPU3d( gen = Point2VoxelCPU3d(vsize_xyz=[0.1, 0.1, 0.1],
vsize_xyz=[0.1, 0.1, 0.1], coors_range_xyz=[-80, -80, -2, 80, 80, 6],
coors_range_xyz=[-80, -80, -2, 80, 80, 6], num_point_features=3,
num_point_features=3, max_num_voxels=5000,
max_num_voxels=5000, max_num_points_per_voxel=5)
max_num_points_per_voxel=5)
pc = np.random.uniform(-10, 10, size=[1000, 3]) pc = np.random.uniform(-10, 10, size=[1000, 3])
pc_tv = tv.from_numpy(pc) pc_tv = tv.from_numpy(pc)
...@@ -39,20 +39,23 @@ def main(): ...@@ -39,20 +39,23 @@ def main():
print("------Raw Voxels-------") print("------Raw Voxels-------")
print(voxels_np[0]) print(voxels_np[0])
# run voxel gen and FILL MEAN VALUE to voxel remain # run voxel gen and FILL MEAN VALUE to voxel remain
voxels_tv, indices_tv, num_p_in_vx_tv = gen.point_to_voxel_empty_mean(pc_tv) voxels_tv, indices_tv, num_p_in_vx_tv = gen.point_to_voxel_empty_mean(
pc_tv)
voxels_np = voxels_tv.numpy_view() voxels_np = voxels_tv.numpy_view()
indices_np = indices_tv.numpy_view() indices_np = indices_tv.numpy_view()
num_p_in_vx_np = num_p_in_vx_tv.numpy_view() num_p_in_vx_np = num_p_in_vx_tv.numpy_view()
print("------Voxels with mean filled-------") print("------Voxels with mean filled-------")
print(voxels_np[0]) print(voxels_np[0])
def main_point_with_features(): def main_point_with_features():
# voxel gen source code: spconv/csrc/sparse/pointops.py # voxel gen source code: spconv/csrc/sparse/pointops.py
gen = Point2VoxelCPU3d( gen = Point2VoxelCPU3d(
vsize_xyz=[0.1, 0.1, 0.1], vsize_xyz=[0.1, 0.1, 0.1],
coors_range_xyz=[-80, -80, -2, 80, 80, 6], coors_range_xyz=[-80, -80, -2, 80, 80, 6],
num_point_features=4, # here num_point_features must equal to pc.shape[1] num_point_features=
max_num_voxels=5000, 4, # here num_point_features must equal to pc.shape[1]
max_num_voxels=5000,
max_num_points_per_voxel=5) max_num_points_per_voxel=5)
pc = np.random.uniform(-10, 10, size=[1000, 3]) pc = np.random.uniform(-10, 10, size=[1000, 3])
...@@ -68,21 +71,22 @@ def main_point_with_features(): ...@@ -68,21 +71,22 @@ def main_point_with_features():
print("------Raw Voxels-------") print("------Raw Voxels-------")
print(voxels_np[0]) print(voxels_np[0])
# run voxel gen and FILL MEAN VALUE to voxel remain # run voxel gen and FILL MEAN VALUE to voxel remain
voxels_tv, indices_tv, num_p_in_vx_tv = gen.point_to_voxel_empty_mean(pc_tv) voxels_tv, indices_tv, num_p_in_vx_tv = gen.point_to_voxel_empty_mean(
pc_tv)
voxels_np = voxels_tv.numpy_view() voxels_np = voxels_tv.numpy_view()
indices_np = indices_tv.numpy_view() indices_np = indices_tv.numpy_view()
num_p_in_vx_np = num_p_in_vx_tv.numpy_view() num_p_in_vx_np = num_p_in_vx_tv.numpy_view()
print("------Voxels with mean filled-------") print("------Voxels with mean filled-------")
print(voxels_np[0]) print(voxels_np[0])
def main_pytorch_voxel_gen(): def main_pytorch_voxel_gen():
# voxel gen source code: spconv/csrc/sparse/pointops.py # voxel gen source code: spconv/csrc/sparse/pointops.py
gen = PointToVoxel( gen = PointToVoxel(vsize_xyz=[0.1, 0.1, 0.1],
vsize_xyz=[0.1, 0.1, 0.1], coors_range_xyz=[-80, -80, -2, 80, 80, 6],
coors_range_xyz=[-80, -80, -2, 80, 80, 6], num_point_features=3,
num_point_features=3, max_num_voxels=5000,
max_num_voxels=5000, max_num_points_per_voxel=5)
max_num_points_per_voxel=5)
pc = np.random.uniform(-10, 10, size=[1000, 3]) pc = np.random.uniform(-10, 10, size=[1000, 3])
pc_th = torch.from_numpy(pc) pc_th = torch.from_numpy(pc)
...@@ -100,16 +104,16 @@ def main_pytorch_voxel_gen(): ...@@ -100,16 +104,16 @@ def main_pytorch_voxel_gen():
print("------Voxels with mean filled-------") print("------Voxels with mean filled-------")
print(voxels_np[0]) print(voxels_np[0])
def main_pytorch_voxel_gen_cuda(): def main_pytorch_voxel_gen_cuda():
# voxel gen source code: spconv/csrc/sparse/pointops.py # voxel gen source code: spconv/csrc/sparse/pointops.py
device = torch.device("cuda:0") device = torch.device("cuda:0")
gen = PointToVoxel( gen = PointToVoxel(vsize_xyz=[0.1, 0.1, 0.1],
vsize_xyz=[0.1, 0.1, 0.1], coors_range_xyz=[-80, -80, -2, 80, 80, 6],
coors_range_xyz=[-80, -80, -2, 80, 80, 6], num_point_features=3,
num_point_features=3, max_num_voxels=5000,
max_num_voxels=5000, max_num_points_per_voxel=5,
max_num_points_per_voxel=5, device=device)
device=device)
pc = np.random.uniform(-10, 10, size=[1000, 3]).astype(np.float32) pc = np.random.uniform(-10, 10, size=[1000, 3]).astype(np.float32)
pc_th = torch.from_numpy(pc).to(device) pc_th = torch.from_numpy(pc).to(device)
...@@ -133,4 +137,4 @@ if __name__ == "__main__": ...@@ -133,4 +137,4 @@ if __name__ == "__main__":
main_point_with_features() main_point_with_features()
main_pytorch_voxel_gen() main_pytorch_voxel_gen()
if torch.cuda.is_available(): if torch.cuda.is_available():
main_pytorch_voxel_gen_cuda() main_pytorch_voxel_gen_cuda()
\ No newline at end of file
isort -rc --atomic ./spconv && \ yapf -i --recursive -vv ./spconv ./test ./example ./scripts
isort -rc --atomic ./test && \
yapf -i --recursive -vv ./spconv ./test
find ./src -regex '.*\.\(cpp\|hpp\|cc\|cxx\|cu\|cuh\|h\)' | xargs clang-format -i
find ./include -regex '.*\.\(cpp\|hpp\|cc\|cxx\|cu\|cuh\|h\)' | xargs clang-format -i
\ No newline at end of file
[build-system] [build-system]
requires = ["setuptools>=41.0", "wheel", "pccm>=0.2.21", "cumm>=0.2.1"] requires = ["setuptools>=41.0", "wheel", "pccm>=0.2.21", "cumm>=0.2.2"]
build-backend = "setuptools.build_meta" build-backend = "setuptools.build_meta"
...@@ -19,20 +19,21 @@ from cumm.conv.bases import NCHW, NHWC, ConvIterAlgo, ConvOpType ...@@ -19,20 +19,21 @@ from cumm.conv.bases import NCHW, NHWC, ConvIterAlgo, ConvOpType
from cumm.conv.main import ConvMainUnitTest, gen_gemm_kernels from cumm.conv.main import ConvMainUnitTest, gen_gemm_kernels
from cumm.conv.params import ConvProblem from cumm.conv.params import ConvProblem
from cumm.gemm import kernel from cumm.gemm import kernel
import os import os
from spconv.core_cc.csrc.sparse.all import SpconvOps from spconv.core_cc.csrc.sparse.all import SpconvOps
from cumm.gemm.codeops import div_up from cumm.gemm.codeops import div_up
from spconv.constants import PACKAGE_ROOT from spconv.constants import PACKAGE_ROOT
from spconv.core import ConvAlgo from spconv.core import ConvAlgo
from spconv.pytorch import ops from spconv.pytorch import ops
from spconv.algo import CONV, BestConvAlgoByProfile from spconv.algo import CONV, BestConvAlgoByProfile
from spconv.pytorch.cppcore import torch_tensor_to_tv from spconv.pytorch.cppcore import torch_tensor_to_tv
def reduce_mask_count(mask: np.ndarray, width: int): def reduce_mask_count(mask: np.ndarray, width: int):
mask_length_32 = (div_up(mask.shape[0], width)) * width mask_length_32 = (div_up(mask.shape[0], width)) * width
if mask.shape[0] < mask_length_32: if mask.shape[0] < mask_length_32:
mask_pad = np.zeros((mask_length_32,), dtype=mask.dtype) mask_pad = np.zeros((mask_length_32, ), dtype=mask.dtype)
mask_pad[:mask.shape[0]] = mask mask_pad[:mask.shape[0]] = mask
mask = mask_pad mask = mask_pad
mask = mask.reshape(-1, width) mask = mask.reshape(-1, width)
...@@ -40,16 +41,18 @@ def reduce_mask_count(mask: np.ndarray, width: int): ...@@ -40,16 +41,18 @@ def reduce_mask_count(mask: np.ndarray, width: int):
maskr_tv = tv.from_numpy(maskr) maskr_tv = tv.from_numpy(maskr)
return SpconvOps.count_bits(maskr_tv).numpy().sum() * width return SpconvOps.count_bits(maskr_tv).numpy().sum() * width
def reduce_mask_count_x(mask: np.ndarray, width: int): def reduce_mask_count_x(mask: np.ndarray, width: int):
mask_length_32 = (div_up(mask.shape[0], width)) * width mask_length_32 = (div_up(mask.shape[0], width)) * width
if mask.shape[0] < mask_length_32: if mask.shape[0] < mask_length_32:
mask_pad = np.zeros((mask_length_32,), dtype=mask.dtype) mask_pad = np.zeros((mask_length_32, ), dtype=mask.dtype)
mask_pad[:mask.shape[0]] = mask mask_pad[:mask.shape[0]] = mask
mask = mask_pad mask = mask_pad
mask = mask.reshape(-1, width) mask = mask.reshape(-1, width)
maskr = np.bitwise_or.reduce(mask, axis=1) maskr = np.bitwise_or.reduce(mask, axis=1)
return maskr return maskr
def dev_subm_inds_v2(subm: bool = False, run_conv: bool = True): def dev_subm_inds_v2(subm: bool = False, run_conv: bool = True):
limit_input_n = 16384 limit_input_n = 16384
limit_input_n = None limit_input_n = None
...@@ -88,8 +91,9 @@ def dev_subm_inds_v2(subm: bool = False, run_conv: bool = True): ...@@ -88,8 +91,9 @@ def dev_subm_inds_v2(subm: bool = False, run_conv: bool = True):
stride = [1] * ndim stride = [1] * ndim
dilation = [1] * ndim dilation = [1] * ndim
out_padding = [0] * ndim out_padding = [0] * ndim
out_inds, pair_ref, indice_num_per_loc = ops.get_indice_pairs(indices_th, 1, spatial_shape, ConvAlgo.Native, out_inds, pair_ref, indice_num_per_loc = ops.get_indice_pairs(
ksize, stride, padding, dilation, out_padding, subm) indices_th, 1, spatial_shape, ConvAlgo.Native, ksize, stride, padding,
dilation, out_padding, subm)
indice_num_per_loc_np = indice_num_per_loc.cpu().numpy() indice_num_per_loc_np = indice_num_per_loc.cpu().numpy()
indice_pairs_np = pair_ref.cpu().numpy() indice_pairs_np = pair_ref.cpu().numpy()
algo = ConvAlgo.MaskSplitImplicitGemm algo = ConvAlgo.MaskSplitImplicitGemm
...@@ -98,8 +102,9 @@ def dev_subm_inds_v2(subm: bool = False, run_conv: bool = True): ...@@ -98,8 +102,9 @@ def dev_subm_inds_v2(subm: bool = False, run_conv: bool = True):
else: else:
num_split = 2 num_split = 2
for i in range(5): for i in range(5):
res = ops.get_indice_pairs_implicit_gemm(indices_th, 1, spatial_shape, algo, res = ops.get_indice_pairs_implicit_gemm(indices_th, 1, spatial_shape,
ksize, stride, padding, dilation, out_padding, subm) algo, ksize, stride, padding,
dilation, out_padding, subm)
out_inds = res[0] out_inds = res[0]
num_inds_per_loc = res[1] num_inds_per_loc = res[1]
pair_fwd = res[2] pair_fwd = res[2]
...@@ -115,23 +120,38 @@ def dev_subm_inds_v2(subm: bool = False, run_conv: bool = True): ...@@ -115,23 +120,38 @@ def dev_subm_inds_v2(subm: bool = False, run_conv: bool = True):
mask_argsort_fwd_splits = res[6] mask_argsort_fwd_splits = res[6]
mask_argsort_bwd_splits = res[7] mask_argsort_bwd_splits = res[7]
masks = res[8] masks = res[8]
pair_mask_fwd_splits_tv = [ops.torch_tensor_to_tv(t, dtype=tv.uint32) for t in pair_mask_fwd_splits] pair_mask_fwd_splits_tv = [
valid_location_bitcount = [SpconvOps.count_bits(t) for t in pair_mask_fwd_splits_tv] ops.torch_tensor_to_tv(t, dtype=tv.uint32)
valid_location_count = sum([t.cpu().numpy().sum() for t in valid_location_bitcount]) for t in pair_mask_fwd_splits
]
valid_location_bitcount = [
SpconvOps.count_bits(t) for t in pair_mask_fwd_splits_tv
]
valid_location_count = sum(
[t.cpu().numpy().sum() for t in valid_location_bitcount])
reduce_length = 32 reduce_length = 32
split_mask_valid_count = sum([reduce_mask_count(t.cpu().numpy(), reduce_length) for t in pair_mask_fwd_splits_tv]) split_mask_valid_count = sum([
reduce_mask_count(t.cpu().numpy(), reduce_length)
for t in pair_mask_fwd_splits_tv
])
if subm: if subm:
print("SUBM", valid_location_count, split_mask_valid_count, pair_fwd.numel()) print("SUBM", valid_location_count, split_mask_valid_count,
pair_fwd.numel())
else: else:
print("REGULAR", valid_location_count, split_mask_valid_count, pair_fwd.numel()) print("REGULAR", valid_location_count, split_mask_valid_count,
# return pair_fwd.numel())
# return
if run_conv: if run_conv:
C = 64 C = 64
K = 64 K = 64
desps = CONV.desps desps = CONV.desps
mask_output_fwd = torch.zeros([2, div_up(out_inds.shape[0], 32)], dtype=torch.int32, device=indices_th.device) mask_output_fwd = torch.zeros([2, div_up(out_inds.shape[0], 32)],
mask_output_bwd = torch.zeros([2, div_up(indices.dim(0), 32)], dtype=torch.int32, device=indices_th.device) dtype=torch.int32,
device=indices_th.device)
mask_output_bwd = torch.zeros([2, div_up(indices.dim(0), 32)],
dtype=torch.int32,
device=indices_th.device)
for desp in desps: for desp in desps:
if desp.algo != GemmAlgo.Simt.value: if desp.algo != GemmAlgo.Simt.value:
...@@ -140,17 +160,22 @@ def dev_subm_inds_v2(subm: bool = False, run_conv: bool = True): ...@@ -140,17 +160,22 @@ def dev_subm_inds_v2(subm: bool = False, run_conv: bool = True):
# continue # continue
# if desp.tile_shape ! # if desp.tile_shape !
if desp.dtype_a == dtypes.int8.tv_dtype: if desp.dtype_a == dtypes.int8.tv_dtype:
inp = np.random.randint(-1, 1, size=[voxels_np.shape[0], C]).astype(np.int8) inp = np.random.randint(-1, 1, size=[voxels_np.shape[0],
weight = np.random.randint(-1, 1, size=[K, *ksize, C]).astype(np.int8) C]).astype(np.int8)
output = np.random.randint(-1, 1, size=[out_inds.shape[0], K]).astype( weight = np.random.randint(-1, 1, size=[K, *ksize,
dtypes.get_npdtype_from_tvdtype(desp.dtype_output)) C]).astype(np.int8)
output = np.random.randint(-1, 1, size=[
out_inds.shape[0], K
]).astype(dtypes.get_npdtype_from_tvdtype(desp.dtype_output))
else: else:
inp = np.random.uniform(-1, 1, size=[voxels_np.shape[0], C]).astype( inp = np.random.uniform(-1, 1, size=[
dtypes.get_npdtype_from_tvdtype(desp.dtype_input)) voxels_np.shape[0], C
]).astype(dtypes.get_npdtype_from_tvdtype(desp.dtype_input))
weight = np.random.uniform(-1, 1, size=[K, *ksize, C]).astype( weight = np.random.uniform(-1, 1, size=[K, *ksize, C]).astype(
dtypes.get_npdtype_from_tvdtype(desp.dtype_weight)) dtypes.get_npdtype_from_tvdtype(desp.dtype_weight))
output = np.random.uniform(-1, 1, size=[out_inds.shape[0], K]).astype( output = np.random.uniform(-1, 1, size=[
dtypes.get_npdtype_from_tvdtype(desp.dtype_output)) out_inds.shape[0], K
]).astype(dtypes.get_npdtype_from_tvdtype(desp.dtype_output))
weight_ref = weight.transpose(1, 2, 3, 0, 4) weight_ref = weight.transpose(1, 2, 3, 0, 4)
weight_ref = np.ascontiguousarray(weight_ref).reshape(-1, K, C) weight_ref = np.ascontiguousarray(weight_ref).reshape(-1, K, C)
if desp.op_type == ConvOpType.kBackwardInput.value: if desp.op_type == ConvOpType.kBackwardInput.value:
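Review note on the int8 branch in this hunk: `np.random.randint(low, high)` excludes `high`, so `randint(-1, 1)` only ever draws -1 and 0. The short check below illustrates this. The fixed seed and the `(-1, 2)` alternative are illustrative assumptions, not part of the commit.

```python
import numpy as np

# Seeded generator so the check is repeatable (the seed is an arbitrary choice).
rng = np.random.RandomState(0)

# Same call shape as the int8 branch: high=1 is exclusive.
narrow = rng.randint(-1, 1, size=1000).astype(np.int8)

# Hypothetical alternative if +1 should also be drawn: high=2.
wide = rng.randint(-1, 2, size=1000).astype(np.int8)

narrow_values = set(np.unique(narrow).tolist())
wide_values = set(np.unique(wide).tolist())
```

Whether the narrower range is intentional is not stated in the diff. It may simply keep int8 accumulations small.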
...@@ -211,19 +236,19 @@ def dev_subm_inds_v2(subm: bool = False, run_conv: bool = True): ...@@ -211,19 +236,19 @@ def dev_subm_inds_v2(subm: bool = False, run_conv: bool = True):
) )
else: else:
if desp.op_type == ConvOpType.kForward.value: if desp.op_type == ConvOpType.kForward.value:
indice_pairs = pair_fwd # inp -> out indice_pairs = pair_fwd # inp -> out
mask_ops = pair_mask_fwd_splits mask_ops = pair_mask_fwd_splits
mask_argsorts = mask_argsort_fwd_splits mask_argsorts = mask_argsort_fwd_splits
mask_output = mask_output_fwd mask_output = mask_output_fwd
elif desp.op_type == ConvOpType.kBackwardInput.value: elif desp.op_type == ConvOpType.kBackwardInput.value:
indice_pairs = pair_bwd # out -> inp indice_pairs = pair_bwd # out -> inp
mask_ops = pair_mask_bwd_splits mask_ops = pair_mask_bwd_splits
mask_argsorts = mask_argsort_bwd_splits mask_argsorts = mask_argsort_bwd_splits
mask_output = mask_output_bwd mask_output = mask_output_bwd
print([bin(x.item()) for x in masks]) print([bin(x.item()) for x in masks])
else: else:
indice_pairs = pair_fwd # inp -> out indice_pairs = pair_fwd # inp -> out
mask_ops = pair_mask_fwd_splits mask_ops = pair_mask_fwd_splits
mask_argsorts = mask_argsort_fwd_splits mask_argsorts = mask_argsort_fwd_splits
mask_output = mask_output_fwd mask_output = mask_output_fwd
...@@ -255,7 +280,7 @@ def dev_subm_inds_v2(subm: bool = False, run_conv: bool = True): ...@@ -255,7 +280,7 @@ def dev_subm_inds_v2(subm: bool = False, run_conv: bool = True):
) )
torch.cuda.synchronize() torch.cuda.synchronize()
duration = time.time() - t duration = time.time() - t
if desp.op_type == ConvOpType.kForward.value: if desp.op_type == ConvOpType.kForward.value:
output_ref = np.zeros_like(output, dtype=np.float32) output_ref = np.zeros_like(output, dtype=np.float32)
# ref algorithm # ref algorithm
...@@ -270,7 +295,9 @@ def dev_subm_inds_v2(subm: bool = False, run_conv: bool = True): ...@@ -270,7 +295,9 @@ def dev_subm_inds_v2(subm: bool = False, run_conv: bool = True):
c_inds = indice_pairs_np[1][filter_offset][:nhot] c_inds = indice_pairs_np[1][filter_offset][:nhot]
# print(a_inds_cpu[:10]) # print(a_inds_cpu[:10])
a = inp[a_inds] a = inp[a_inds]
cc = a.astype(np.float32) @ weight_ref[filter_offset].T.astype(np.float32) cc = a.astype(
np.float32) @ weight_ref[filter_offset].T.astype(
np.float32)
output_ref[c_inds] += cc output_ref[c_inds] += cc
output_cpu = output_tv.cpu().numpy().astype(np.float32) output_cpu = output_tv.cpu().numpy().astype(np.float32)
...@@ -294,12 +321,18 @@ def dev_subm_inds_v2(subm: bool = False, run_conv: bool = True): ...@@ -294,12 +321,18 @@ def dev_subm_inds_v2(subm: bool = False, run_conv: bool = True):
# print(a_inds_cpu[:10]) # print(a_inds_cpu[:10])
a = output[a_inds] a = output[a_inds]
# NK @ KC # NK @ KC
cc = a.astype(np.float32) @ weight_ref[filter_offset].astype(np.float32) cc = a.astype(
np.float32) @ weight_ref[filter_offset].astype(
np.float32)
dinput_ref[c_inds] += cc dinput_ref[c_inds] += cc
din_cpu = inp_tv.cpu().numpy() din_cpu = inp_tv.cpu().numpy()
print("ERROR", np.linalg.norm(din_cpu.reshape(-1) - dinput_ref.reshape(-1))) print(
"ERROR",
np.linalg.norm(
din_cpu.reshape(-1) - dinput_ref.reshape(-1)))
else: else:
dw_ref = np.zeros_like(weight_ref, dtype=np.float32) # KV, K, C dw_ref = np.zeros_like(weight_ref,
dtype=np.float32) # KV, K, C
for filter_offset in range(kv): for filter_offset in range(kv):
if subm and filter_offset > kv // 2: if subm and filter_offset > kv // 2:
nhot = indice_num_per_loc_np[kv - 1 - filter_offset] nhot = indice_num_per_loc_np[kv - 1 - filter_offset]
...@@ -310,16 +343,20 @@ def dev_subm_inds_v2(subm: bool = False, run_conv: bool = True): ...@@ -310,16 +343,20 @@ def dev_subm_inds_v2(subm: bool = False, run_conv: bool = True):
o_inds = indice_pairs_np[1][filter_offset][:nhot] o_inds = indice_pairs_np[1][filter_offset][:nhot]
i_inds = indice_pairs_np[0][filter_offset][:nhot] i_inds = indice_pairs_np[0][filter_offset][:nhot]
# print(a_inds_cpu[:10]) # print(a_inds_cpu[:10])
out_gather = output[o_inds] # [N, K] out_gather = output[o_inds] # [N, K]
inp_gather = inp[i_inds] # [N, C] inp_gather = inp[i_inds] # [N, C]
# KN @ NC # KN @ NC
dw_res = out_gather.astype(np.float32).T @ inp_gather.astype(np.float32) dw_res = out_gather.astype(
np.float32).T @ inp_gather.astype(np.float32)
dw_ref[filter_offset] = dw_res dw_ref[filter_offset] = dw_res
# print(indice_pairs_np_test[0]) # print(indice_pairs_np_test[0])
dw_ref_kcrs = dw_ref.transpose(1, 0, 2) dw_ref_kcrs = dw_ref.transpose(1, 0, 2)
dw_cpu = weight_tv.cpu().numpy().reshape(K, np.prod(ksize), C) dw_cpu = weight_tv.cpu().numpy().reshape(K, np.prod(ksize), C)
print("ERROR", np.linalg.norm(dw_cpu.reshape(-1) - dw_ref_kcrs.reshape(-1))) print(
"ERROR",
np.linalg.norm(
dw_cpu.reshape(-1) - dw_ref_kcrs.reshape(-1)))
if __name__ == "__main__": if __name__ == "__main__":
......
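Review note: the reference checks in this file follow a gather, GEMM, scatter pattern per filter offset. The following is a minimal NumPy sketch of the forward case under stated assumptions: the function name, the argument layout, and the use of `np.add.at` are illustrative choices, not the code in this commit.

```python
import numpy as np


def sparse_conv_forward_ref(inp, weight_ref, pairs, num_per_offset, num_out):
    """Gather-GEMM-scatter reference for a sparse convolution forward pass.

    inp:            [N_in, C] input features
    weight_ref:     [KV, K, C], one [K, C] matrix per filter offset
    pairs:          [2, KV, N_max] int array; pairs[0] holds input rows,
                    pairs[1] holds output rows
    num_per_offset: [KV] number of valid pairs for each filter offset
    num_out:        number of output rows
    """
    kv, k, _ = weight_ref.shape
    out = np.zeros((num_out, k), dtype=np.float32)
    for offset in range(kv):
        nhot = int(num_per_offset[offset])
        in_rows = pairs[0, offset, :nhot]
        out_rows = pairs[1, offset, :nhot]
        # [nhot, C] @ [C, K] -> [nhot, K]
        contrib = (inp[in_rows].astype(np.float32)
                   @ weight_ref[offset].T.astype(np.float32))
        # np.add.at accumulates even if an output row repeats within one offset.
        np.add.at(out, out_rows, contrib)
    return out


# Tiny example: 3 input rows, 2 filter offsets, C=2, K=1.
inp = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], dtype=np.float32)
weight_ref = np.array([[[1.0, 0.0]], [[0.0, 1.0]]], dtype=np.float32)
pairs = np.array([
    [[0, 1, 2], [1, 2, -1]],  # input rows per offset (-1 is padding)
    [[0, 1, 2], [0, 1, -1]],  # output rows per offset
])
num_per_offset = np.array([3, 2])
out = sparse_conv_forward_ref(inp, weight_ref, pairs, num_per_offset, num_out=3)
```

The weight-gradient check in the diff uses the same pairs in the other direction: gathered outputs transposed times gathered inputs, one `[K, C]` block per offset.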
# Copyright 2021 Yan Yan # Copyright 2021 Yan Yan
# #
# Licensed under the Apache License, Version 2.0 (the "License"); # Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License. # you may not use this file except in compliance with the License.
# You may obtain a copy of the License at # You may obtain a copy of the License at
# #
# http://www.apache.org/licenses/LICENSE-2.0 # http://www.apache.org/licenses/LICENSE-2.0
# #
# Unless required by applicable law or agreed to in writing, software # Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, # distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
import numpy as np import numpy as np
from cumm import tensorview as tv from cumm import tensorview as tv
from spconv.core_cc.csrc.sparse.all import SpconvOps from spconv.core_cc.csrc.sparse.all import SpconvOps
import pickle import pickle
import torch import torch
from spconv.pytorch.cppcore import torch_tensor_to_tv from spconv.pytorch.cppcore import torch_tensor_to_tv
def main(): def main():
with open("/home/yy/asd.pkl", "rb") as f: with open("/home/yy/asd.pkl", "rb") as f:
a_th = pickle.load(f) a_th = pickle.load(f)
mask_argsort = torch.empty((1, a_th.shape[1]), mask_argsort = torch.empty((1, a_th.shape[1]),
dtype=torch.int32, dtype=torch.int32,
device=a_th.device) device=a_th.device)
a = a_th.cpu().numpy()[0] a = a_th.cpu().numpy()[0]
a_tv = torch_tensor_to_tv(a_th) a_tv = torch_tensor_to_tv(a_th)
...@@ -34,5 +35,6 @@ def main(): ...@@ -34,5 +35,6 @@ def main():
a_tv_1 = a_tv.clone() a_tv_1 = a_tv.clone()
SpconvOps.sort_1d_by_key(a_tv_1[0], mask_argsort_tv[0]) SpconvOps.sort_1d_by_key(a_tv_1[0], mask_argsort_tv[0])
if __name__ == "__main__": if __name__ == "__main__":
main() main()
\ No newline at end of file
...@@ -38,9 +38,9 @@ if cuda_ver: ...@@ -38,9 +38,9 @@ if cuda_ver:
cuda_ver = cuda_ver.replace(".", "") # 10.2 to 102 cuda_ver = cuda_ver.replace(".", "") # 10.2 to 102
RELEASE_NAME += "-cu{}".format(cuda_ver) RELEASE_NAME += "-cu{}".format(cuda_ver)
deps = ["cumm-cu{}".format(cuda_ver)] deps = ["cumm-cu{}>=0.2.2".format(cuda_ver)]
else: else:
deps = ["cumm"] deps = ["cumm>=0.2.2"]
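Review note: the hunk above pins a minimum `cumm` version for both the CUDA and the CPU package name. A minimal sketch of that naming logic as a standalone helper follows. The function name `build_cumm_requirement` is hypothetical and does not exist in `setup.py`.

```python
def build_cumm_requirement(cuda_ver: str, min_version: str = "0.2.2") -> str:
    """Return the cumm requirement string for a CUDA version such as "10.2".

    An empty cuda_ver selects the CPU-only package name.
    """
    if cuda_ver:
        tag = cuda_ver.replace(".", "")  # "10.2" -> "102"
        return "cumm-cu{}>={}".format(tag, min_version)
    return "cumm>={}".format(min_version)


cuda_req = build_cumm_requirement("10.2")
cpu_req = build_cumm_requirement("")
```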
...@@ -48,11 +48,11 @@ DESCRIPTION = 'spatial sparse convolution' ...@@ -48,11 +48,11 @@ DESCRIPTION = 'spatial sparse convolution'
URL = 'https://github.com/traveller59/spconv' URL = 'https://github.com/traveller59/spconv'
EMAIL = 'yanyan.sub@outlook.com' EMAIL = 'yanyan.sub@outlook.com'
AUTHOR = 'Yan Yan' AUTHOR = 'Yan Yan'
REQUIRES_PYTHON = '>=3.7' REQUIRES_PYTHON = '>=3.6'
VERSION = None VERSION = None
# What packages are required for this module to be executed? # What packages are required for this module to be executed?
REQUIRED = ["pccm>=0.2.19", "pybind11>=2.6.0", "fire", "numpy", *deps] REQUIRED = ["pccm>=0.2.21", "pybind11>=2.6.0", "fire", "numpy", *deps]
# What packages are optional? # What packages are optional?
EXTRAS = { EXTRAS = {
......
# Copyright 2021 Yan Yan # Copyright 2021 Yan Yan
# #
# Licensed under the Apache License, Version 2.0 (the "License"); # Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License. # you may not use this file except in compliance with the License.
# You may obtain a copy of the License at # You may obtain a copy of the License at
# #
# http://www.apache.org/licenses/LICENSE-2.0 # http://www.apache.org/licenses/LICENSE-2.0
# #
# Unless required by applicable law or agreed to in writing, software # Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, # distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
...@@ -16,4 +16,4 @@ from . import build as _build ...@@ -16,4 +16,4 @@ from . import build as _build
from .core import ConvAlgo, AlgoHint from .core import ConvAlgo, AlgoHint
from . import constants from . import constants
from .__version__ import __version__ from .__version__ import __version__
\ No newline at end of file
...@@ -24,9 +24,10 @@ from spconv.constants import NDIM_DONT_CARE ...@@ -24,9 +24,10 @@ from spconv.constants import NDIM_DONT_CARE
from typing import Optional from typing import Optional
import time import time
from threading import Lock from threading import Lock
import torch import contextlib
import numpy as np import numpy as np
from spconv.core import ConvAlgo, AlgoHint from spconv.core import ConvAlgo, AlgoHint
from spconv.tools import CUDAKernelTimer
ALL_ALGO_DESPS = GemmMainUnitTest.get_all_algo_desp() ALL_ALGO_DESPS = GemmMainUnitTest.get_all_algo_desp()
ALL_CONV_ALGO_DESPS = ConvMainUnitTest.get_all_conv_algo_desp() ALL_CONV_ALGO_DESPS = ConvMainUnitTest.get_all_conv_algo_desp()
...@@ -403,7 +404,8 @@ class SimpleGemm: ...@@ -403,7 +404,8 @@ class SimpleGemm:
alpha: float = 1.0, alpha: float = 1.0,
beta: float = 0.0, beta: float = 0.0,
gather_data: tv.Tensor = tv.Tensor(), gather_data: tv.Tensor = tv.Tensor(),
workspace: tv.Tensor = tv.Tensor()): workspace: tv.Tensor = tv.Tensor(),
timer: CUDAKernelTimer = CUDAKernelTimer(False)):
m, n, k = GemmMainUnitTest.extract_mnk(a.shape, b.shape, trans_a, m, n, k = GemmMainUnitTest.extract_mnk(a.shape, b.shape, trans_a,
trans_b, trans_c, trans_b, trans_c,
shuffle_type.value, shuffle_type.value,
...@@ -446,6 +448,9 @@ class SimpleGemm: ...@@ -446,6 +448,9 @@ class SimpleGemm:
# stream=stream) # stream=stream)
# GemmMainUnitTest.stream_synchronize(stream) # GemmMainUnitTest.stream_synchronize(stream)
# gather = time.time() - tt # gather = time.time() - tt
if timer.enable:
assert timer._timer is not None
params.timer = timer._timer
GemmMainUnitTest.matmul2(params) GemmMainUnitTest.matmul2(params)
# GemmMainUnitTest.stream_synchronize(stream) # GemmMainUnitTest.stream_synchronize(stream)
...@@ -678,7 +683,8 @@ class SimpleConv: ...@@ -678,7 +683,8 @@ class SimpleConv:
beta: float = 0.0, beta: float = 0.0,
stream: int = 0, stream: int = 0,
workspace: tv.Tensor = tv.Tensor(), workspace: tv.Tensor = tv.Tensor(),
verbose: bool = False): verbose: bool = False,
timer: CUDAKernelTimer = CUDAKernelTimer(False)):
channel_k = output.dim(1) channel_k = output.dim(1)
channel_c = inp.dim(1) channel_c = inp.dim(1)
# GemmMainUnitTest.stream_synchronize(stream) # GemmMainUnitTest.stream_synchronize(stream)
...@@ -709,9 +715,11 @@ class SimpleConv: ...@@ -709,9 +715,11 @@ class SimpleConv:
params.mask_filter = mask_filter params.mask_filter = mask_filter
params.mask_output = mask_output params.mask_output = mask_output
params.reverse_mask = reverse_mask params.reverse_mask = reverse_mask
if timer.enable:
assert timer._timer is not None
params.timer = timer._timer
# torch.cuda.synchronize() # torch.cuda.synchronize()
# t = time.time() # t = time.time()
params.workspace = workspace params.workspace = workspace
ConvMainUnitTest.implicit_gemm2(params) ConvMainUnitTest.implicit_gemm2(params)
# torch.cuda.synchronize() # torch.cuda.synchronize()
...@@ -724,6 +732,7 @@ class SimpleConv: ...@@ -724,6 +732,7 @@ class SimpleConv:
def stream_synchronize(self, stream: int): def stream_synchronize(self, stream: int):
return GemmMainUnitTest.stream_synchronize(stream) return GemmMainUnitTest.stream_synchronize(stream)
GEMM = SimpleGemm(ALL_ALGO_DESPS) GEMM = SimpleGemm(ALL_ALGO_DESPS)
CONV = SimpleConv(ALL_CONV_ALGO_DESPS) CONV = SimpleConv(ALL_CONV_ALGO_DESPS)
......
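Review note: both `run_with_tuned_result`-style entry points in this file gain a `timer` argument whose default is a disabled timer, and the native handle is only attached to the params when the timer is enabled. The sketch below mimics that pattern with stand-in classes. `KernelTimerSketch`, `ParamsSketch`, and `run_kernel` are hypothetical names, and the list used as a handle is an assumption, since the real `CUDAKernelTimer` lives in a collapsed part of this diff.

```python
class KernelTimerSketch:
    """Hypothetical stand-in, not the real spconv.tools.CUDAKernelTimer."""

    def __init__(self, enable: bool = True):
        self.enable = enable
        # A plain list plays the role of the native timer handle here.
        self._timer = [] if enable else None


class ParamsSketch:
    """Hypothetical stand-in for the kernel parameter object."""

    def __init__(self):
        self.timer = None


def run_kernel(params, timer=KernelTimerSketch(False)):
    # Same guard as in the diff: only attach a handle when timing is enabled.
    if timer.enable:
        assert timer._timer is not None
        params.timer = timer._timer
    return params


untimed = run_kernel(ParamsSketch())
active_timer = KernelTimerSketch(True)
timed = run_kernel(ParamsSketch(), active_timer)
```

The default instance is created once when the function is defined and shared across calls. That is harmless in this sketch because the disabled instance is never mutated.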
# Copyright 2021 Yan Yan # Copyright 2021 Yan Yan
# #
# Licensed under the Apache License, Version 2.0 (the "License"); # Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License. # you may not use this file except in compliance with the License.
# You may obtain a copy of the License at # You may obtain a copy of the License at
# #
# http://www.apache.org/licenses/LICENSE-2.0 # http://www.apache.org/licenses/LICENSE-2.0
# #
# Unless required by applicable law or agreed to in writing, software # Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, # distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
...@@ -19,7 +19,8 @@ from pccm.utils import project_is_editable, project_is_installed ...@@ -19,7 +19,8 @@ from pccm.utils import project_is_editable, project_is_installed
from ccimport.compat import InWindows from ccimport.compat import InWindows
from .constants import PACKAGE_NAME, PACKAGE_ROOT, DISABLE_JIT from .constants import PACKAGE_NAME, PACKAGE_ROOT, DISABLE_JIT
if project_is_installed(PACKAGE_NAME) and project_is_editable(PACKAGE_NAME) and not DISABLE_JIT: if project_is_installed(PACKAGE_NAME) and project_is_editable(
PACKAGE_NAME) and not DISABLE_JIT:
from spconv.core import SHUFFLE_SIMT_PARAMS, SHUFFLE_VOLTA_PARAMS, SHUFFLE_TURING_PARAMS from spconv.core import SHUFFLE_SIMT_PARAMS, SHUFFLE_VOLTA_PARAMS, SHUFFLE_TURING_PARAMS
from spconv.core import IMPLGEMM_SIMT_PARAMS, IMPLGEMM_VOLTA_PARAMS, IMPLGEMM_TURING_PARAMS from spconv.core import IMPLGEMM_SIMT_PARAMS, IMPLGEMM_VOLTA_PARAMS, IMPLGEMM_TURING_PARAMS
...@@ -27,11 +28,13 @@ if project_is_installed(PACKAGE_NAME) and project_is_editable(PACKAGE_NAME) and ...@@ -27,11 +28,13 @@ if project_is_installed(PACKAGE_NAME) and project_is_editable(PACKAGE_NAME) and
from cumm.conv.main import ConvMainUnitTest from cumm.conv.main import ConvMainUnitTest
from spconv.csrc.sparse.all import SpconvOps from spconv.csrc.sparse.all import SpconvOps
cu = GemmMainUnitTest(SHUFFLE_SIMT_PARAMS + SHUFFLE_VOLTA_PARAMS + SHUFFLE_TURING_PARAMS) cu = GemmMainUnitTest(SHUFFLE_SIMT_PARAMS + SHUFFLE_VOLTA_PARAMS +
SHUFFLE_TURING_PARAMS)
cu.namespace = "cumm.gemm.main" cu.namespace = "cumm.gemm.main"
convcu = ConvMainUnitTest(IMPLGEMM_SIMT_PARAMS + IMPLGEMM_VOLTA_PARAMS + IMPLGEMM_TURING_PARAMS) convcu = ConvMainUnitTest(IMPLGEMM_SIMT_PARAMS + IMPLGEMM_VOLTA_PARAMS +
IMPLGEMM_TURING_PARAMS)
convcu.namespace = "cumm.conv.main" convcu.namespace = "cumm.conv.main"
objects_folder = None objects_folder = None
if InWindows: if InWindows:
# windows have command line limit, so we use objects_folder to reduce command size. # windows have command line limit, so we use objects_folder to reduce command size.
objects_folder = "objects" objects_folder = "objects"
......
# Copyright 2021 Yan Yan # Copyright 2021 Yan Yan
# #
# Licensed under the Apache License, Version 2.0 (the "License"); # Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License. # you may not use this file except in compliance with the License.
# You may obtain a copy of the License at # You may obtain a copy of the License at
# #
# http://www.apache.org/licenses/LICENSE-2.0 # http://www.apache.org/licenses/LICENSE-2.0
# #
# Unless required by applicable law or agreed to in writing, software # Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS, # distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
...@@ -20,10 +20,10 @@ from pccm.utils import project_is_editable, project_is_installed ...@@ -20,10 +20,10 @@ from pccm.utils import project_is_editable, project_is_installed
PACKAGE_NAME = "spconv" PACKAGE_NAME = "spconv"
PACKAGE_ROOT = Path(__file__).parent.resolve() PACKAGE_ROOT = Path(__file__).parent.resolve()
EDITABLE_INSTALLED = project_is_installed(PACKAGE_NAME) and project_is_editable(PACKAGE_NAME) EDITABLE_INSTALLED = project_is_installed(
PACKAGE_NAME) and project_is_editable(PACKAGE_NAME)
_filter_hwio_env = os.getenv("SPCONV_FILTER_HWIO", "0") _filter_hwio_env = os.getenv("SPCONV_FILTER_HWIO", "0")
FILTER_HWIO = _filter_hwio_env == "1" FILTER_HWIO = _filter_hwio_env == "1"
DISABLE_JIT = os.getenv("SPCONV_DISABLE_JIT", "0") == "1" DISABLE_JIT = os.getenv("SPCONV_DISABLE_JIT", "0") == "1"
NDIM_DONT_CARE = 3 NDIM_DONT_CARE = 3
\ No newline at end of file
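Review note: `FILTER_HWIO` and `DISABLE_JIT` are read from the environment and are true only when the variable equals the string "1". A minimal sketch of that convention as a helper follows. The name `env_flag` and its `environ` parameter are assumptions for illustration and are not part of `constants.py`.

```python
import os


def env_flag(name, default="0", environ=None):
    """Return True only if the environment variable is exactly "1"."""
    env = os.environ if environ is None else environ
    return env.get(name, default) == "1"


# Passing a dict keeps the example independent of the real environment.
jit_disabled = env_flag("SPCONV_DISABLE_JIT", environ={"SPCONV_DISABLE_JIT": "1"})
jit_default = env_flag("SPCONV_DISABLE_JIT", environ={})
jit_other = env_flag("SPCONV_DISABLE_JIT", environ={"SPCONV_DISABLE_JIT": "true"})
```

Values such as "true" or "yes" do not enable the flag under this convention.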
# Copyright 2021 Yan Yan
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.