Commit 596a3cc0 authored by yan.yan

update docs

parent b63c08aa
...@@ -45,7 +45,7 @@ ...@@ -45,7 +45,7 @@
| -------------- |:---------------------:| ---------------------:| ---------------------:| | -------------- |:---------------------:| ---------------------:| ---------------------:|
| CPU (Linux Only) | [![PyPI Version][pypi-ver-cpu]][pypi-url-cpu] | ```pip install spconv``` | [![pypi monthly download][pypi-download-cpu]][pypi-url-cpu] | | CPU (Linux Only) | [![PyPI Version][pypi-ver-cpu]][pypi-url-cpu] | ```pip install spconv``` | [![pypi monthly download][pypi-download-cpu]][pypi-url-cpu] |
| CUDA 10.2 | [![PyPI Version][pypi-ver-102]][pypi-url-102] | ```pip install spconv-cu102```| [![pypi monthly download][pypi-download-102]][pypi-url-102]| | CUDA 10.2 | [![PyPI Version][pypi-ver-102]][pypi-url-102] | ```pip install spconv-cu102```| [![pypi monthly download][pypi-download-102]][pypi-url-102]|
| CUDA 11.3 (Linux Only) | [![PyPI Version][pypi-ver-113]][pypi-url-113] | ```pip install spconv-cu113```| [![pypi monthly download][pypi-download-113]][pypi-url-113]| | CUDA 11.3 | [![PyPI Version][pypi-ver-113]][pypi-url-113] | ```pip install spconv-cu113```| [![pypi monthly download][pypi-download-113]][pypi-url-113]|
| CUDA 11.4 | [![PyPI Version][pypi-ver-114]][pypi-url-114] | ```pip install spconv-cu114```| [![pypi monthly download][pypi-download-114]][pypi-url-114]| | CUDA 11.4 | [![PyPI Version][pypi-ver-114]][pypi-url-114] | ```pip install spconv-cu114```| [![pypi monthly download][pypi-download-114]][pypi-url-114]|
| CUDA 11.7 | [![PyPI Version][pypi-ver-117]][pypi-url-117] | ```pip install spconv-cu117```| [![pypi monthly download][pypi-download-117]][pypi-url-117]| | CUDA 11.7 | [![PyPI Version][pypi-ver-117]][pypi-url-117] | ```pip install spconv-cu117```| [![pypi monthly download][pypi-download-117]][pypi-url-117]|
<!-- | CUDA 12.0 | [![PyPI Version][pypi-ver-120]][pypi-url-120] | ```pip install spconv-cu120```| [![pypi monthly download][pypi-download-120]][pypi-url-120]| --> <!-- | CUDA 12.0 | [![PyPI Version][pypi-ver-120]][pypi-url-120] | ```pip install spconv-cu120```| [![pypi monthly download][pypi-download-120]][pypi-url-120]| -->
......
...@@ -16,47 +16,30 @@ ...@@ -16,47 +16,30 @@
## Simple Benchmark ## Simple Benchmark
### Network Benchmark without batchnorm (F32/F16) in RTX 3080 Laptop GPU 150W ### Network Benchmark without batchnorm (TF32/F16) in Different GPUs
Network Code: test/benchmark.py Basic: ```python -m spconv.benchmark bench_basic f16``` and ```python -m spconv.benchmark bench_basic tf32```
| F32/F16 | Spconv 1.x F32 (1080Ti) | Native| Implicit Gemm | Implicit Gemm Split Mask | | GPUs | F16-Forward | F16-Backward | TF32-Forward | TF32-Backward |
| -------------- |:---------------------:|---------------------:|---------------------:| ---------------------:| | -------------- |:---------------------:|---------------------:|---------------------:| ---------------------:|
| Forward | 43ms | 21.7ms/13.7ms | 23.5ms/11.2ms | 22ms/12.2ms | | T4 | 18.74 | 25.51 | N/A | N/A |
| Backward | 80ms | 41.9ms/25.2ms | 51.0ms/13.8ms | 41.1ms/12.2ms | | RTX 3080 Laptop (150W) | 8.2 | 11.51 | 15.04 | 26.90 |
| A100 | 13.02 | 12.43 | 12.35 | 14.93 |
| RTX3090 | 11.84 | 11.84 | 13.23 | 15.79 |
| RTX A6000 | 11.11 | 8.97 | 12.30 | 12.79 |
| F16 Forward | Native| Implicit Gemm | Implicit Gemm Split Mask | Large: ```python -m spconv.benchmark bench_large f16``` and ```python -m spconv.benchmark bench_large tf32```
| -------------- |:---------------------:|---------------------:| ---------------------:|
| RTX 3080 Laptop 150W@1755MHz | 13.7ms | 11.2ms | 12.2ms |
| RTX A6000 | 19.1ms | 11.7ms | 14.0ms |
| TESLA V100 | 17.9ms | 11.4ms | 13.4ms |
| A100 | 23.8ms | 12.4ms | 15.1ms |
| F16 Backward | Native| Implicit Gemm | Implicit Gemm Split Mask | | GPUs | F16-Forward | F16-Backward | TF32-Forward | TF32-Backward |
| -------------- |:---------------------:|---------------------:| ---------------------:| | -------------- |:---------------------:|---------------------:|---------------------:| ---------------------:|
| RTX 3080 Laptop 150W@1755MHz | 25.2ms | 13.8ms | 12.2ms | | T4 | 128.7 | 203.3 | N/A | N/A |
| RTX A6000 | 28.1ms | 9.2ms | 8.9ms | | RTX 3080 Laptop (150W) | 43.15 | 74.57 | 84.65 | 165.19 |
| TESLA V100 | 33.9ms | 12.2ms | 12.9ms | | A100 | 19.85 | 31.24 | 29.58 | 55.63 |
| A100 | 37.6ms | 12.2ms | 13.9ms | | RTX3090 | 27.83 | 40.45 | 44.51 | 73.17 |
| RTX A6000 | 28.62 | 39.86 | 45.43 | 74.11 |
### Network Gemm Kernel Benchmark FP16 in RTX 3080 Laptop GPU
Network Code: test/benchmark.py
The network/input/profile code is same as above table.
This table only profile **fp16 gemm kernels** without output tensor create/clear overhead. this table show the performance upper bound of our algorithm.
| F16 | Native| Implicit Gemm | Implicit Gemm Split Mask |
| -------------- |:---------------------:|---------------------:| ---------------------:|
| Forward | 8.0ms | 4.3ms | 4.0ms |
We can see that the implicit gemm is very fast, gemm only use 4.3ms/11.2ms in network forward. we can achieve better performance in TensorRT + Pure C++.
**NOTE** **NOTE**
When you want to benchmark network in your laptop, don't forget to close all apps except terminals! Other apps will consume GPU resource and make kernels run slower. When you want to benchmark network in your laptop, don't forget to close all apps except terminals! Other apps will consume GPU resource and make kernels run slower.
## Comparsion with [MinkowskiEngine](https://github.com/NVIDIA/MinkowskiEngine) and [torchsparse](https://github.com/mit-han-lab/torchsparse)
TODO
\ No newline at end of file
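
To make the F16 versus TF32 gap in the new "Large" table easier to read, here is a minimal sketch that computes the TF32/F16 ratio of total (forward + backward) time per GPU. The numbers are copied from the new side of the diff. The table gives no units, so milliseconds is an assumption carried over from the old tables. The dictionary layout and the helper name `tf32_over_f16` are hypothetical and not part of spconv. T4 is left out because its TF32 columns are N/A.

```python
# Timings copied from the "Large" benchmark table on the new side of the diff.
# Assumption: values are milliseconds (the new table does not state units).
LARGE = {
    "RTX 3080 Laptop (150W)": {"f16_fwd": 43.15, "f16_bwd": 74.57, "tf32_fwd": 84.65, "tf32_bwd": 165.19},
    "A100": {"f16_fwd": 19.85, "f16_bwd": 31.24, "tf32_fwd": 29.58, "tf32_bwd": 55.63},
    "RTX3090": {"f16_fwd": 27.83, "f16_bwd": 40.45, "tf32_fwd": 44.51, "tf32_bwd": 73.17},
    "RTX A6000": {"f16_fwd": 28.62, "f16_bwd": 39.86, "tf32_fwd": 45.43, "tf32_bwd": 74.11},
}


def tf32_over_f16(row):
    """Return total TF32 time divided by total F16 time (forward + backward)."""
    f16_total = row["f16_fwd"] + row["f16_bwd"]
    tf32_total = row["tf32_fwd"] + row["tf32_bwd"]
    return tf32_total / f16_total


# Ratio per GPU, rounded to two decimals for display.
ratios = {gpu: round(tf32_over_f16(row), 2) for gpu, row in LARGE.items()}

if __name__ == "__main__":
    for gpu, ratio in ratios.items():
        print(f"{gpu}: TF32 takes {ratio}x the F16 time")
```

A ratio above 1 means the TF32 run is slower than the F16 run on that GPU for this benchmark. It says nothing about accuracy or about other network shapes.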