"examples/pytorch/git@developer.sourcefind.cn:OpenDAS/dgl.git" did not exist on "2190c39d674f76c65db9ee8da7b43d3021f19c29"
Unverified Commit 6b02babb authored by Zihao Ye, committed by GitHub

[doc] Add docstring for segment reduce. (#2375)

parent 35a3ead2
@@ -239,7 +239,7 @@ Like GSpMM, GSDDMM operators support both homogeneous and bipartite graphs.
 Edge Softmax module
 -------------------
-We also provide framework agnostic edge softmax module which was frequently used in
+DGL also provides a framework-agnostic edge softmax module, which is frequently used in
 GNN-like structures, e.g.
 `Graph Attention Network <https://arxiv.org/pdf/1710.10903.pdf>`_,
 `Transformer <https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf>`_,
@@ -250,6 +250,16 @@ GNN-like structures, e.g.

     edge_softmax

+Segment Reduce Module
+---------------------
+
+DGL provides operators to reduce a value tensor along the first dimension by segments.
+
+.. autosummary::
+    :toctree: ../../generated/
+
+    segment_reduce
+
 Relation with Message Passing APIs
 ----------------------------------
...
@@ -1512,23 +1512,27 @@ def segment_reduce(op, x, offsets):
     """Segment reduction operator.

     It aggregates the value tensor along the first dimension by segments.
-    The first argument ``seglen`` stores the length of each segment. Its
-    summation must be equal to the first dimension of the ``value`` tensor.
-    Zero-length segments are allowed.
+    The argument ``offsets`` specifies the start offset of each segment (and
+    the upper bound of the last segment). Zero-length segments are allowed.
+
+    .. math::
+
+      y_i = \Phi_{j=\mathrm{offsets}_i}^{\mathrm{offsets}_{i+1}-1} x_j
+
+    where :math:`\Phi` is the reduce operator.

     Parameters
     ----------
     op : str
-        Aggregation method. Can be 'sum', 'max', 'min'.
-    seglen : Tensor
-        Segment lengths.
-    value : Tensor
+        Aggregation method. Can be ``sum``, ``max``, ``min``.
+    x : Tensor
         Value to aggregate.
+    offsets : Tensor
+        The start offsets of segments.

     Returns
     -------
     Tensor
-        Aggregated tensor of shape ``(len(seglen), value.shape[1:])``.
+        Aggregated tensor of shape ``(len(offsets) - 1, *x.shape[1:])``.
     """
     pass
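For concreteness, the documented semantics can be sketched as a small NumPy reference (illustrative only; ``segment_reduce_ref`` is a hypothetical helper, not DGL's implementation):

```python
import numpy as np

def segment_reduce_ref(op, x, offsets):
    """Reduce rows of ``x`` within each half-open range
    [offsets[i], offsets[i + 1])."""
    reducer = {'sum': np.sum, 'max': np.max, 'min': np.min}[op]
    out = []
    for i in range(len(offsets) - 1):
        seg = x[offsets[i]:offsets[i + 1]]
        # Zero-length segments are allowed; 'sum' naturally yields zeros.
        # Min/max over an empty segment is undefined, so zeros are used
        # here purely for illustration.
        out.append(reducer(seg, axis=0) if len(seg) else np.zeros(x.shape[1:]))
    return np.stack(out)

x = np.arange(10, dtype=np.float32).reshape(5, 2)
offsets = np.array([0, 2, 2, 5])
print(segment_reduce_ref('sum', x, offsets))
# [[ 2.  4.]   <- x[0] + x[1]
#  [ 0.  0.]   <- zero-length segment
#  [18. 21.]]  <- x[2] + x[3] + x[4]
```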
...
@@ -69,8 +69,6 @@ def segment_softmax(seglen, value):
         Segment lengths.
     value : Tensor
         Value to aggregate.
-    reducer : str, optional
-        Aggregation method. Can be 'sum', 'max', 'min', 'mean'.

     Returns
     -------
...
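For context, ``segment_softmax`` (whose stale ``reducer`` entry is removed above) computes a softmax within each segment. A minimal NumPy sketch of that behavior, assuming the ``seglen``-based signature shown in the hunk (``segment_softmax_ref`` is a hypothetical name):

```python
import numpy as np

def segment_softmax_ref(seglen, value):
    """Softmax over the first dimension, taken independently per segment.
    ``sum(seglen)`` must equal ``len(value)``."""
    offsets = np.concatenate([[0], np.cumsum(seglen)])
    out = np.zeros_like(value, dtype=np.float64)
    for i in range(len(seglen)):
        seg = value[offsets[i]:offsets[i + 1]]
        if len(seg) == 0:
            continue  # zero-length segments contribute no rows
        # Subtract the per-segment max for numerical stability.
        e = np.exp(seg - seg.max(axis=0, keepdims=True))
        out[offsets[i]:offsets[i + 1]] = e / e.sum(axis=0, keepdims=True)
    return out

value = np.array([[1.0], [2.0], [0.5], [0.5], [0.5]])
print(segment_softmax_ref([2, 3], value))
# Rows 0-1 sum to 1 within segment 0; rows 2-4 sum to 1 within segment 1.
```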
@@ -252,18 +252,22 @@ def _segment_reduce(op, feat, offsets):
     r"""Segment reduction operator.

     It aggregates the value tensor along the first dimension by segments.
-    The first argument ``seglen`` stores the length of each segment. Its
-    summation must be equal to the first dimension of the ``value`` tensor.
-    Zero-length segments are allowed.
+    The argument ``offsets`` specifies the start offset of each segment (and
+    the upper bound of the last segment). Zero-length segments are allowed.
+
+    .. math::
+
+      y_i = \Phi_{j=\mathrm{offsets}_i}^{\mathrm{offsets}_{i+1}-1} x_j
+
+    where :math:`\Phi` is the reduce operator.

     Parameters
     ----------
     op : str
-        Aggregation method. Can be 'sum', 'max', 'min'.
-    seglen : Tensor
-        Segment lengths.
-    value : Tensor
+        Aggregation method. Can be ``sum``, ``max``, ``min``.
+    feat : Tensor
         Value to aggregate.
+    offsets : Tensor
+        The start offsets of segments.

     Returns
     -------
...
@@ -12,6 +12,12 @@ namespace dgl {
 namespace aten {
 namespace cpu {

+/*!
+ * \brief CPU kernel of segment sum.
+ * \param feat The input tensor.
+ * \param offsets The offset tensor storing the ranges of segments.
+ * \param out The output tensor.
+ */
 template <typename IdType, typename DType>
 void SegmentSum(NDArray feat, NDArray offsets, NDArray out) {
   int n = out->shape[0];
@@ -31,6 +37,14 @@ void SegmentSum(NDArray feat, NDArray offsets, NDArray out) {
   }
 }

+/*!
+ * \brief CPU kernel of segment min/max.
+ * \param feat The input tensor.
+ * \param offsets The offset tensor storing the ranges of segments.
+ * \param out The output tensor.
+ * \param arg An auxiliary tensor storing the argmin/argmax information
+ *        used in the backward phase.
+ */
 template <typename IdType, typename DType, typename Cmp>
 void SegmentCmp(NDArray feat, NDArray offsets,
                 NDArray out, NDArray arg) {
@@ -58,6 +72,12 @@ void SegmentCmp(NDArray feat, NDArray offsets,
   }
 }

+/*!
+ * \brief CPU kernel of the backward phase of segment min/max.
+ * \param feat The input tensor.
+ * \param arg The argmin/argmax tensor.
+ * \param out The output tensor.
+ */
 template <typename IdType, typename DType>
 void BackwardSegmentCmp(NDArray feat, NDArray arg, NDArray out) {
   int n = feat->shape[0];
...
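The behavior the new CPU kernel comments describe can be modeled in NumPy (an illustrative reference, not the kernels themselves; the ``*_ref`` names are hypothetical). ``SegmentCmp`` records, for every output position, the winning input row, and ``BackwardSegmentCmp`` routes gradients back through those indices:

```python
import numpy as np

def segment_cmp_ref(feat, offsets, cmp=np.argmax):
    """Per-segment min/max plus the winning row index (arg),
    which the backward pass uses for scattering."""
    n = len(offsets) - 1
    out = np.zeros((n,) + feat.shape[1:], dtype=feat.dtype)
    arg = np.full((n,) + feat.shape[1:], -1, dtype=np.int64)
    for i in range(n):
        seg = feat[offsets[i]:offsets[i + 1]]
        if len(seg) == 0:
            continue  # zero-length segment: keep zeros / -1
        idx = cmp(seg, axis=0)             # winner within the segment
        arg[i] = offsets[i] + idx          # convert to a global row index
        out[i] = np.take_along_axis(seg, idx[None], axis=0)[0]
    return out, arg

def backward_segment_cmp_ref(grad_out, arg, m):
    """Route each output gradient to the input row that produced
    the min/max; all other positions receive zero."""
    grad_in = np.zeros((m,) + grad_out.shape[1:], dtype=grad_out.dtype)
    for i in range(grad_out.shape[0]):
        for j in np.ndindex(grad_out.shape[1:]):
            k = arg[(i,) + j]
            if k >= 0:
                grad_in[(k,) + j] += grad_out[(i,) + j]
    return grad_in

feat = np.array([[1., 5.], [3., 2.], [0., 7.]])
out, arg = segment_cmp_ref(feat, np.array([0, 2, 3]))
# out = [[3., 5.], [0., 7.]], arg = [[1, 0], [2, 2]]
```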
@@ -146,7 +146,7 @@ __device__ __forceinline__ Idx BinarySearchSrc(const Idx *array, Idx length, Idx
  * is responsible for the computation on different edges. Threadblocks
  * on the x-axis are responsible for the computation on different positions
  * in the feature dimension.
  * To efficiently find the source and destination node index of a
  * given edge in CSR format, it uses binary search (time complexity O(log N)).
  */
 template <typename Idx, typename DType, typename BinaryOp,
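The lookup this comment refers to amounts to a binary search over the CSR ``indptr`` array; in Python terms (illustrative; ``src_of_edge`` is a hypothetical name):

```python
import numpy as np

# CSR indptr: the edges of row r occupy the half-open range
# [indptr[r], indptr[r + 1]) in the column/edge arrays.
indptr = np.array([0, 2, 2, 5, 6])

def src_of_edge(indptr, eid):
    # Rightmost row whose range starts at or before eid: O(log N).
    return int(np.searchsorted(indptr, eid, side='right') - 1)

assert src_of_edge(indptr, 0) == 0  # edges 0-1 belong to row 0
assert src_of_edge(indptr, 3) == 2  # edges 2-4 belong to row 2 (row 1 is empty)
assert src_of_edge(indptr, 5) == 3  # edge 5 belongs to row 3
```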
@@ -239,7 +239,7 @@ void SDDMMCoo(
           coo.num_rows, coo.num_cols, nnz, reduce_dim,
           lhs_off, rhs_off,
           lhs_len, rhs_len, len);
       });
   } else {
     const int ntx = FindNumThreads(len);
     const int nty = CUDA_MAX_NUM_THREADS / ntx;
...
@@ -19,6 +19,8 @@ namespace cuda {
 /*!
  * \brief CUDA kernel of segment reduce.
+ * \note Each thread is responsible for the aggregation of one row
+ *       in the result tensor.
  */
 template <typename IdType, typename DType,
           typename ReduceOp>
@@ -41,7 +43,9 @@ __global__ void SegmentReduceKernel(
 }

 /*!
- * \brief CUDA kernel of segment reduce.
+ * \brief CUDA kernel of the backward phase of segment min/max.
+ * \note Each thread is responsible for writing one row of the result
+ *       gradient tensor by looking up the ArgMin/ArgMax index information.
  */
 template <typename IdType, typename DType>
 __global__ void BackwardSegmentCmpKernel(
@@ -57,6 +61,13 @@ __global__ void BackwardSegmentCmpKernel(
   }
 }

+/*!
+ * \brief CUDA implementation of the forward phase of segment reduce.
+ * \param feat The input tensor.
+ * \param offsets The offsets tensor.
+ * \param out The output tensor.
+ * \param arg An auxiliary tensor storing the ArgMin/ArgMax information
+ *        used in the backward phase.
+ */
 template <typename IdType, typename DType, typename ReduceOp>
 void SegmentReduce(
     NDArray feat,
@@ -80,12 +91,19 @@ void SegmentReduce(
   const int nty = 1;
   const dim3 nblks(nbx, nby);
   const dim3 nthrs(ntx, nty);
+  // TODO(zihao): try cub's DeviceSegmentedReduce and compare the performance.
   CUDA_KERNEL_CALL((SegmentReduceKernel<IdType, DType, ReduceOp>),
       nblks, nthrs, 0, thr_entry->stream,
       feat_data, offsets_data, out_data, arg_data,
       n, dim);
 }

+/*!
+ * \brief CUDA implementation of the backward phase of segment reduce
+ *        with the Min/Max reducer.
+ * \param feat The input tensor.
+ * \param arg The ArgMin/ArgMax information, used for indexing.
+ * \param out The output tensor.
+ */
 template <typename IdType, typename DType>
 void BackwardSegmentCmp(
     NDArray feat,
...
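The work split the new kernel notes describe, one thread owning one output row and scanning its segment serially, can be emulated in plain Python (illustrative only; the real kernel runs these iterations in parallel on the GPU):

```python
import numpy as np

def segment_sum_rowwise(feat, offsets):
    """Serial emulation of the kernel's mapping: iteration i plays the
    role of the thread that owns output row i and scans its segment."""
    n = len(offsets) - 1
    out = np.zeros((n,) + feat.shape[1:], dtype=feat.dtype)
    for i in range(n):                        # one "thread" per output row
        for k in range(offsets[i], offsets[i + 1]):
            out[i] += feat[k]                 # accumulate this row's segment
    return out
```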
@@ -19,7 +19,7 @@ using namespace cuda;
 namespace aten {
 namespace cuda {

 /*!
  * \brief CUDA kernel that fills the vector starting at ptr, of size length,
  * with val.
  * \note internal use only.
@@ -134,7 +134,7 @@ __global__ void ArgSpMMCooKernel(
 /*!
  * \brief CUDA kernel of g-SpMM on COO format.
  * \note It uses the node-parallel strategy: different threadblocks (on the
  * y-axis) are responsible for the computation on different destination
  * nodes. Threadblocks on the x-axis are responsible for the computation
  * on different positions in the feature dimension.
  */
@@ -191,10 +191,10 @@ __global__ void SpMMCsrKernel(
  * \param ufeat The feature on source nodes.
  * \param efeat The feature on edges.
  * \param out The result feature on destination nodes.
  * \param argu Arg-Min/Max on source nodes, which stores the source node
  *        indices corresponding to the minimum/maximum values of the
  *        reduction result on destination nodes. It is useful when computing
  *        gradients of the Min/Max reducer.
  * \param arge Arg-Min/Max on edges, which stores the edge indices
  *        corresponding to the minimum/maximum values of the reduction
  *        result on destination nodes. It is useful when computing gradients
  *        of the Min/Max reducer.
  */
@@ -263,10 +263,10 @@ void SpMMCoo(
  * \param ufeat The feature on source nodes.
  * \param efeat The feature on edges.
  * \param out The result feature on destination nodes.
  * \param argu Arg-Min/Max on source nodes, which stores the source node
  *        indices corresponding to the minimum/maximum values of the
  *        reduction result on destination nodes. It is useful when computing
  *        gradients of the Min/Max reducer.
  * \param arge Arg-Min/Max on edges, which stores the edge indices
  *        corresponding to the minimum/maximum values of the reduction
  *        result on destination nodes. It is useful when computing gradients
  *        of the Min/Max reducer.
  */