- 07 Jun, 2022 1 commit
ndickson-nvidia authored
* Added specialization of cublasGemm function for `__half` type, to try to address https://github.com/dmlc/dgl/issues/3988
* Added USE_FP16 guard
* Added test cases to test_segment_mm, to test newly-added FP16 specialization of cublasGemm
* Replaced for loop in test_segment_mm with pytest.mark.parametrize, as recommended

Co-authored-by: Xin Yao <xiny@nvidia.com>
- 23 Feb, 2022 1 commit
Minjie Wang authored
* WIP: TypedLinear and new RelGraphConv
* wip
* further simplify RGCN
* a bunch of tweak for performance; add basic cpu support
* update on segmm
* wip: segment.cu
* new backward kernel works
* fix a bunch of bugs in kernel; leave idx_a for future
* add nn test for typed_linear
* rgcn nn test
* bugfix in corner case; update RGCN README
* doc
* fix cpp lint
* fix lint
* fix ut
* wip: hgtconv; presorted flag for rgcn
* hgt code and ut; WIP: some fix on reorder graph
* better typed linear init
* fix ut
* fix lint; add docstring
- 15 Feb, 2022 1 commit
Israt Nisa authored
* init
* init
* working cublasGemm
* benchmark high-mem/low-mem, err gather_mm output
* cuda kernel for bmm like kernel
* removed cpu copy for E_per_Rel
* benchmark code from Minjie
* fixed cublas results in gathermm sorted
* use GPU shared mem in unsorted gather mm
* minor
* Added an optimal version of gather_mm_unsorted
* lint
* init gather_mm_scatter
* cublas transpose added
* fixed h_offset for multiple rel
* backward unittest
* cublas support to transpose W
* adding missed file
* forgot to add header file
* lint
* lint
* cleanup
* lint
* docstring
* lint
* added unittest
* lint
* lint
* unittest
* changed err type
* skip cpu test
* skip CPU code
* move in-len loop inside
* lint
* added check different dim length for B
* w_per_len is optional now
* moved gather_mm to pytorch/backend with backward support
* removed a_/b_trans support
* transpose op inside GEMM call
* removed out alloc from API, changed W 2D to 3D
* Added se_gather_mm, Separate API for sortedE
* Fixed gather_mm (unsorted) user interface
* unsorted gmm backward + separate CAPI for un/sorted A
* typecast to float to support atomicAdd
* lint typecast
* lint
* added gather_mm_scatter
* minor
* const
* design changes
* Added idx_a, idx_b support gmm_scatter
* dgl doc
* lint
* adding gather_mm in ops
* lint
* lint
* minor
* removed benchmark files
* minor
* empty commit

Co-authored-by: Israt Nisa <nisisrat@amazon.com>