"git@developer.sourcefind.cn:change/sglang.git" did not exist on "c23eda8589f62eb9dc94ae44c6bccf976125351d"
- 23 Nov, 2023 1 commit
-
-
Muhammed Fatih BALIN authored
-
- 22 Nov, 2023 1 commit
-
-
Muhammed Fatih BALIN authored
-
- 14 Aug, 2023 1 commit
-
-
Xin Yao authored
Signed-off-by: Xin Yao <xiny@nvidia.com>
-
- 10 Aug, 2023 1 commit
-
-
Chang Liu authored
-
- 19 Jul, 2023 1 commit
-
-
Muhammed Fatih BALIN authored
Co-authored-by: Hongzhi (Steve), Chen <chenhongzhi.nkcs@gmail.com>
-
- 14 Jul, 2023 2 commits
-
-
Muhammed Fatih BALIN authored
-
Muhammed Fatih BALIN authored
Co-authored-by: Hongzhi (Steve), Chen <chenhongzhi.nkcs@gmail.com>
-
- 13 Jul, 2023 1 commit
-
-
Muhammed Fatih BALIN authored
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>
-
- 17 May, 2023 1 commit
-
-
nv-dlasalle authored
[Performance Improvement] Make GPU sampling and to_block use pinned memory to decrease required synchronization (#5685)
-
- 23 Mar, 2023 1 commit
-
-
Xin Yao authored
* update for segmentMM
* update for sddmm
* fix a bug
-
- 08 Mar, 2023 1 commit
-
-
Rhett Ying authored
-
- 12 Jan, 2023 1 commit
-
-
nv-dlasalle authored
* Add failing unit test
* Add fix
* Remove extra newline
* skip cpu test
Co-authored-by: Xin Yao <yaox12@outlook.com>
-
- 09 Dec, 2022 1 commit
-
-
Xin Yao authored
* fix empty tensor is treated as pinned
* avoid calling cudaHostGetDevicePointer on nullptr
* update empty array
* add a comment
-
- 06 Dec, 2022 1 commit
-
-
Chang Liu authored
* Add support for next cusparse release
* Fix lint
* Add switch and tune the performance
* Fix lint issue
* Fine tune the heuristics
* Fix lint issue
* Address comments
* Minor fix
* Address comments
-
- 24 Nov, 2022 1 commit
-
-
Xin Yao authored
-
- 22 Nov, 2022 2 commits
-
-
Ping Gong authored
* Leverage hashmap to accelerate CSRSliceMatrix
* fix lint check
* use `min` in cuda_runtime.ch
* fix hash func
* add some comments and adjust the <grid,block> of the _SegmentMaskColKernel kernel
* set device and stream for thrust::for_each
* use thrust::cuda::par_nosync
Co-authored-by: Xin Yao <xiny@nvidia.com>
-
Muhammed Fatih BALIN authored
* adding LABOR sampling
* add ladies and pladies samplers
* fix compile error after rebase
* add reference for ladies sampler
* Improve ladies implementation.
* weighted labor sampling initial implementation draft fix indentation and small bug in ladies script
* importance_sampling currently doesn't work with weights
* fix weighted importance sampling
* move labor example into its own folder
* lint fixes
* Improve documentation
* remove examples from the main PR
* fix linting by not using c++17 features
* fix documentation of labor_sampler.py
* update documentation for labor.py
* reformat the labor.py file with black
* fix linting errors
* replace exception use with if
* fix typo in error comment
* fixing win64 build for ci
* fixing weighted implementation, works now.
* fix bug in the weighted case and importance_sampling==0
* address part of the reviews
* remove unused code paths from cuda
* remove unused code path from cpu side
* remove extra features of labor making use of random seed.
* fix exclude_edges bug
* remove pcg and seed logic from cpu implementation, seed logic should still work for cuda.
* minor style change
* refactor CPU implementation, take out the importance_sampling probability computation into a function.
* improve CUDAWorkspaceAllocator
* refactor importance_sampling part out to a function
* minor optimization
* fix linting issue
* Revert "remove pcg and seed logic from cpu implementation, seed logic should still work for cuda." This reverts commit c250e07ac6d7e13f57e79e8a2c2f098d777378c2.
* Revert "remove extra features of labor making use of random seed." This reverts commit 7f99034353080308f4783f27d9a08bea343fb796.
* fix the documentation
* disable NIDs
* improve the documentation in the code
* use the stream argument in pcg32 instead of skipping ahead t times, can discard the use of hashmap now since it is faster this way.
* fix linting issue
* address another round of reviews
* further optimize CPU LABOR sampling implementation
* fix linting error
* update the comment
* reformat
* rename and rephrase comment
* fix formatting according to new linting specs
* fix compile error due to renaming, fix linting.
* lint
* rename DGLHeteroGraph to DGLGraph to match master
* replace other occurrences of DGLHeteroGraph to DGLGraph
Co-authored-by: Muhammed Fatih BALIN <m.f.balin@gmail.com>
Co-authored-by: Kaan Sancak <kaansnck@gmail.com>
Co-authored-by: Quan Gan <coin2028@hotmail.com>
-
- 10 Nov, 2022 1 commit
-
-
Xin Yao authored
* update accumulator
* rename half to __half
* add bfloat16
* simplify code
* fix another case
* add unit test
* disable half-precision SpMMCoo
* fix lint
-
- 08 Nov, 2022 1 commit
-
-
Hongzhi (Steve), Chen authored
* [Misc] Change the max line length for cpp to 80 in lint.
* blabla
* blabla
* blabla
* ablabla
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
-
- 07 Nov, 2022 4 commits
-
-
Hongzhi (Steve), Chen authored
* [Misc] clang-format auto fix.
* blabla
* nolint
* blabla
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
-
Hongzhi (Steve), Chen authored
* blabla
* more
* blabla
* blabla
* ablabla
* blabla
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
-
Hongzhi (Steve), Chen authored
* [Misc] clang-format auto fix.
* blabla
* ablabla
* blabla
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
-
Hongzhi (Steve), Chen authored
* replace
* blabla
* balbla
* blabla
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
-
- 06 Nov, 2022 2 commits
-
-
Hongzhi (Steve), Chen authored
* param
* brief
* note
* return
* tparam
* brief2
* file
* return2
* return
* blabla
* all
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
-
Xin Yao authored
* add bf16 specializations
* remove SWITCH_BITS
* enable amp for bf16
* remove SWITCH_BITS for cpu kernels
* enable bf16 based on CUDART
* fix compiling for sm<80
* fix cpu build
* enable unit tests
* update doc
* disable test for CUDA < 11.0
* address comments
* address comments
-
- 03 Nov, 2022 2 commits
-
-
Hongzhi (Steve), Chen authored
* [Misc] clang-format auto fix.
* manual
* manual
* manual
* manual
* todo
* fix
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
-
Xin Yao authored
* get device pointers
* change if condition to IsPinned
-
- 28 Oct, 2022 1 commit
-
-
Quan (Andy) Gan authored
* sample neighbors with masks
* oops
* refactor again
* remove
* remove debug code
* rename macro
* address comments
* address comment
* address comments
* rename a lot of stuff
* oops
-
- 13 Oct, 2022 1 commit
-
-
Mufei Li authored
* Update from master (#4584)
* [Example][Refactor] Refactor graphsage multigpu and full-graph example (#4430)
* Add refactors for multi-gpu and full-graph example
* Fix format
* Update
* Update
* Update
* [Cleanup] Remove async_transferer (#4505)
* Remove async_transferer
* remove test
* Remove AsyncTransferer
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Xin Yao <yaox12@outlook.com>
* [Cleanup] Remove duplicate entries of CUB submodule (issue# 4395) (#4499)
* remove third_part/cub
* remove from third_party
Co-authored-by: Israt Nisa <nisisrat@amazon.com>
Co-authored-by: Xin Yao <xiny@nvidia.com>
* [Bug] Enable turn on/off libxsmm at runtime (#4455)
* enable turn on/off libxsmm at runtime by adding a global config and related API
Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-194.ap-northeast-1.compute.internal>
* [Feature] Unify the cuda stream used in core library (#4480)
* Use an internal cuda stream for CopyDataFromTo
* small fix white space
* Fix to compile
* Make stream optional in copydata for compile
* fix lint issue
* Update cub functions to use internal stream
* Lint check
* Update CopyTo/CopyFrom/CopyFromTo to use internal stream
* Address comments
* Fix backward CUDA stream
* Avoid overloading CopyFromTo()
* Minor comment update
* Overload copydatafromto in cuda device api
Co-authored-by: xiny <xiny@nvidia.com>
* [Feature] Added exclude_self and output_batch to knn graph construction (Issues #4323 #4316) (#4389)
* * Added "exclude_self" and "output_batch" options to knn_graph and segmented_knn_graph
* Updated out-of-date comments on remove_edges and remove_self_loop, since they now preserve batch information
* * Changed defaults on new knn_graph and segmented_knn_graph function parameters, for compatibility; pytorch/test_geometry.py was failing
* * Added test to ensure dgl.remove_self_loop function correctly updates batch information
* * Added new knn_graph and segmented_knn_graph parameters to dgl.nn.KNNGraph and dgl.nn.SegmentedKNNGraph
* * Formatting
* * Oops, I missed the one in segmented_knn_graph when I fixed the similar thing in knn_graph
* * Fixed edge case handling when invalid k specified, since it still needs to be handled consistently for tests to pass
* Fixed context of batch info, since it must match the context of the input position data for remove_self_loop to succeed
* * Fixed batch info resulting from knn_graph when output_batch is true, for case of 3D input tensor, representing multiple segments
* * Added testing of new exclude_self and output_batch parameters on knn_graph and segmented_knn_graph, and their wrappers, KNNGraph and SegmentedKNNGraph, into the test_knn_cuda test
* * Added doc comments for new parameters
* * Added correct handling for uncommon case of k or more coincident points when excluding self edges in knn_graph and segmented_knn_graph
* Added test cases for more than k coincident points
* * Updated doc comments for output_batch parameters for clarity
* * Linter formatting fixes
* * Extracted out common function for test_knn_cpu and test_knn_cuda, to add the new test cases to test_knn_cpu
* * Rewording in doc comments
* * Removed output_batch parameter from knn_graph and segmented_knn_graph, in favour of always setting the batch information, except in knn_graph if x is a 2D tensor
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
* [CI] only known devs are authorized to trigger CI (#4518)
* [CI] only known devs are authorized to trigger CI
* fix if author is null
* add comments
* [Readability] Auto fix setup.py and update-version.py (#4446)
* Auto fix update-version
* Auto fix setup.py
* Auto fix update-version
* Auto fix setup.py
* [Doc] Change random.py to random_partition.py in guide on distributed partition pipeline (#4438)
* Update distributed-preprocessing.rst
* Update
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal>
* fix unpinning when tensoradaptor is not available (#4450)
* [Doc] fix print issue in tutorial (#4459)
* [Example][Refactor] Refactor RGCN example (#4327)
* Refactor full graph entity classification
* Refactor rgcn with sampling
* README update
* Update
* Results update
* Respect default setting of self_loop=false in entity.py
* Update
* Update README
* Update for multi-gpu
* Update
* [doc] fix invalid link in user guide (#4468)
* [Example] directional_GSN for ogbg-molpcba (#4405)
* version-1
* version-2
* version-3
* update examples/README
* Update .gitignore
* update performance in README, delete scripts
* 1st approving review
* 2nd approving review
Co-authored-by: Mufei Li <mufeili1996@gmail.com>
* Clarify the message name, which is 'm'. (#4462)
Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>
* [Refactor] Auto fix view.py. (#4461)
Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
* [Example] SEAL for OGBL (#4291)
* [Example] SEAL for OGBL
* update index
* update
* fix readme typo
* add seal sampler
* modify set ops
* prefetch
* efficiency test
* update
* optimize
* fix ScatterAdd dtype issue
* update sampler style
* update
Co-authored-by: Quan Gan <coin2028@hotmail.com>
* [CI] use https instead of http (#4488)
* [BugFix] fix crash due to incorrect dtype in dgl.to_block() (#4487)
* [BugFix] fix crash due to incorrect dtype in dgl.to_block()
* fix test failure in TF
* [Feature] Make TensorAdapter Stream Aware (#4472)
* Allocate tensors in DGL's current stream
* make tensoradaptor stream-aware
* replace TAemtpy with cpu allocator
* fix typo
* try fix cpu allocation
* clean header
* redirect AllocDataSpace as well
* resolve comments
* [Build][Doc] Specify the sphinx version (#4465)
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
* reformat
* reformat
* Auto fix update-version
* Auto fix setup.py
* reformat
* reformat
Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>
Co-authored-by: Mufei Li <mufeili1996@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal>
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Chang Liu <chang.liu@utexas.edu>
Co-authored-by: Zhiteng Li <55398076+ZHITENGLI@users.noreply.github.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
Co-authored-by: rudongyu <ru_dongyu@outlook.com>
Co-authored-by: Quan Gan <coin2028@hotmail.com>
* Move mock version of dgl_sparse library to DGL main repo (#4524)
* init
* Add api doc for sparse library
* support op btwn matrices with different sparsity
* Fixed docstring
* addresses comments
* lint check
* change keyword format to fmt
Co-authored-by: Israt Nisa <nisisrat@amazon.com>
* [DistPart] expose timeout config for process group (#4532)
* [DistPart] expose timeout config for process group
* refine code
* Update tools/distpartitioning/data_proc_pipeline.py
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
* [Feature] Import PyTorch's CUDA stream management (#4503)
* add set_stream
* add .record_stream for NDArray and HeteroGraph
* refactor dgl stream Python APIs
* test record_stream
* add unit test for record stream
* use pytorch's stream
* fix lint
* fix cpu build
* address comments
* address comments
* add record stream tests for dgl.graph
* record frames and update dataloader
* add docstring
* update frame
* add backend check for record_stream
* remove CUDAThreadEntry::stream
* record stream for newly created formats
* fix bug
* fix cpp test
* fix None c_void_p to c_handle
* [examples]educe memory consumption (#4558)
* [examples]educe memory consumption
* refine help message
* refine
* [Feature][REVIEW] Enable DGL cugaph nightly CI (#4525)
* Added cugraph nightly scripts
* Removed nvcr.io//nvidia/pytorch:22.04-py3 reference
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>
* Revert "[Feature][REVIEW] Enable DGL cugaph nightly CI (#4525)" (#4563) This reverts commit ec171c64.
* [Misc] Add flake8 lint workflow. (#4566)
* Add pyproject.toml for autopep8.
* Add pyproject.toml for autopep8.
* Add flake8 annotation in workflow.
* remove
* add
* clean up
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
* [Misc] Try use official pylint workflow. (#4568)
* polish update_version
* update pylint workflow.
* add
* revert.
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
* [CI] refine stage logic (#4565)
* [CI] refine stage logic
* refine
* refine
* remove (#4570)
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
* Add Pylint workflow for flake8. (#4571)
* remove
* Add pylint.
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
* [Misc] Update the python version in Pylint workflow for flake8. (#4572)
* remove
* Add pylint.
* Change the python version for pylint.
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
* Update pylint. (#4574)
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
* [Misc] Use another workflow. (#4575)
* Update pylint.
* Use another workflow.
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
* Update pylint. (#4576)
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
* Update pylint.yml
* Update pylint.yml
* Delete pylint.yml
* [Misc]Add pyproject.toml for autopep8 & black. (#4543)
* Add pyproject.toml for autopep8.
* Add pyproject.toml for autopep8.
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
* [Feature] Bump DLPack to v0.7 and decouple DLPack from the core library (#4454)
* rename `DLContext` to `DGLContext`
* rename `kDLGPU` to `kDLCUDA`
* replace DLTensor with DGLArray
* fix linting
* Unify DGLType and DLDataType to DGLDataType
* Fix FFI
* rename DLDeviceType to DGLDeviceType
* decouple dlpack from the core library
* fix bug
* fix lint
* fix merge
* fix build
* address comments
* rename dl_converter to dlpack_convert
* remove redundant comments
Co-authored-by: Chang Liu <chang.liu@utexas.edu>
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Xin Yao <yaox12@outlook.com>
Co-authored-by: Israt Nisa <neesha295@gmail.com>
Co-authored-by: Israt Nisa <nisisrat@amazon.com>
Co-authored-by: peizhou001 <110809584+peizhou001@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-194.ap-northeast-1.compute.internal>
Co-authored-by: ndickson-nvidia <99772994+ndickson-nvidia@users.noreply.github.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>
Co-authored-by: Hongzhi (Steve), Chen <chenhongzhi.nkcs@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal>
Co-authored-by: Zhiteng Li <55398076+ZHITENGLI@users.noreply.github.com>
Co-authored-by: rudongyu <ru_dongyu@outlook.com>
Co-authored-by: Quan Gan <coin2028@hotmail.com>
Co-authored-by: Vibhu Jawa <vibhujawa@gmail.com>
* [Deprecation] Dataset Attributes (#4546)
* Update
* CI
* CI
* Update
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal>
* [Example] Bug Fix (#4665)
* Update
* CI
* CI
* Update
* Update
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal>
* Update
Co-authored-by: Chang Liu <chang.liu@utexas.edu>
Co-authored-by: nv-dlasalle <63612878+nv-dlasalle@users.noreply.github.com>
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Xin Yao <yaox12@outlook.com>
Co-authored-by: Israt Nisa <neesha295@gmail.com>
Co-authored-by: Israt Nisa <nisisrat@amazon.com>
Co-authored-by: peizhou001 <110809584+peizhou001@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-19-194.ap-northeast-1.compute.internal>
Co-authored-by: ndickson-nvidia <99772994+ndickson-nvidia@users.noreply.github.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
Co-authored-by: Rhett Ying <85214957+Rhett-Ying@users.noreply.github.com>
Co-authored-by: Hongzhi (Steve), Chen <chenhongzhi.nkcs@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-9-26.ap-northeast-1.compute.internal>
Co-authored-by: Zhiteng Li <55398076+ZHITENGLI@users.noreply.github.com>
Co-authored-by: rudongyu <ru_dongyu@outlook.com>
Co-authored-by: Quan Gan <coin2028@hotmail.com>
Co-authored-by: Vibhu Jawa <vibhujawa@gmail.com>
-
- 11 Oct, 2022 1 commit
-
-
Hongzhi (Steve), Chen authored
* Auto fix c++.
* reformat
Co-authored-by: Steve <ubuntu@ip-172-31-34-29.ap-northeast-1.compute.internal>
-
- 21 Sep, 2022 1 commit
-
-
Xin Yao authored
* disable warning for tensorpipe
* fix warning
* enable lint check for cuh files
* resolve comments
-
- 19 Sep, 2022 1 commit
-
-
Xin Yao authored
* rename `DLContext` to `DGLContext`
* rename `kDLGPU` to `kDLCUDA`
* replace DLTensor with DGLArray
* fix linting
* Unify DGLType and DLDataType to DGLDataType
* Fix FFI
* rename DLDeviceType to DGLDeviceType
* decouple dlpack from the core library
* fix bug
* fix lint
* fix merge
* fix build
* address comments
* rename dl_converter to dlpack_convert
* remove redundant comments
-
- 15 Sep, 2022 1 commit
-
-
Xin Yao authored
* add set_stream
* add .record_stream for NDArray and HeteroGraph
* refactor dgl stream Python APIs
* test record_stream
* add unit test for record stream
* use pytorch's stream
* fix lint
* fix cpu build
* address comments
* address comments
* add record stream tests for dgl.graph
* record frames and update dataloader
* add docstring
* update frame
* add backend check for record_stream
* remove CUDAThreadEntry::stream
* record stream for newly created formats
* fix bug
* fix cpp test
* fix None c_void_p to c_handle
-
- 06 Sep, 2022 1 commit
-
-
Chang Liu authored
* Use an internal cuda stream for CopyDataFromTo
* small fix white space
* Fix to compile
* Make stream optional in copydata for compile
* fix lint issue
* Update cub functions to use internal stream
* Lint check
* Update CopyTo/CopyFrom/CopyFromTo to use internal stream
* Address comments
* Fix backward CUDA stream
* Avoid overloading CopyFromTo()
* Minor comment update
* Overload copydatafromto in cuda device api
Co-authored-by: xiny <xiny@nvidia.com>
-
- 12 Aug, 2022 1 commit
-
-
Xin Yao authored
* Change CUDA_MAX_NUM_THREADS to 256
* change the configuration of grid
-
- 09 Aug, 2022 1 commit
-
-
Xin Yao authored
-
- 29 Jul, 2022 1 commit
-
-
Xin Yao authored
* add weighted sampling without replacement (A-Chao)
* improve Algorithm A-Chao with block-wise prefix sum
* correctly fill out_idxs
* implement weighted sampling with replacement
* small fix
* merge host-side code of weighted/uniform sampling
* enable unit tests for cuda weighted sampling
* move thrust/cub wrapper to the cmake file
* update docs accordingly
* fix linting
* fix linting
* fix unit test
* Bump external CUB/Thrust versions
* Fix code style and update description of algorithm design
* [Feature] GPU support weighted graph neighbor sampling commit by pengqirong(OPPO)
* merge pengqirong's implementation
* revert the change to cub and thrust
* fix linting
* use DeviceSegmentedSort for better performance
* add more comments
* add necessary notes
* add necessary notes
* resolve some comments
* define THRUST_CUB_WRAPPED_NAMESPACE
* fix doc
Co-authored-by: 彭齐荣 <657017034@qq.com>
-
- 15 Jul, 2022 1 commit
-
-
Quan (Andy) Gan authored
-
- 27 Jun, 2022 1 commit
-
-
ndickson-nvidia authored
* * Added missing specializations for `__half` of `DLDataTypeTraits`, `IndexSelect`, `Full`, `Scatter_`, `CSRGetData`, `CSRMM`, `CSRSum`, `IndexSelectCPUFromGPU`
* Fixed casting issue in `_LinearSearchKernel` that was preventing it from supporting `__half`
* Added `#if`'d out specializations of `CSRGEMM`, `CSRGEAM`, and `Xgeam`, which would require functions that aren't currently provided by cublas
* * Added more specific error messages for unimplemented FP16 specializations of Xgeam, CSRGEMM, and CSRGEAM
* * Added missing instantiation of DLDataTypeTraits<__half>::dtype
* * Fixed linter error
* Added clearer comment explaining why the cast to long long is necessary
* * Worked around a compile error in some particular setup, where __half can't be constructed on the host side
* * Fixed linter formatting errors
* * Changes to comments as recommended
* * Made recommended changes to logging errors in FP16 specializations
* Also changed the existing Xgeam function for unsupported data types from LOG(INFO) to LOG(FATAL)
-
- 24 Jun, 2022 1 commit
-
-
nv-dlasalle authored
* Add uva by default to embedding
* More updates
* Update optimizer
* Add new uva functions
* Expose new pinned memory function
* Add unit tests
* Update formatting
* Fix unit test
* Handle auto UVA case when training is on CPU
* Allow per-embedding decisions for whether to use UVA
* Address spares_optim.py comments
* Remove unused templates
* Update unit test
* Use dgl allocate memory for pinning
* allow automatically unpin
* workaround for d2h copy with a different dtype
* fix linting
* update error message
* update copyright
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Minjie Wang <wmjlyjemaine@gmail.com>
-