Commits · a4b49551a93eb72c7bb19f62f69ff55e253f62b1 · OpenDAS / DeepEP

23 May, 2026 1 commit
- change whl name for shca hca. · a4b49551
  lijian authored May 23, 2026
```
Signed-off-by: lijian <34831075+lijian0711@users.noreply.github.com>
```
  a4b49551
13 May, 2026 1 commit
- Add python version for whl package. · e3b8804b
  lijian authored May 13, 2026
```
Signed-off-by: lijian <34831075+lijian0711@users.noreply.github.com>
```
  e3b8804b
23 Mar, 2026 1 commit
- Add message for das release1.8 and update rocshmem for xdp. · ea76f44e
  lijian6 authored Mar 23, 2026
```
Signed-off-by: lijian <lijian6@sugon.com>
```
  ea76f44e
23 Jan, 2026 1 commit
- Integrate the ROCSHMEM multiqp optimization · a1382ed7
  lishen authored Jan 23, 2026
  
  a1382ed7
15 Jan, 2026 1 commit
- modify package name and update rocshmem · e307a4e2
  lijian6 authored Jan 15, 2026
```
Signed-off-by: lijian <lijian6@sugon.com>
```
  e307a4e2
11 Dec, 2025 1 commit

- Add flag for nvshmem/rocshmem on release name. · 26298255
  lijian6 authored Dec 11, 2025

  End of hash, 'a' means rocshmem, 'b' means nvshmem.
  Signed-off-by: lijian <lijian6@sugon.com>

  26298255
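The suffix convention in this commit message can be illustrated with a minimal sketch. The helper name `shmem_backend_from_hash` and the exact shape of the tagged string (a hash with one extra character appended) are assumptions made for illustration; the commit message only states that the hash ends in 'a' for rocshmem or 'b' for nvshmem.

```python
# Hypothetical helper, not taken from the repository.
# Assumption: the flag is a single character appended after the commit hash.
SHMEM_FLAGS = {"a": "rocshmem", "b": "nvshmem"}


def shmem_backend_from_hash(tagged_hash: str) -> str:
    """Return the backend named by the last character of a tagged hash."""
    if not tagged_hash:
        raise ValueError("empty hash")
    flag = tagged_hash[-1]
    if flag not in SHMEM_FLAGS:
        raise ValueError(f"unknown backend flag {flag!r}")
    return SHMEM_FLAGS[flag]
```

Because a plain hexadecimal hash can itself end in 'a' or 'b', a lookup like this is only meaningful when the caller knows the flag has been appended.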

24 Oct, 2025 1 commit
- 1. Fix a bug in using functions to obtain the num_nvl_bytes and num_rdma_bytes variables. · 0b14d3b2
  lijian6 authored Oct 24, 2025
```
2. Modify the test script to reduce GPU memory usage, from 17G to 8G.
Signed-off-by: lijian <lijian6@sugon.com>
```
  0b14d3b2
17 Oct, 2025 1 commit
- Fitter for DCU. · 5563b6d0
  lijian6 authored Oct 17, 2025
```
Signed-off-by: lijian <lijian6@sugon.com>
```
  5563b6d0
24 Sep, 2025 1 commit
- Make dtype of topk_idx configurable (#422) · da6ca24e
  Tailing Yuan authored Sep 24, 2025
```
Co-authored-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
```
  da6ca24e
10 Sep, 2025 2 commits
- Update version number · 3e2c5d80
  Chenggang Zhao authored Sep 10, 2025
  
  3e2c5d80
- Update version number · ee5bd170
  Chenggang Zhao authored Sep 10, 2025
  
  ee5bd170
05 Aug, 2025 1 commit
- build(setuptools): fix nvshmem dynamic library name searching in Python 3.9 (#351) · 26cf250a
  windreamer authored Aug 05, 2025
  
  26cf250a
31 Jul, 2025 1 commit
- Fix SM80 compilation · be8053d6
  Chenggang Zhao authored Jul 31, 2025
  
  be8053d6
15 Jul, 2025 3 commits
- setup.py: Remove nvcc_dlink specific gencode · 35e1cd1b
  Seth Howell authored Jul 14, 2025
```
Responding to review comments.
Signed-off-by: Seth Howell <sethh@nvidia.com>
```
  35e1cd1b
- setup.py: Add logic for detecting library locations from NVSHMEM wheels. · 2a873392
  Seth Howell authored Jul 14, 2025
```
Signed-off-by: Seth Howell <sethh@nvidia.com>
```
  2a873392
- setup.py: Clean up some extra prints. · 903711c6
  Seth Howell authored Jul 14, 2025
```
Signed-off-by: Seth Howell <sethh@nvidia.com>
```
  903711c6
12 Jul, 2025 1 commit

- third-party: Update tests to use upstream NVSHMEM · 441833d3
  Seth Howell authored Jul 11, 2025

  NVSHMEM 3.3 and above support the host-side features
  in the patch.

  Note: Removed recv queue support
  Signed-off-by: Seth Howell <sethh@nvidia.com>

  441833d3

04 Jul, 2025 1 commit

- Use TMA to optimize internode dispatch. (#276) · a2fa3b73
  Shangyan Zhou authored Jul 04, 2025

  * Add TMA buffer allocation
  * Use TMA for forwarders and NVL receivers
  * Use lane 31 to operate TMA.
  * Change rdma buffer layout.
  * Use TMA to transfer scales also.
  * Increase the NVL recv buffer size.
  * Disable early stopping.
  * Apply similar optimizations on receiver warps.
  * Prevent warp divergence.
  * Disable aggressive ptx by default.
  * Revert using TMA to transfer scales.
  * Format.
  * Change the layout of dispatch NVL buffer.
  * Move topk transformation to recv warps.
  * Use TMA to transfer all data in forward warps
  * Use TMA to store scales.
  * Code lint

  ---------
  Co-authored-by: Chenggang Zhao <chenggangz@deepseek.com>

  a2fa3b73

27 Jun, 2025 1 commit
- Stricter conditions for aggressive PTX instructions · 004d6f9b
  Chenggang Zhao authored Jun 27, 2025
  
  004d6f9b
11 Jun, 2025 1 commit

- Support Ampere architecture (#204) · b8d90fb7
  Chenggang Zhao authored Jun 11, 2025

  * Update README
  * Update `setup.py`
  * Fix headers
  * Add `DISABLE_NVSHMEM` for APIs
  * Fix launch
  * Fix TMA settings
  * Fix TMA usages
  * Fix dlink
  * Separate layout kernels
  * Update version
  * Add `is_sm90_compiled`
  * Fix tests
  * Add NVLink connection checks
  * Update README
  * Fix tests
  * Add some comments
  * Minor fix
  * Minor fix
  * Fix bugs

  b8d90fb7

19 May, 2025 1 commit

- Make `TORCH_CUDA_ARCH_LIST` as an environment variable (#167) · d5ca4495
  guyueh1 authored May 18, 2025

  * Add 10.0 to TORCH_CUDA_ARCH_LIST
    Signed-off-by: Guyue Huang <guyueh@nvidia.com>
  * Revert csrc/CMakeLists change; in setup.py make TORCH_CUDA_ARCH_LIST configurable
    Signed-off-by: Guyue Huang <guyueh@nvidia.com>

  ---------
  Signed-off-by: Guyue Huang <guyueh@nvidia.com>

  d5ca4495
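One common way to make `TORCH_CUDA_ARCH_LIST` configurable from a `setup.py` is to fall back to a default only when the user has not set the variable. The sketch below illustrates that pattern; the function name `resolve_arch_list` and the default value `"9.0"` are assumptions for illustration and are not taken from the commit.

```python
import os


def resolve_arch_list(default: str = "9.0") -> str:
    """Return the arch list to build for, preferring the user's environment.

    Hypothetical helper: the default "9.0" is an assumed fallback, not the
    value used by the repository.
    """
    # setdefault keeps an existing user-provided value and only writes the
    # fallback when the variable is unset, so later build steps see one value.
    return os.environ.setdefault("TORCH_CUDA_ARCH_LIST", default)
```

With this pattern, a user could export `TORCH_CUDA_ARCH_LIST="8.0;10.0"` before building to override the fallback.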

25 Feb, 2025 1 commit
- Initial commit · ebfe47e4
  Chenggang Zhao authored Feb 24, 2025
  
  ebfe47e4