Commits · f2e624d65122db7433bf4fbc74ca1906b7f571b4 · OpenDAS / vllm_cscc

11 Aug, 2025 1 commit
- Modified vllm/utils/__init__.py · f2e624d6
  xiabo authored Aug 11, 2025
  
  f2e624d6
10 Aug, 2025 2 commits
- Change the default full_cuda_graph launch mode to false · 3dad13fb
  gaoqiong authored Aug 10, 2025
  
  3dad13fb
- [fix] Fix low mtp acceptance rate on v1 · 04b61f0e
  王敏 authored Aug 10, 2025
  
  04b61f0e
09 Aug, 2025 1 commit
- Revert "Modified vllm/utils/__init__.py" · 9ff3592b
  zhuwenwen authored Aug 09, 2025
```
This reverts commit 20b6cf64.
```
  9ff3592b
08 Aug, 2025 2 commits
- update fa full_cuda_graph support · 513f17a4
  zhuwenwen authored Aug 08, 2025
  
  513f17a4
- feat: add VLLM_USE_GLOBAL_CACHE13 to make moe use a global-variable cache13 · 333104ab
  jujl1 authored Jul 31, 2025
  
  333104ab
07 Aug, 2025 6 commits
- Add support in SlimQuantW4A8Int8MoEMethod for getting intermediate_size_per_partition · e92bb9ea
  gaoqiong authored Aug 07, 2025
  
  e92bb9ea
- Add lmslimquant_w4a8 quantization support · 8b1e4ef0
  gaoqiong authored Aug 07, 2025
  
  8b1e4ef0
- [feat] Support full_cuda_graph for mtp models · bd58c289
  王敏 authored Aug 07, 2025
  
  bd58c289
- [feat] Support full_cuda_graph for mtp models · 89eecc55
  王敏 authored Aug 07, 2025
  
  89eecc55
- [feat] Support full_cuda_graph for mtp models · a1239b53
  王敏 authored Aug 07, 2025
  
  a1239b53
- Modified vllm/utils/__init__.py · 20b6cf64
  xiabo authored Aug 07, 2025
  
  20b6cf64
06 Aug, 2025 7 commits
- update VLLM_FLASH_ATTN_V1 to VLLM_USE_FLASH_ATTN_PA · 88dbf92c
  zhuwenwen authored Aug 06, 2025
  
  88dbf92c
- update benchmark_throughput.py · fe657b8b
  zhuwenwen authored Aug 06, 2025
  
  fe657b8b
- update warmup_sampling_params · 966ebb2b
  zhuwenwen authored Aug 06, 2025
  
  966ebb2b
- [feat] Support full_cuda_graph for mtp models · 9dd945c1
  王敏 authored Aug 06, 2025
  
  9dd945c1
- Revert "Merge remote-tracking branch 'origin/v0.9.2-dev-wm' into v0.9.2-dev" · 0c1cd0f5
  zhuwenwen authored Aug 06, 2025
```
This reverts merge request !169
```
  0c1cd0f5
- update lmslim import · 0d4ff65d
  zhuwenwen authored Aug 06, 2025
  
  0d4ff65d
- Revert "update lmslim import" · 3ae8665d
  zhuwenwen authored Aug 06, 2025
```
This reverts commit 1d575d52.
```
  3ae8665d
05 Aug, 2025 7 commits
- [feat] Optimize return types of mtp-related functions · 7e71c143
  王敏 authored Aug 05, 2025
  
  7e71c143
- merge and debug tbo on 0.9.2 · 3f8b2afe
  lizhigong authored Aug 05, 2025
  
  3f8b2afe
- [feat] 1. Support full_cuda_graph for mtp models; 2. Optimize mtp rejection sampling · 8e0ae19d
  王敏 authored Aug 05, 2025
  
  8e0ae19d
- update lmslim import · 1d575d52
  zhuwenwen authored Aug 05, 2025
  
  1d575d52
- add glm4.5 k100-ai config · d160ae26
  zhuwenwen authored Aug 05, 2025
  
  d160ae26
- add step3-vl k100-ai config · 3e1ed13b
  zhuwenwen authored Aug 05, 2025
  
  3e1ed13b
- when using VLLM_FLASH_ATTN_V1, set block_size to 64 · 80a682c7
  zhuwenwen authored Aug 05, 2025
  
  80a682c7
04 Aug, 2025 4 commits
- add step3-vl config · 8e1c204b
  zhuwenwen authored Aug 04, 2025
  
  8e1c204b
- add step3-vl tuning · 2d364c4e
  zhuwenwen authored Aug 04, 2025
  
  2d364c4e
- add tbo on v1 engine · 20e75ed6
  lizhigong authored Aug 02, 2025
  
  20e75ed6
- update conv layout · eba84521
  zhuwenwen authored Aug 04, 2025
  
  eba84521
02 Aug, 2025 1 commit
- add glm4.5 config · 94b06a94
  zhuwenwen authored Aug 02, 2025
  
  94b06a94
01 Aug, 2025 9 commits
- set default block_size to 16 · 80045bf7
  zhuwenwen authored Aug 01, 2025
  
  80045bf7
- update N to N1 · 8c7075d1
  zhuwenwen authored Aug 01, 2025
  
  8c7075d1
- Add changes for w4a8 support · 2767fc34
  gaoqiong authored Aug 01, 2025
  
  2767fc34
- back to default conv layout · 5f18e876
  zhuwenwen authored Aug 01, 2025
  
  5f18e876
- update rocm.py · 0480314d
  zhuwenwen authored Aug 01, 2025
  
  0480314d
- [Model] Update step3 vl · 66540380
  zhuwenwen authored Aug 01, 2025
  
  66540380
- [Model] Add step3 vl · 53ffe40e
  zhuwenwen authored Aug 01, 2025
  
  53ffe40e
- [fix] Prevent the cudagraph adaptation in mla from affecting non-parallel decoding logic · 0e5d399a
  王敏 authored Aug 01, 2025
  
  0e5d399a
- update HIP_VISIBLE_DEVICES of rocm · d0cc5577
  zhuwenwen authored Aug 01, 2025
  
  d0cc5577