- 13 Mar, 2026 2 commits
-
-
thatPepe authored
- 12 Mar, 2026 1 commit
-
-
PanZezhong authored
-
- 11 Mar, 2026 3 commits
-
-
PanZezhong authored
-
wooway777 authored
-
- 09 Mar, 2026 2 commits
-
-
PanZezhong authored
-
PanZezhong1725 authored
-
- 06 Mar, 2026 1 commit
-
-
PanZezhong authored
-
- 11 Feb, 2026 2 commits
-
-
zhushuang authored
-
qinyiqun authored
demo131 - multiple issues regarding quantization, qy, and so forth
* issue/843: success per_channel_quant_int8
* issue/843: success qy quant
* issue/843: modified quant
* Add w8a8int8 performance tests
* add infinicore op linear_w8a8i8
* w8a8 linear module functional nn
* issue/843: QY-GPU Support Int8 scale_mm (#68)
* issue/843: success qy scaled_mm
* issue/843: modified kernel.cuh as per_channel_dequant_int8.cuh
* fix parallel slice in w8
* w8: support multiple batch size
* temp: modify quantconfig handling
* fix format and delete redundancy code
* fix format
* fix format
* fix format
* Refactor: add new API alongside legacy interfaces with deprecation warnings
* Add w4 InfiniCore-related content and move Quantization config into InfiniCore
* Add graph support for quantization operators
* solve cub version problem and fix code structure
* fix format
* demo131 - remove commented lines
---------
Co-authored-by: xgqdut2016 <kenan_gewei@163.com>
Co-authored-by: xgqdut2016 <140036308+xgqdut2016@users.noreply.github.com>
Co-authored-by: wooway777 <wooway777@gmail.com>
-
- 27 Jan, 2026 3 commits
-
-
wooway777 authored
-
wooway777 authored
-
gongchensu authored
- Ensure embedding tensors are on the same device. Change format.
- Optimize embedding kernel with vectorized memory access and __ldg
- Add vectorized memory access using float4/float2, half2, and bfloat162
- Use __ldg instruction for read-only weight and indices access
- Add memory alignment checks to enable vectorized paths
- Add __restrict__ keywords for better compiler optimization
- Implement dynamic block size selection based on embedding_dim
-
- 30 Dec, 2025 1 commit
-
-
zhushuang authored
-
- 29 Dec, 2025 1 commit
-
-
zhushuang authored
-
- 24 Dec, 2025 1 commit
-
-
zhuyue authored
-
- 21 Nov, 2025 1 commit
-
-
qinyiqun authored
* ISSUE/628 Adapt to the QY C610 GPU: add build options and adapt the existing operators. Add the operators required by bge-type models, including gelu, layer_norm, lp_norm (supporting l1 and l2 norms), relu, softmax, and tanh.
---------
Co-authored-by: xgqdut2016 <kenan_gewei@163.com>
Co-authored-by: xgqdut2016 <140036308+xgqdut2016@users.noreply.github.com>
-
- 28 Oct, 2025 1 commit
-
-
tianyuxbear authored
-
- 23 Oct, 2025 1 commit
-
-
pengcheng888 authored
Co-authored-by: pengcheng888 <pengcheng@example.com>
-
- 16 Oct, 2025 1 commit
-
-
gongchensu authored
Co-authored-by: wawahejun <hejunlbbc@gmail.com>
Co-authored-by: zhuyue <zhuyue@qiyuanlab.com>
-
- 29 Sep, 2025 1 commit
-
-
pengcheng888 authored
-
- 23 Sep, 2025 1 commit
-
-
zhushuang authored
-
- 16 Sep, 2025 1 commit
-
-
Ziminli authored
-
- 10 Sep, 2025 1 commit
-
-
PanZezhong1725 authored
-
- 02 Sep, 2025 1 commit
-
-
blkmjsian authored
- dequantize awq
- rope v2
-
- 07 Jul, 2025 1 commit
-
-
PanZezhong authored
-
- 27 Jun, 2025 1 commit
-
-
Pepe authored
issue/205 - Add the Sub operator's header file, CPU implementation, CUDA implementation, and Python tests
-
- 06 May, 2025 1 commit
-
-
Catheriany authored
-
- 28 Apr, 2025 1 commit
-
-
goldenfox2025 authored
-
- 25 Apr, 2025 2 commits
-
-
Graylatzhou authored
-
Graylatzhou authored
-
- 08 Apr, 2025 1 commit
-
-
PanZezhong authored
-
- 21 Mar, 2025 1 commit
-
-
PanZezhong authored
-
- 05 Mar, 2025 2 commits
-
-
YdrMaster authored
Signed-off-by: YdrMaster <ydrml@hotmail.com>
-
PanZezhong authored
-
- 21 Feb, 2025 1 commit
-
-
PanZezhong authored
-
- 11 Feb, 2025 1 commit
-
-
PanZezhongQY authored
-