- 23 Apr, 2025 1 commit
-
Alisehen authored
-
- 22 Apr, 2025 3 commits
-
Alisehen authored
-
wang jiahao authored
Update param
-
qiyuxinlin authored
-
- 21 Apr, 2025 1 commit
-
qiyuxinlin authored
-
- 19 Apr, 2025 2 commits
-
wang jiahao authored
Update Function call
-
Creeper-MZ authored
Optimize the prompt to resolve some DeepSeek R1 compatibility issues; fix non-streaming mode
-
- 18 Apr, 2025 9 commits
-
Atream authored
Fix cmake config error
-
wang jiahao authored
Move KV cache creation to balance_serve
-
qiyuxinlin authored
-
mykg authored
Signed-off-by: onepick <jiajuku12@163.com>
-
Atream authored
Enh: Make Ollama perf data more accurate, consistent with OpenAI's implementation
-
Atream authored
Remove hard-coded max_length
-
Atream authored
-
Jianwei Dong authored
update llama4 tutorial
-
djw authored
-
- 17 Apr, 2025 11 commits
-
Creeper-MZ authored
-
Yuhao Tsui authored
-
Creeper-MZ authored
-
ZiWei Yuan authored
Fix some build errors for ROCm
-
mykg authored
Signed-off-by: onepick <jiajuku12@163.com>
-
Yuhao Tsui authored
Modify the performance data calculation module to retrieve values from `raw_usage` instead of estimating them.
-
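The commit above replaces estimated performance numbers with values read from `raw_usage`. A minimal sketch of that idea, assuming Ollama-style counter names (`prompt_eval_count`, `eval_count`, `eval_duration` in nanoseconds); the actual keys in this repo's `raw_usage` may differ:

```python
# Sketch: derive throughput from raw usage counters instead of estimating it.
# Key names follow Ollama's usage payload and are assumptions here.
from dataclasses import dataclass


@dataclass
class PerfData:
    prompt_tokens: int
    completion_tokens: int
    tokens_per_second: float


def perf_from_raw_usage(raw_usage: dict) -> PerfData:
    """Build performance numbers directly from backend counters."""
    prompt_tokens = int(raw_usage.get("prompt_eval_count", 0))
    completion_tokens = int(raw_usage.get("eval_count", 0))
    # Durations are assumed to be reported in nanoseconds, as Ollama does.
    eval_duration_ns = max(int(raw_usage.get("eval_duration", 1)), 1)
    tokens_per_second = completion_tokens / (eval_duration_ns / 1e9)
    return PerfData(prompt_tokens, completion_tokens, tokens_per_second)
```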
wang jiahao authored
Feat: Support Non-streaming chat in Ollama backend
-
wang jiahao authored
Fix the error that occurred when the client did not pass temperature and top_p and they were left empty
-
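The fix above implies falling back to server-side defaults when a request arrives without temperature or top_p. A minimal sketch of such a fallback; the default values and clamping are assumptions, not the project's actual config:

```python
# Sketch: resolve sampling parameters when the client leaves them empty.
# The defaults below are placeholders; top_p = 1.0 mirrors the 14 Apr config change.
from typing import Optional, Tuple

DEFAULT_TEMPERATURE = 0.6  # assumed default
DEFAULT_TOP_P = 1.0


def resolve_sampling_params(temperature: Optional[float],
                            top_p: Optional[float]) -> Tuple[float, float]:
    """Return usable sampling parameters even if the request omitted them."""
    temperature = DEFAULT_TEMPERATURE if temperature is None else float(temperature)
    top_p = DEFAULT_TOP_P if top_p is None else float(top_p)
    # Clamp to sane ranges so the sampler never sees zero or out-of-range values.
    temperature = max(temperature, 1e-5)
    top_p = min(max(top_p, 1e-5), 1.0)
    return temperature, top_p
```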
mykg authored
1. Fix broken logic in CMakeLists.txt. 2. Use the correct typedef for HIP. Signed-off-by: onepick <jiajuku12@163.com>
-
Creeper-MZ authored
-
wang jiahao authored
Add bsz_tensors param to torch linear
-
- 16 Apr, 2025 6 commits
-
Creeper-MZ authored
-
Creeper-MZ authored
Update chat.py
-
root authored
-
kevin authored
Update config.py
-
kevin authored
-
Chengyu Qiu authored
Feat: Add Function call support
-
- 15 Apr, 2025 2 commits
-
ZiWei Yuan authored
feat(build): display limited tail of subprocesses in real time
-
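For the build change above, one way to show only a limited, live tail of a subprocess's output is to keep the last N lines in a deque and redraw them as new output arrives. A rough sketch under that assumption (the ANSI redraw is illustrative, not the project's implementation):

```python
# Sketch: stream a subprocess's output while keeping and redrawing only the
# last few lines, roughly what "display limited tail in real time" suggests.
import subprocess
import sys
from collections import deque


def run_with_tail(cmd: list, tail_lines: int = 10) -> int:
    tail = deque(maxlen=tail_lines)
    printed = 0  # how many lines the previous redraw wrote
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    for line in proc.stdout:
        tail.append(line.rstrip())
        if printed:
            # Move the cursor up over the previous tail window and erase it.
            sys.stdout.write(f"\x1b[{printed}F\x1b[J")
        sys.stdout.write("\n".join(tail) + "\n")
        sys.stdout.flush()
        printed = len(tail)
    return proc.wait()


# Example: run_with_tail(["cmake", "--build", "build"], tail_lines=10)
```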
jizhilong authored
This is a follow-up to #1108
-
- 14 Apr, 2025 3 commits
-
ZiWei Yuan authored
chore: show cmake output in real time during build_ext
-
sean.su authored
Defined new data structures in chat.py to replace OpenAI's original implementation, adding support for tool calling.
Implemented logic for extracting and processing tool calls, enabling dynamic function invocation during conversations.
Added methods in balance_serve.py to retrieve sampling parameters, handling default values and edge cases.
Updated ktransformers.py and transformers.py to support passing tool parameters.
Changed the default value of top_p in config.py to 1.0 to increase generation diversity.
Extended the message model in chat.py to support transmitting tool-call information.
Together these changes make the system more flexible, enabling more complex interaction patterns.
-
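The tool-calling commit above centers on extracting tool calls from generated text and exposing them in an OpenAI-style shape. A minimal sketch of that extraction step; the `<tool_call>` marker and field names are assumptions for illustration, not the format chat.py actually uses:

```python
# Sketch: turn a model's tool-call output into an OpenAI-style structure.
import json
import re
import uuid
from dataclasses import dataclass


@dataclass
class ToolCall:
    id: str
    name: str
    arguments: str  # JSON-encoded arguments, as in the OpenAI API


TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_tool_calls(generated_text: str) -> list:
    """Pull tool calls out of generated text so the server can invoke functions."""
    calls = []
    for block in TOOL_CALL_RE.findall(generated_text):
        try:
            payload = json.loads(block)
        except json.JSONDecodeError:
            continue  # ignore malformed tool-call blocks
        calls.append(ToolCall(
            id=f"call_{uuid.uuid4().hex[:8]}",
            name=payload.get("name", ""),
            arguments=json.dumps(payload.get("arguments", {})),
        ))
    return calls
```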
Creeper-MZ authored
-
- 13 Apr, 2025 2 commits
-
wang jiahao authored
Avoid the RPC process crashing when using long prompts
-
wangkuigang-yewu-cmss authored
When the prompt exceeds cache_len, the RPC process crashes and the whole service becomes unavailable. Add a check so that overly long prompts are filtered out early in the request.
-
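The two commits above describe rejecting over-long prompts before they reach the RPC backend. A minimal sketch of such an early check; `cache_len`, the reserved margin, and the exception type are placeholders for whatever the server actually uses:

```python
# Sketch: reject prompts that cannot fit the KV cache before they reach the
# RPC process, instead of letting the backend crash on them.
class PromptTooLongError(ValueError):
    pass


def check_prompt_length(prompt_token_count: int, cache_len: int,
                        reserved_for_generation: int = 1) -> None:
    """Raise early instead of letting an oversized prompt crash the RPC process."""
    limit = cache_len - reserved_for_generation
    if prompt_token_count > limit:
        raise PromptTooLongError(
            f"prompt has {prompt_token_count} tokens but the KV cache only "
            f"holds {limit}; shorten the prompt or raise cache_len"
        )
```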