Commit fb39e61b authored by zhuwenwen

Merge branch 'v0.9.2-dev_bugfix' into 'v0.9.2-dev'

bugfix: Fix the service startup crash when USE_FUSED_RMS_QUANT=1

See merge request dcutoolkit/deeplearing/vllm!344
parents 155c8a13 e6d32c6d
@@ -1832,9 +1832,9 @@ def moe_forward(hidden_states: torch.Tensor, router_logits: torch.Tensor,
     self = forward_context.no_compile_layers[layer_name]
     assert self.quant_method is not None
     if envs.USE_FUSED_RMS_QUANT:
-        return self.forward_impl(hidden_states, router_logits, shared_output, i_q, i_s)
+        return self.forward_impl(hidden_states, router_logits, shared_output = shared_output, i_q = i_q, i_s = i_s)
     else:
-        return self.forward_impl(hidden_states, router_logits, shared_output)
+        return self.forward_impl(hidden_states, router_logits, shared_output = shared_output)
...
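
The diff does not show the signature of `forward_impl`, so the exact cause of the crash is not visible here. As a minimal sketch of one way a positional call can fail where a keyword call succeeds, assume a hypothetical stand-in whose extra parameters are keyword-only. The function below is an illustration only, not the real vLLM signature:

```python
def forward_impl(hidden_states, router_logits, *, shared_output=None,
                 i_q=None, i_s=None):
    """Hypothetical stand-in for illustration; NOT the real vLLM method."""
    return {"shared_output": shared_output, "i_q": i_q, "i_s": i_s}


def call_positional():
    # Old call style from the diff: extra arguments passed by position.
    # With keyword-only parameters (an assumption), this raises TypeError.
    return forward_impl("h", "r", "shared", "q", "s")


def call_keyword():
    # New call style from the diff: extra arguments passed by name,
    # so each value is bound to the intended parameter.
    return forward_impl("h", "r", shared_output="shared", i_q="q", i_s="s")
```

Passing the arguments by name also protects the call if parameters are later inserted or reordered in the callee, which positional calls do not.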