Unverified commit 616b59f3 authored by rainred, committed by GitHub

[Feature] modify Runtime to support skip_tokenizer_init (#1088)


Co-authored-by: lzhang <zhanglei@modelbest.cn>
parent c8423ca3
@@ -533,6 +533,13 @@ class Runtime:
         prompt: str,
         sampling_params: Optional[Dict] = None,
     ):
+        if self.server_args.skip_tokenizer_init:
+            json_data = {
+                "input_ids": prompt,
+                "sampling_params": sampling_params,
+                "stream": True,
+            }
+        else:
             json_data = {
                 "text": prompt,
                 "sampling_params": sampling_params,
@@ -549,10 +556,13 @@ class Runtime:
             if chunk == "data: [DONE]\n\n":
                 break
             data = json.loads(chunk[5:].strip("\n"))
+            if hasattr(data, "text"):
                 cur = data["text"][pos:]
                 if cur:
                     yield cur
                 pos += len(cur)
+            else:
+                yield data

     add_request = async_generate
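In short, when the server is launched with skip_tokenizer_init, async_generate treats its prompt argument as pre-tokenized input ids and posts them under "input_ids" rather than "text", and the streaming loop yields each decoded JSON payload as-is instead of an incremental text delta. The following is a minimal usage sketch, not part of the commit; it assumes sgl.Runtime forwards skip_tokenizer_init to its server arguments, that add_request (aliased to async_generate in the diff above) is consumed as an async generator, and that tokenization happens client-side; the model path and prompt are placeholders.

# Sketch only, not part of the commit: illustrates the calling pattern this change enables.
# Assumptions: sgl.Runtime forwards skip_tokenizer_init to ServerArgs, add_request
# (aliased to async_generate in the diff above) is consumed as an async generator,
# and runtime.shutdown() is used for cleanup; the model path is a placeholder.
import asyncio

import sglang as sgl
from transformers import AutoTokenizer  # client-side tokenizer, since the server skips its own

async def main():
    runtime = sgl.Runtime(
        model_path="meta-llama/Llama-2-7b-chat-hf",  # placeholder
        skip_tokenizer_init=True,
    )
    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

    # With skip_tokenizer_init, the prompt argument carries token ids, which the
    # Runtime sends as "input_ids" instead of "text" (first hunk above).
    input_ids = tokenizer.encode("The capital of France is")

    async for chunk in runtime.add_request(input_ids, {"max_new_tokens": 16}):
        # Without a server-side tokenizer there is no incremental "text" field,
        # so the second hunk yields each decoded JSON payload as-is.
        print(chunk)

    runtime.shutdown()

asyncio.run(main())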