# bench_other.py
# Committed by Lianmin Zheng and Liangsheng Yin.
import argparse
import json
import time
from concurrent.futures import ThreadPoolExecutor

from tqdm import tqdm

from sglang.test.test_utils import add_common_other_args_and_parse, get_call_generate
from sglang.utils import dump_state_text, read_jsonl


def get_prompt(question):
    prompt = (
        """Solve a question answering task with interleaving Thought, Action, Observation steps. Thought can reason about the current situation, and Action can be three types:
(1) Search[entity], which searches the exact entity on Wikipedia and returns the first paragraph if it exists. If not, it will return some similar entities to search.
(2) Lookup[keyword], which returns the next sentence containing keyword in the current passage.
(3) Finish[answer], which returns the answer and finishes the task.
Here are some examples.
Question: What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into?
Thought 1: I need to search Colorado orogeny, find the area that the eastern sector of the Colorado orogeny extends into, then find the elevation range of the area.
Action 1: Search[Colorado orogeny]
Observation 1: The Colorado orogeny was an episode of mountain building (an orogeny) in Colorado and surrounding areas.
Thought 2: It does not mention the eastern sector. So I need to look up eastern sector.
Action 2: Lookup[eastern sector]
Observation 2: (Result 1 / 1) The eastern sector extends into the High Plains and is called the Central Plains orogeny.
Thought 3: The eastern sector of Colorado orogeny extends into the High Plains. So I need to search High Plains and find its elevation range.
Action 3: Search[High Plains]
Observation 3: High Plains refers to one of two distinct land regions:
Thought 4: I need to instead search High Plains (United States).
Action 4: Search[High Plains (United States)]
Observation 4: The High Plains are a subregion of the Great Plains. From east to west, the High Plains rise in elevation from around 1,800 to 7,000 ft (550 to 2,130 m).[3]
Thought 5: High Plains rise in elevation from around 1,800 to 7,000 ft, so the answer is 1,800 to 7,000 ft.
Action 5: Finish[1,800 to 7,000 ft]
Question: Musician and satirist Allie Goertz wrote a song about the "The Simpsons" character Milhouse, who Matt Groening named after who?
Thought 1: The question simplifies to "The Simpsons" character Milhouse is named after who. I only need to search Milhouse and find who it is named after.
Action 1: Search[Milhouse]
Observation 1: Milhouse Mussolini Van Houten is a recurring character in the Fox animated television series The Simpsons voiced by Pamela Hayden and created by Matt Groening.
Thought 2: The paragraph does not tell who Milhouse is named after, maybe I can look up "named after".
Action 2: Lookup[named after]
Observation 2: (Result 1 / 1) Milhouse was named after U.S. president Richard Nixon, whose middle name was Milhous.
Thought 3: Milhouse was named after U.S. president Richard Nixon, so the answer is Richard Nixon.
Action 3: Finish[Richard Nixon]
Question: Which documentary is about Finnish rock groups, Adam Clayton Powell or The Saimaa Gesture?
Thought 1: I need to search Adam Clayton Powell and The Saimaa Gesture, and find which documentary is about Finnish rock groups.
Action 1: Search[Adam Clayton Powell]
Observation 1: Could not find [Adam Clayton Powell]. Similar: ['Adam Clayton Powell III', 'Seventh Avenue (Manhattan)', 'Adam Clayton Powell Jr. State Office Building', 'Isabel Washington Powell', 'Adam Powell', 'Adam Clayton Powell (film)', 'Giancarlo Esposito'].
Thought 2: To find the documentary, I can search Adam Clayton Powell (film).
Action 2: Search[Adam Clayton Powell (film)]
Observation 2: Adam Clayton Powell is a 1989 American documentary film directed by Richard Kilberg.
The film is about the rise and fall of influential African-American politician Adam Clayton Powell Jr.[3][4] It was later aired as part of the PBS series The American Experience.
Thought 3: Adam Clayton Powell (film) is a documentary about an African-American politician, not Finnish rock groups. So the documentary about Finnish rock groups must instead be The Saimaa Gesture.
Action 3: Finish[The Saimaa Gesture]
Question: What profession does Nicholas Ray and Elia Kazan have in common?
Thought 1: I need to search Nicholas Ray and Elia Kazan, find their professions, then find the profession they have in common.
Action 1: Search[Nicholas Ray]
Observation 1: Nicholas Ray (born Raymond Nicholas Kienzle Jr., August 7, 1911 – June 16, 1979) was an American film director, screenwriter, and actor best known for the 1955 film Rebel Without a Cause.
Thought 2: Professions of Nicholas Ray are director, screenwriter, and actor. I need to search Elia Kazan next and find his professions.
Action 2: Search[Elia Kazan]
Observation 2: Elia Kazan was an American film and theatre director, producer, screenwriter and actor.
Thought 3: Professions of Elia Kazan are director, producer, screenwriter, and actor. So profession Nicholas Ray and Elia Kazan have in common is director, screenwriter, and actor.
Action 3: Finish[director, screenwriter, actor]
Question: Which magazine was started first Arthur's Magazine or First for Women?
Thought 1: I need to search Arthur's Magazine and First for Women, and find which was started first.
Action 1: Search[Arthur's Magazine]
Observation 1: Arthur's Magazine (1844-1846) was an American literary periodical published in Philadelphia in the 19th century.
Thought 2: Arthur's Magazine was started in 1844. I need to search First for Women next.
Action 2: Search[First for Women]
Observation 2: First for Women is a woman's magazine published by Bauer Media Group in the USA.[1] The magazine was started in 1989.
Thought 3: First for Women was started in 1989. 1844 (Arthur's Magazine) < 1989 (First for Women), so Arthur's Magazine was started first.
Action 3: Finish[Arthur's Magazine]
Question: Were Pavel Urysohn and Leonid Levin known for the same type of work?
Thought 1: I need to search Pavel Urysohn and Leonid Levin, find their types of work, then find if they are the same.
Action 1: Search[Pavel Urysohn]
Observation 1: Pavel Samuilovich Urysohn (February 3, 1898 – August 17, 1924) was a Soviet mathematician who is best known for his contributions in dimension theory.
Thought 2: Pavel Urysohn is a mathematician. I need to search Leonid Levin next and find its type of work.
Action 2: Search[Leonid Levin]
Observation 2: Leonid Anatolievich Levin is a Soviet-American mathematician and computer scientist.
Thought 3: Leonid Levin is a mathematician and computer scientist. So Pavel Urysohn and Leonid Levin have the same type of work.
Action 3: Finish[yes]
"""
        + question
    )
    return prompt
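

# Illustrative sketch (hypothetical helper, not used by this benchmark): the
# few-shot prompt above defines three actions, Search[entity],
# Lookup[keyword] and Finish[answer], and main() stops each generation at
# "Observation", so a completed step normally contains an
# "Action <n>: Type[argument]" line. Assuming the model follows that format
# exactly, a minimal parser for one generated step could look like this.
def parse_react_action(step_text):
    """Return (action_type, argument) for the first well-formed Action line
    in step_text, or None if there is none."""
    for line in step_text.splitlines():
        line = line.strip()
        if not line.startswith("Action"):
            continue
        # Drop the "Action <n>:" prefix, keeping "Type[argument]".
        _, _, body = line.partition(":")
        name, sep, rest = body.strip().partition("[")
        if sep and rest.endswith("]") and name in ("Search", "Lookup", "Finish"):
            return name, rest[:-1]
    return None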


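# Illustrative sketch (hypothetical helpers, not used by this benchmark): the
# replay loops in main() append each recorded triplet to the prompt in the
# fixed layout "<thought>\nAction i:<action>\nObservation i:<observation>\n",
# then open the next "Thought i:" turn. Assuming each triplet is a dict with
# "thought", "action" and "observation" keys, the sequence of prompts sent to
# the backend can be generated without calling any model.
def format_step(i, triplet):
    return (
        f"{triplet['thought']}\nAction {i}:{triplet['action']}"
        f"\nObservation {i}:{triplet['observation']}\n"
    )


def replay_prompts(question_prompt, triplets):
    """Yield one prompt per generation call: len(triplets) + 1 in total."""
    prompt = question_prompt
    for i in range(1, len(triplets) + 2):
        prompt += f"Thought {i}:"
        yield prompt
        if i > len(triplets):
            break
        prompt += format_step(i, triplets[i - 1])

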
def main(args):
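    lines = read_jsonl(args.data_path)[: args.num_questions]
    # Each JSONL record maps a question to its recorded list of
    # {"thought", "action", "observation"} triplets.
    arguments = [
        {"question": k, "triplets": v} for line in lines for k, v in line.items()
    ]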

    states = []

    # Select backend
    call_generate = get_call_generate(args)

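    def run_single_agent(argument):
        # Replay one recorded ReAct trajectory. The stored (thought, action,
        # observation) triplets are appended to the prompt in place of the
        # model's output, so the prompts sent to the backend do not depend on
        # what it generates.
        question = argument["question"]
        triplets = argument["triplets"]
        prompt = get_prompt(question)
        # One generation call per recorded step, plus a final call after it.
        for i in range(1, len(triplets) + 2):
            prompt += "Thought " + str(i) + ":"
            states.append(prompt)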
            answer = call_generate(
                prompt, max_tokens=200, temperature=0, stop="Observation"
            )
            if i > len(triplets):
                break
            step = triplets[i - 1]
            prompt += (
                f"{step['thought']}\nAction {i}:{step['action']}"
                f"\nObservation {i}:{step['observation']}\n"
            )

            states.append(answer)

    async def run_single_agent_async(argument):
        # Coroutine version of run_single_agent, used by the LMQL backend,
        # whose call_generate must be awaited.
        question = argument["question"]
        triplets = argument["triplets"]
        prompt = get_prompt(question)
        for i in range(1, len(triplets) + 2):
            prompt += "Thought " + str(i) + ":"
            states.append(prompt)
            answer = await call_generate(
                prompt, max_tokens=200, temperature=0, stop="Observation", max_len=4096
            )
            if i > len(triplets):
                break
            step = triplets[i - 1]
            prompt += (
                f"{step['thought']}\nAction {i}:{step['action']}"
                f"\nObservation {i}:{step['observation']}\n"
            )

            states.append(answer)

    tic = time.time()

    if args.backend != "lmql":
        if args.parallel == 1:
            for arg in tqdm(arguments):
                run_single_agent(arg)
        else:
            with ThreadPoolExecutor(args.parallel) as executor:
                list(
                    tqdm(
                        executor.map(run_single_agent, arguments), total=len(arguments)
                    )
                )

    else:
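        import asyncio

        # Create the event loop explicitly: asyncio.get_event_loop() is
        # deprecated when no event loop has been set. Agents run in batches
        # of args.parallel, and each batch finishes before the next starts.
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        batches = [
            [] for _ in range((len(arguments) + args.parallel - 1) // args.parallel)
        ]
        for i, arg in enumerate(arguments):
            batches[i // args.parallel].append(arg)
        for bt in tqdm(batches):
            tasks = [run_single_agent_async(arg) for arg in bt]
            loop.run_until_complete(asyncio.gather(*tasks))
        loop.close()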

    latency = time.time() - tic

    print(f"Latency: {latency:.3f}")

    # Write results
    dump_state_text(f"tmp_output_{args.backend}.txt", states)

    with open(args.result_file, "a") as fout:
        value = {
            "task": "ReAct Agents",
            "backend": args.backend,
            "num_gpus": 1,
            "latency": round(latency, 3),
            "num_requests": len(arguments),
            "other": {
                "parallel": args.parallel,
            },
        }
        fout.write(json.dumps(value) + "\n")
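

# Illustrative sketch (hypothetical helper, not used by this benchmark): the
# LMQL branch of main() runs agents in fixed batches of args.parallel, so
# each batch waits for its slowest agent before the next batch starts. A
# common alternative, and a different scheduling strategy from the one this
# script measures, keeps up to `limit` coroutines in flight with an
# asyncio.Semaphore. There, a slow agent holds one slot instead of stalling
# a whole batch, so its latencies would not be comparable with this script's.
async def gather_bounded(coro_fns, limit):
    """Await every zero-argument coroutine function in coro_fns with at most
    `limit` running at once; results keep the input order."""
    import asyncio  # local import, as in main()

    semaphore = asyncio.Semaphore(limit)

    async def run(fn):
        async with semaphore:
            return await fn()

    return await asyncio.gather(*(run(fn) for fn in coro_fns))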


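# Illustrative sketch (hypothetical helper, not used by this benchmark): each
# run of main() appends one JSON object per line to args.result_file, with
# "task", "backend" and "latency" fields, so runs against several backends
# accumulate in a single JSON Lines file. Reading it back and keeping the
# most recent latency per backend could look like this.
def latest_latency_by_backend(path, task="ReAct Agents"):
    import json  # already imported above; repeated so the sketch stands alone

    latest = {}
    with open(path) as fin:
        for line in fin:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            # Later lines overwrite earlier ones for the same backend.
            if record.get("task") == task:
                latest[record["backend"]] = record["latency"]
    return latest

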
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--data-path", type=str, default="hotpotqa_100.jsonl")
    parser.add_argument("--num-questions", type=int, default=10)
    args = add_common_other_args_and_parse(parser)
    main(args)