# UI-TARS 1.5 HuggingFace Endpoint Deployment Guide

## 1. HuggingFace Inference Endpoints Cloud Deployment

We use HuggingFace's Inference Endpoints platform to quickly deploy the model in the cloud.

### Deployment Steps

1. **Access the Deployment Interface**  
    - Click [Deploy from Hugging Face](https://endpoints.huggingface.co/catalog)  
    ![Deploy from Hugging Face](https://huggingface.co/datasets/JjjFangg/Demo_video/resolve/main/deployment_1_formal.png?download=true)  
    - Select the model `UI-TARS 1.5 7B` and click **Import Model**  
    ![Import Model](https://huggingface.co/datasets/JjjFangg/Demo_video/resolve/main/deployment_2_formal.png?download=true)  

2. **Configure Settings**
    - **Hardware Configuration**  
        - In the `Hardware Configuration` section, choose a GPU instance appropriate for the model size:  
            - For the 7B model, select `GPU L40S 1GPU 48G` (an Nvidia L4 or Nvidia A100 also works).  
        ![Hardware Configuration](https://huggingface.co/datasets/JjjFangg/Demo_video/resolve/main/deployment_3_formal.png?download=true)

    - **Container Configuration**  
        - Set the following parameters:  
            - `Max Number of Tokens (per Query)`: 65536  
            - `Max Batch Prefill Tokens`: 65536  
            - `Max Input Length (per Query)`: 65535  
        ![Container Configuration](https://huggingface.co/datasets/JjjFangg/Demo_video/resolve/main/deployment_4_formal.png?download=true)

    - **Environment Variables**  
        - Add the following environment variables:  
            - `CUDA_GRAPHS=0` to avoid deployment failures. For details, refer to [issue 2875](https://github.com/huggingface/text-generation-inference/issues/2875).  
            - `PAYLOAD_LIMIT=8000000` to prevent request failures due to large images. For details, refer to [issue 1802](https://github.com/huggingface/text-generation-inference/issues/1802).  
        ![Environment Variables](https://huggingface.co/datasets/JjjFangg/Demo_video/resolve/main/deployment_5_formal.png?download=true)

    - **Create Endpoint**  
        - Click **Create** to set up the endpoint.  
        ![Create Endpoint](https://huggingface.co/datasets/JjjFangg/Demo_video/resolve/main/deployment_6_formal.png?download=true)

    - **Open the Settings Page**  
        - Once deployment finishes, a confirmation page appears; from there, open the endpoint's settings page.  
        ![Complete](https://huggingface.co/datasets/JjjFangg/Demo_video/resolve/main/deployment_7_formal.png?download=true)
    
    - **Update the Container URI**  
        - Go to the Container page, set the Container URI to `ghcr.io/huggingface/text-generation-inference:3.2.1`, and click **Update Endpoint** to apply the changes. The same configuration can also be applied programmatically; see the sketch after these steps.  
        ![Complete](https://huggingface.co/datasets/JjjFangg/Demo_video/resolve/main/deployment_8_formal.png?download=true)
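
As an alternative to clicking through the UI, the configuration above can be scripted with the `huggingface_hub` client's `create_inference_endpoint`. The sketch below is illustrative, not authoritative: the `vendor`, `region`, and instance slugs are assumptions, so check the Inference Endpoints catalog for the values available to your account.

```python
# pip install huggingface_hub
from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    "ui-tars-1-5-7b",
    repository="ByteDance-Seed/UI-TARS-1.5-7B",
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",
    vendor="aws",                 # assumption: pick your cloud provider
    region="us-east-1",           # assumption: pick your region
    type="protected",
    instance_type="nvidia-l40s",  # assumption: slug for the L40S 48G instance
    instance_size="x1",
    custom_image={
        "health_route": "/health",
        "url": "ghcr.io/huggingface/text-generation-inference:3.2.1",
        "env": {
            "MODEL_ID": "/repository",
            "MAX_TOTAL_TOKENS": "65536",
            "MAX_BATCH_PREFILL_TOKENS": "65536",
            "MAX_INPUT_TOKENS": "65535",
            "CUDA_GRAPHS": "0",
            "PAYLOAD_LIMIT": "8000000",
        },
    },
)
endpoint.wait()   # block until the endpoint reports "running"
print(endpoint.url)
```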


## 2. API Usage Example

### **Python Test Code**  
```python
# pip install openai
import re
import json
from openai import OpenAI

def add_box_token(input_string):
    # Step 1: Split the model output into its individual actions
    if "Action: " in input_string and "start_box=" in input_string:
        suffix = input_string.split("Action: ")[0] + "Action: "
        actions = input_string.split("Action: ")[1:]
        processed_actions = []
        for action in actions:
            action = action.strip()
            # Step 2: Extract coordinate pairs (start_box or end_box) with a regex
            coordinates = re.findall(r"(start_box|end_box)='\((\d+),\s*(\d+)\)'", action)

            updated_action = action  # Start from the original action text
            for coord_type, x, y in coordinates:
                # Step 3: Wrap each coordinate pair in <|box_start|> / <|box_end|> tokens
                updated_action = updated_action.replace(
                    f"{coord_type}='({x},{y})'",
                    f"{coord_type}='<|box_start|>({x},{y})<|box_end|>'",
                )
            processed_actions.append(updated_action)

        # Step 4: Reconstruct the final string
        final_string = suffix + "\n\n".join(processed_actions)
    else:
        final_string = input_string
    return final_string

client = OpenAI(
    # Your endpoint URL; TGI's OpenAI-compatible route ends in /v1/
    base_url="https://xxx",
    api_key="hf_xxx"
)

messages = json.load(open("./data/test_messages.json"))
# Wrap coordinates in historical assistant turns with the box special tokens
for message in messages:
    if message["role"] == "assistant":
        message["content"] = add_box_token(message["content"])
        print(message["content"])

chat_completion = client.chat.completions.create(
    model="tgi",  # the endpoint serves a single model; "tgi" is the conventional placeholder
    messages=messages,
    top_p=None,
    temperature=0.0,
    max_tokens=400,
    stream=True,
    seed=None,
    stop=None,
    frequency_penalty=None,
    presence_penalty=None
)

response = ""
for message in chat_completion:
    response += message.choices[0].delta.content
print(response)
```
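
The messages in `./data/test_messages.json` already embed the screenshot. If you need to construct such a message yourself, one common pattern (an illustrative sketch, not part of the original script; it assumes your endpoint's TGI version accepts OpenAI-style `image_url` content parts) is to send the screenshot as a base64 data URL. Large screenshots encoded this way are the reason `PAYLOAD_LIMIT` is raised during deployment.

```python
import base64
from io import BytesIO
from PIL import Image

def screenshot_to_message(path: str, instruction: str) -> dict:
    # Re-encode the screenshot as PNG and embed it as a base64 data URL
    buffer = BytesIO()
    Image.open(path).save(buffer, format="PNG")
    b64 = base64.b64encode(buffer.getvalue()).decode("utf-8")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": instruction},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{b64}"},
            },
        ],
    }
```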

### **Expected Output**
```text
Thought: I can see that the Preferences window is already open, but everything shown here relates to system resource settings. To set the image's color mode, I should first look through the option list on the left. The "Color Management" entry looks promising; it should be where color management is handled. Let me click it and see what options are inside.
Action: click(start_box='(177,549)')
```
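
To act on the model's reply, the `Action:` line must be parsed back into coordinates. Below is a minimal sketch for the `click` case only (the full action grammar is defined by the UI-TARS action space and is not reproduced here):

```python
import re

def parse_click(action_line: str):
    # Extract (x, y) from a line like: click(start_box='(177,549)')
    match = re.search(r"click\(start_box='\((\d+),\s*(\d+)\)'\)", action_line)
    return (int(match.group(1)), int(match.group(2))) if match else None

print(parse_click("click(start_box='(177,549)')"))  # -> (177, 549)
```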