# Building a Claude Code Service with SGLang + GLM-4.5

[Read this in English.](./README.md)

## Installation

You need two machines: a local computer to code on, and a server to run the `GLM-4.5` model.

### Local Machine

Make sure you have installed [Claude Code](https://github.com/anthropics/claude-code) and [Claude Code Router](https://github.com/musistudio/claude-code-router):

```shell
npm install -g @anthropic-ai/claude-code
npm install -g @musistudio/claude-code-router
```
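
To confirm both CLIs landed, you can list the global packages with a standard `npm` query:

```shell
# Show the installed versions of both global packages
npm ls -g @anthropic-ai/claude-code @musistudio/claude-code-router
```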

### Server

Make sure `sglang` is installed on your server:

```shell
pip install sglang
```
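
To confirm the install, you can print the package version (assuming the release exposes `sglang.__version__`, which current versions do):

```shell
# Quick import check; fails loudly if sglang is missing or broken
python3 -c "import sglang; print(sglang.__version__)"
```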

Then start the model server with the following command. `--tp-size 16` shards the model across 16 GPUs with tensor parallelism, so adjust it to match your hardware:

```shell
python3 -m sglang.launch_server \
  --model-path zai-org/GLM-4.5 \
  --tp-size 16 \
  --tool-call-parser glm45  \
  --reasoning-parser glm45 \
  --speculative-algorithm EAGLE \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 4 \
  --mem-fraction-static 0.7 \
  --served-model-name glm-4.5 \
  --port 8000 \
  --host 0.0.0.0 # or your server's private/public IP address
```

When the server is up, you should see output similar to the following:

```
[2025-07-26 16:09:07] INFO:     Started server process [80269]
[2025-07-26 16:09:07] INFO:     Waiting for application startup.
[2025-07-26 16:09:07] INFO:     Application startup complete.
[2025-07-26 16:09:07] INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
[2025-07-26 16:09:08] INFO:     127.0.0.1:57722 - "GET /get_model_info HTTP/1.1" 200 OK
[2025-07-26 16:09:08 TP0] Prefill batch. #new-seq: 1, #new-token: 6, #cached-token: 0, token usage: 0.00, #running-req: 0, #queue-req: 0, 
[2025-07-26 16:09:11] INFO:     127.0.0.1:57732 - "POST /generate HTTP/1.1" 200 OK
[2025-07-26 16:09:11] The server is fired up and ready to roll!
```

Make sure the server's IP address is reachable from the machine where Claude Code and Claude Code Router are installed.
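
You can verify reachability from the local machine before going further. The `/get_model_info` endpoint appears in the startup log above; the chat request assumes SGLang's OpenAI-compatible API. Replace `SERVER_IP` with your server's address:

```shell
# Basic liveness check (this endpoint is visible in the startup log)
curl http://SERVER_IP:8000/get_model_info

# Minimal chat round-trip through the OpenAI-compatible endpoint
curl http://SERVER_IP:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "glm-4.5", "messages": [{"role": "user", "content": "Hello"}]}'
```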

## Configuration

1. Change `api_base_url` in `config.example.json` to your server's reachable address (a sketch of the finished file follows below).
2. Rename the file to `config.json`.
3. Copy it to `~/.claude-code-router/config.json`.
4. Run `ccr restart` in your terminal. You should see output like the following:

```
Service was not running or failed to stop.
Starting claude code router service...
✅ Service started successfully in the background.
```
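
For reference, here is a minimal sketch of what the finished `config.json` might look like. The `Providers`/`Router` layout follows the claude-code-router README; `SERVER_IP` is a placeholder, and the `api_key` value is arbitrary since a local SGLang server does not check it by default:

```json
{
  "Providers": [
    {
      "name": "sglang",
      "api_base_url": "http://SERVER_IP:8000/v1/chat/completions",
      "api_key": "sk-local",
      "models": ["glm-4.5"]
    }
  ],
  "Router": {
    "default": "sglang,glm-4.5"
  }
}
```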

## Run

Run the following command:

```shell
ccr code
```

Claude Code should now run normally, as shown below:

```
zr@MacBook GLM-4.5 % ccr code
Service not running, starting service...
╭───────────────────────────────────────────────────╮
│   Welcome to Claude Code!                         │
│                                                   │
│   /help for help, /status for your current setup  │
│                                                   │
│   cwd: /Users/zr/Code/GLM-4.5                     │
│                                                   │
│   ─────────────────────────────────────────────── │
│                                                   │
│   Overrides (via env):                            │
│                                                   │
│    API timeout: 600000ms                          │
│    API Base URL: http://127.0.0.1:3456            │
╰───────────────────────────────────────────────────╯

  Tip: Press Esc twice to edit your previous messages

> how can I run GLM-4.5 in transformers

⏺ I'll help you understand how to run GLM-4.5 in transformers. Let me first explore the codebase to see what's available.

⏺ List(.)
  ⎿  Listed 21 paths (ctrl+r to expand)

⏺ Read(README.md)
  ⎿  Read 4 lines (ctrl+r to expand)

⏺ Read(inference/trans_infer_cli.py)
  ⎿  Read 29 lines (ctrl+r to expand)

⏺ Read(requirements.txt)
  ⎿  Read 5 lines (ctrl+r to expand)

⏺ You can run GLM-4.5 in transformers using the example script in inference/trans_infer_cli.py. Here's the basic approach:

  from transformers import AutoModelForCausalLM, AutoTokenizer
  import torch

  MODEL_PATH = "zai-org/GLM-4.5"
  tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
  model = AutoModelForCausalLM.from_pretrained(
      MODEL_PATH,
      torch_dtype=torch.bfloat16,
      device_map="auto"
  )

  Then use the tokenizer's chat template for inference.

╭───────────────────────────────────────────────────╮
│ >                                                 │
╰───────────────────────────────────────────────────╯
  ? for shortcuts
```
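
The transcript's answer ends with "use the tokenizer's chat template." For completeness, here is a minimal sketch of that step; the prompt text and generation settings are illustrative, not taken from the repository's `inference/trans_infer_cli.py`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

MODEL_PATH = "zai-org/GLM-4.5"
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Build a chat-formatted prompt and generate a reply
messages = [{"role": "user", "content": "Hello, GLM-4.5!"}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```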