# Setting up Claude Code Service with SGLang + GLM-4.5 Model

[中文阅读](./README_zh.md)

## Installation

You need a local machine for development and a server capable of running the `GLM-4.5` model.

### Local Device

Ensure you have installed [Claude Code](https://github.com/anthropics/claude-code)
and [Claude Code Router](https://github.com/musistudio/claude-code-router).

```shell
npm install -g @anthropic-ai/claude-code
npm install -g @musistudio/claude-code-router
```
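
Both commands should now be on your `PATH`. As a quick sanity check (the exact flags and subcommands follow each project's README; adjust if your versions differ):

```shell
claude --version   # verify the Claude Code CLI is installed
ccr -v             # verify the Claude Code Router CLI is installed
```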

### Server

Ensure you have installed `sglang` on your server.

```shell
pip install sglang
```
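
Note: depending on your environment, the plain `sglang` package may not pull in every serving dependency; the SGLang installation docs suggest the full extras bundle:

```shell
pip install "sglang[all]"
```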

Then start the model service with the following command:

```shell
python3 -m sglang.launch_server \
  --model-path zai-org/GLM-4.5 \
  --tp-size 16 \
  --tool-call-parser glm45 \
  --reasoning-parser glm45 \
  --speculative-algorithm EAGLE \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 4 \
  --mem-fraction-static 0.7 \
  --served-model-name glm-4.5 \
  --port 8000 \
  --host 0.0.0.0 # Or your server's internal/public IP address
```

When successful, you will see output similar to the following:

```
[2025-07-26 16:09:07] INFO:     Started server process [80269]
[2025-07-26 16:09:07] INFO:     Waiting for application startup.
[2025-07-26 16:09:07] INFO:     Application startup complete.
[2025-07-26 16:09:07] INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
[2025-07-26 16:09:08] INFO:     127.0.0.1:57722 - "GET /get_model_info HTTP/1.1" 200 OK
[2025-07-26 16:09:11] INFO:     127.0.0.1:57732 - "POST /generate HTTP/1.1" 200 OK
[2025-07-26 16:09:11] The server is fired up and ready to roll!
```

Make sure the server's address is reachable from the device where Claude Code and Claude Code Router are installed.
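
Before configuring the router, you can smoke-test the connection from your local device. The `/get_model_info` route appears in the startup log above, and SGLang also exposes an OpenAI-compatible chat endpoint; replace `<server-ip>` with your server's address:

```shell
# Basic reachability check (this route is shown in the startup log above)
curl http://<server-ip>:8000/get_model_info

# OpenAI-compatible request against the served model name
curl http://<server-ip>:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "glm-4.5", "messages": [{"role": "user", "content": "Hello"}]}'
```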

## Configuration

1. Modify `api_base_url` in `config.example.json` to point at your server's reachable address (a sketch of the full file follows this list).
2. Rename the file to `config.json`.
3. Copy it to `~/.claude-code-router/config.json`.
4. Run `ccr restart` from the command line. You should see output like:
  
    ```
    Service was not running or failed to stop.
    Starting claude code router service...
    ✅ Service started successfully in the background.
    ```
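
For reference, here is a minimal sketch of what the finished `config.json` might look like, assuming the provider/router schema from the Claude Code Router README; the provider name, API key, and address below are placeholders for your own setup:

```json
{
  "Providers": [
    {
      "name": "sglang",
      "api_base_url": "http://<server-ip>:8000/v1/chat/completions",
      "api_key": "sk-dummy",
      "models": ["glm-4.5"]
    }
  ],
  "Router": {
    "default": "sglang,glm-4.5"
  }
}
```

Treat `config.example.json` in this repository as the authoritative template for field names.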

## Running

Execute the following command:

```shell
ccr code
```

It should look like this:

```
zr@MacBook GLM-4.5 % ccr code
Service not running, starting service...
╭───────────────────────────────────────────────────╮
│ Welcome to Claude Code!                           │
│                                                   │
│   /help for help, /status for your current setup  │
│                                                   │
│   cwd: /Users/zr/Code/GLM-4.5                     │
│                                                   │
│   ─────────────────────────────────────────────   │
│                                                   │
│   Overrides (via env):                            │
│                                                   │
│   • API timeout: 600000ms                         │
│   • API Base URL: http://127.0.0.1:3456           │
╰───────────────────────────────────────────────────╯

  Tip: Press Esc twice to edit your previous messages


> how can I run GLM-4.5 in transformers

⏺ I'll help you understand how to run GLM-4.5 in transformers. Let me first explore the codebase to see what's available.

⏺ List(.)
  ⎿  Listed 21 paths (ctrl+r to expand)

⏺ Read(README.md)
  ⎿  Read 4 lines (ctrl+r to expand)

⏺ Read(inference/trans_infer_cli.py)
  ⎿  Read 29 lines (ctrl+r to expand)

⏺ Read(requirements.txt)
  ⎿  Read 5 lines (ctrl+r to expand)

⏺ You can run GLM-4.5 in transformers using the example script in inference/trans_infer_cli.py. Here's the basic approach:

  from transformers import AutoModelForCausalLM, AutoTokenizer
  import torch

  MODEL_PATH = "zai-org/GLM-4.5"
  tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
  model = AutoModelForCausalLM.from_pretrained(
      MODEL_PATH,
      torch_dtype=torch.bfloat16,
      device_map="auto"
  )

  Then use the tokenizer's chat template for inference.

╭───────────────────────────────────────────────────╮
│ >                                                 │
╰───────────────────────────────────────────────────╯
  ? for shortcuts
```