# Adding a New Vendor Backend

This directory contains vendor-specific operator implementations that automatically replace the default (CUDA) implementations when running on the corresponding device.

## Concepts

- **Vendor**: Chip manufacturer (e.g., `ascend`, `intel`, `nvidia`)
- **Device**: Device type (e.g., `npu`, `xpu`, `cuda`)
- **VendorInfo**: Defines the mapping between vendor and device
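A minimal sketch of what `VendorInfo` and `register_vendor()` in `registry.py` could look like — the exact field names and registry shape here are assumptions for illustration; check `registry.py` for the real definitions:

```python
from dataclasses import dataclass

# Hypothetical sketch of registry.py internals; field names are assumptions.
@dataclass(frozen=True)
class VendorInfo:
    vendor: str  # chip manufacturer, e.g. "ascend"
    device: str  # device type, e.g. "npu"

VENDOR_REGISTRY: dict = {}

def register_vendor(info: VendorInfo) -> None:
    # Keyed by device so the vendor for infer_device()'s result
    # can be found with a single dict lookup.
    VENDOR_REGISTRY[info.device] = info
```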

## Directory Structure

```
backends/
├── README.md          
├── __init__.py         
├── registry.py         # VendorInfo, register_vendor(), VENDOR_REGISTRY
├── _ascend/            # Ascend (Huawei) vendor - supports NPU
│   ├── __init__.py     # Registers VendorInfo for NPU
│   └── ops/
│       ├── __init__.py # Exports vendor-specific implementations
│       └── geglu.py    # NPU-specific GEGLU implementation
└── _<vendor>/          # Your new vendor backend
    └── ...
```

## How It Works

1. When `liger_kernel.ops.backends` is imported, it imports all vendor packages (e.g., `_ascend`)
2. Each vendor's `__init__.py` calls `register_vendor()` to register itself
3. When `liger_kernel.ops` is imported, `_replace_with_vendor_ops()` is called
4. It detects the current device via `infer_device()` and looks up the vendor
5. Vendor implementations replace/add to the `liger_kernel.ops` namespace

## Adding a New Vendor

### Step 1: Create Directory Structure

```bash
mkdir -p backends/_<vendor>/ops
touch backends/_<vendor>/__init__.py
touch backends/_<vendor>/ops/__init__.py
```

### Step 2: Register Your Vendor

In `backends/_<vendor>/__init__.py`, register your vendor:

```python
"""
<Vendor> backend for Liger-Kernel.
"""

from liger_kernel.ops.backends.registry import VendorInfo, register_vendor

register_vendor(
    VendorInfo(
        vendor="<vendor>",
        device="<device>",
    )
)
```


### Step 3: Ensure Device Detection Works

Make sure `infer_device()` in `liger_kernel/utils.py` can detect your device:

```python
def infer_device():
    if torch.cuda.is_available():
        return "cuda"
    if is_npu_available():
        return "npu"
    # Add your device detection here
    if is_<device>_available():
        return "<device>"
    return "cpu"
```
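One common way to write such an availability check is to probe for the vendor's PyTorch plugin package (`torch_npu` is the plugin used for Ascend NPUs). The helper below is an illustrative sketch, not the actual check used in `liger_kernel/utils.py`:

```python
import importlib.util

def is_npu_available() -> bool:
    # If the torch_npu plugin cannot even be located, the device
    # is certainly absent; a fuller check would also import it and
    # query the runtime for visible devices.
    return importlib.util.find_spec("torch_npu") is not None
```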

### Step 4: Implement Vendor-Specific Operators

Create operator files in `backends/_<vendor>/ops/`. For example, `geglu.py`:

```python
import torch

class LigerGELUMulFunction(torch.autograd.Function):
    """
    Vendor-specific LigerGELUMulFunction implementation.
    """
    @staticmethod
    def forward(ctx, a, b):
        # Your vendor-specific forward implementation
        ...

    @staticmethod
    def backward(ctx, dc):
        # Your vendor-specific backward implementation
        ...

# Optional: vendor-specific kernel functions
def geglu_forward_vendor(a, b):
    ...

def geglu_backward_vendor(a, b, dc):
    ...
```
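Whatever your kernel does internally, its output should match the reference GEGLU math. As a scalar sanity reference — assuming the tanh-approximated GELU commonly used by GEGLU kernels; verify against the actual default implementation:

```python
import math

def gelu_tanh(x: float) -> float:
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)))

def geglu_reference(a: float, b: float) -> float:
    # GEGLU combines the two inputs as GELU(a) * b
    return gelu_tanh(a) * b
```

Comparing your vendor kernel's output against this reference on a few tensors is a quick way to catch sign or approximation mismatches before wiring the backend in.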

### Step 5: Export in `ops/__init__.py`

In `backends/_<vendor>/ops/__init__.py`, export your implementations:

```python
"""
<Vendor>-specific operator implementations.
"""

from .<module> import (
    LigerGELUMulFunction,
    geglu_forward_vendor as geglu_forward,   # Rename to match default API
    geglu_backward_vendor as geglu_backward,
)

# Explicitly declare what to export (recommended)
__all__ = [
    "LigerGELUMulFunction",
    "geglu_forward",
    "geglu_backward",
]
```

## Key Points

### Incremental Override

You **don't need to implement all operators**. Only implement the ones that require vendor-specific adaptations. Unimplemented operators will automatically fall back to the default (CUDA) implementation.
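Conceptually, the fallback behaves like a dictionary merge in which vendor entries win and everything else keeps the default. A toy illustration (the real mechanism operates on module attributes, not dicts, and the operator names here are examples):

```python
default_ops = {"geglu_forward": "cuda_geglu", "swiglu_forward": "cuda_swiglu"}
vendor_ops = {"geglu_forward": "npu_geglu"}  # only geglu is overridden

# Vendor entries shadow defaults; unimplemented ops fall through.
effective_ops = {**default_ops, **vendor_ops}
```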

### Vendor-Specific Additions

Vendors can also **add new operators** that don't exist in the default implementation. These are exported to the `liger_kernel.ops` namespace for users to import.

### Naming Convention

- Use the **same class/function names** as the default implementations for overrides
- This allows seamless replacement without changing user code
- Use `as` imports to rename if your internal naming differs

## Example: Ascend NPU Backend

See `_ascend/` directory for a complete example of the Ascend NPU backend implementation.