# lightx2v_kernel

### Preparation
```
# Install PyTorch (torch) 2.7 or newer first

pip install scikit_build_core uv
```
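
If you are unsure whether the installed torch meets the 2.7 minimum, a check along these lines can help. This is a minimal sketch: `torch_version_ok` is a hypothetical helper, not part of lightx2v_kernel, and it compares only the major and minor numbers, ignoring patch numbers and local tags such as `+cu128`.

```python
import re


def torch_version_ok(version: str, minimum: tuple = (2, 7)) -> bool:
    """Return True if a torch version string such as "2.7.0+cu128" is >= minimum."""
    # Keep only the leading "major.minor"; patch, local (+cu...) and pre-release tags are ignored.
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= tuple(minimum)
```

After `import torch`, call it as `torch_version_ok(torch.__version__)`.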

### Build whl

```
git clone https://github.com/NVIDIA/cutlass.git

git clone https://github.com/ModelTC/LightX2V.git

cd LightX2V/lightx2v_kernel

# Replace /path/to/cutlass below with the absolute path of the cutlass repository you cloned.

MAX_JOBS=$(nproc) CMAKE_BUILD_PARALLEL_LEVEL=$(nproc) \
uv build --wheel \
    -Cbuild-dir=build . \
    -Ccmake.define.CUTLASS_PATH=/path/to/cutlass \
    --verbose \
    --color=always \
    --no-build-isolation
```
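
Because the build command needs an absolute `CUTLASS_PATH`, it can be worth sanity-checking the path before starting a long build. The sketch below uses a hypothetical helper, `check_cutlass_path`, and assumes the standard NVIDIA/cutlass repository layout, where the headers live under `include/cutlass/`.

```python
import os


def check_cutlass_path(path: str) -> list:
    """Return a list of problems with a candidate CUTLASS_PATH; an empty list means it looks usable."""
    problems = []
    if not os.path.isabs(path):
        problems.append(f"{path!r} is not an absolute path")
    # A cloned NVIDIA/cutlass checkout keeps its main header at include/cutlass/cutlass.h.
    header = os.path.join(path, "include", "cutlass", "cutlass.h")
    if not os.path.isfile(header):
        problems.append(f"missing {header}")
    return problems
```

For example, `check_cutlass_path(os.path.abspath("cutlass"))` run from the directory where you cloned cutlass should return an empty list.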


### Install whl
```
pip install dist/*whl --force-reinstall --no-deps
```
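
`dist/*whl` expands to every wheel in `dist/`. If earlier builds left wheels with different versions or tags behind, pip may be handed several candidates for the same project and refuse to install them together. A minimal sketch of picking only the most recent build follows; `newest_wheel` is a hypothetical helper, and "newest" here simply means the most recently modified file.

```python
import glob
import os


def newest_wheel(dist_dir: str = "dist") -> str:
    """Return the most recently modified .whl file in dist_dir."""
    wheels = glob.glob(os.path.join(dist_dir, "*.whl"))
    if not wheels:
        raise FileNotFoundError(f"no .whl files in {dist_dir!r}; run the build step first")
    # Repeated builds can leave several wheels behind; the newest one is the fresh build.
    return max(wheels, key=os.path.getmtime)
```

Pass the returned path to `pip install <wheel> --force-reinstall --no-deps` instead of the glob.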

### Test

##### Cosine similarity and speed test, mm without bias
```
python test/nvfp4_nvfp4/test_bench2.py
```
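
Assuming "cos" refers to the cosine similarity between the NVFP4 kernel's output and a higher-precision reference result, the metric itself is simple: a value close to 1.0 means the quantized result points in almost the same direction as the reference. The pure-Python sketch below is only for illustration and runs without a GPU; with tensors you would typically flatten both and use `torch.nn.functional.cosine_similarity(a, b, dim=0)`. The exact comparison done in `test_bench2.py` may differ.

```python
import math
from typing import Sequence


def cosine_similarity(out: Sequence[float], ref: Sequence[float]) -> float:
    """Cosine similarity between two flattened tensors with the same number of elements."""
    if len(out) != len(ref):
        raise ValueError("inputs must have the same number of elements")
    dot = sum(a * b for a, b in zip(out, ref))
    norm_out = math.sqrt(sum(a * a for a in out))
    norm_ref = math.sqrt(sum(b * b for b in ref))
    if norm_out == 0.0 or norm_ref == 0.0:
        raise ValueError("cosine similarity is undefined for an all-zero input")
    return dot / (norm_out * norm_ref)
```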

##### Cosine similarity and speed test, mm with bias
```
python test/nvfp4_nvfp4/test_bench3_bias.py
```
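
The bias variant adds one bias value per output column after the matrix multiply. The sketch below shows the reference computation `y = x @ w.T + bias` that such a result could be compared against. The N x K weight layout (one row per output feature, as in `torch.nn.functional.linear`) is an assumption, not something confirmed from `test_bench3_bias.py`, and `reference_linear` is a hypothetical name.

```python
from typing import List

Matrix = List[List[float]]


def reference_linear(x: Matrix, w: Matrix, bias: List[float]) -> Matrix:
    """Compute y = x @ w.T + bias.

    x: M x K activations, w: N x K weights (one row per output feature), bias: length N.
    """
    k = len(x[0])
    if any(len(row) != k for row in x) or any(len(row) != k for row in w):
        raise ValueError("x and w must share the inner dimension K")
    if len(bias) != len(w):
        raise ValueError("bias must have one entry per output feature N")
    # Each output element is a dot product over K plus the bias of its output column.
    return [
        [sum(xi * wi for xi, wi in zip(x_row, w_row)) + b for w_row, b in zip(w, bias)]
        for x_row in x
    ]
```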

##### Bandwidth utilization test for NVFP4 quantization
```
python test/nvfp4_nvfp4/test_quant_mem_utils.py
```
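
Quantization is memory-bound, so its quality is usually judged by how close the achieved DRAM bandwidth gets to the GPU's peak. NVFP4 stores 4-bit E2M1 values (two per byte) plus one FP8 E4M3 scale per 16-element block, so the traffic for an m x k tensor can be estimated as below. This is a sketch under stated assumptions: a 2-byte (bf16/fp16) input, and ignoring the per-tensor FP32 scale and any padding or swizzling of the scale layout. The accounting in `test_quant_mem_utils.py` may differ, and the peak bandwidth must come from your GPU's specification.

```python
def nvfp4_quant_bytes(m: int, k: int, in_bytes_per_elem: int = 2) -> int:
    """Approximate DRAM traffic (bytes) for quantizing an m x k tensor to NVFP4."""
    if k % 16 != 0:
        raise ValueError("k must be a multiple of the 16-element NVFP4 block")
    read = m * k * in_bytes_per_elem  # bf16/fp16 input (assumed 2 bytes per element)
    write_values = m * k // 2  # two 4-bit E2M1 values per byte
    write_scales = m * k // 16  # one 1-byte E4M3 scale per 16-element block
    return read + write_values + write_scales


def bandwidth_utilization(num_bytes: int, seconds: float, peak_gb_per_s: float) -> float:
    """Fraction of peak memory bandwidth achieved (1.0 means 100%)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    achieved_gb_per_s = num_bytes / seconds / 1e9
    return achieved_gb_per_s / peak_gb_per_s
```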

##### TFLOPS test for mm
```
python test/nvfp4_nvfp4/test_mm_tflops.py
```
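
TFLOPS for a matrix multiply is conventionally derived from the operation count: an (m x k) @ (k x n) product performs m*n*k multiplies and m*n*k adds, i.e. 2*m*n*k floating-point operations. A minimal sketch of the conversion follows; `matmul_tflops` is a hypothetical helper, and it assumes `seconds` is the per-call time measured after warm-up (for example with `torch.cuda.Event(enable_timing=True)`, whose `elapsed_time` reports milliseconds). A bias add contributes only m*n extra operations and is ignored here.

```python
def matmul_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Throughput in TFLOPS of an (m x k) @ (k x n) matmul that took `seconds` per call."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    # Each of the m*n outputs needs k multiplies and k adds: 2*m*n*k operations in total.
    flops = 2 * m * n * k
    return flops / seconds / 1e12
```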