# llama.cpp/example/sycl

This example provides tools for running llama.cpp with SYCL on Intel GPUs.

## Tools

|Tool Name| Function|Status|
|-|-|-|
|llama-ls-sycl-device| List all SYCL devices with their ID, compute capability, max work group size, etc.|Supported|

### llama-ls-sycl-device

List all SYCL devices with their ID, compute capability, max work group size, etc.

1. Build llama.cpp for SYCL (all targets).

2. Enable the oneAPI runtime environment:

```
source /opt/intel/oneapi/setvars.sh
```

3. Run the tool:

```
./build/bin/llama-ls-sycl-device
```

Check the device IDs in the startup log, for example:

```
found 4 SYCL devices:
  Device 0: Intel(R) Arc(TM) A770 Graphics,	compute capability 1.3,
    max compute_units 512,	max work group size 1024,	max sub group size 32,	global mem size 16225243136
  Device 1: Intel(R) FPGA Emulation Device,	compute capability 1.2,
    max compute_units 24,	max work group size 67108864,	max sub group size 64,	global mem size 67065057280
  Device 2: 13th Gen Intel(R) Core(TM) i7-13700K,	compute capability 3.0,
    max compute_units 24,	max work group size 8192,	max sub group size 64,	global mem size 67065057280
  Device 3: Intel(R) Arc(TM) A770 Graphics,	compute capability 3.0,
    max compute_units 512,	max work group size 1024,	max sub group size 32,	global mem size 16225243136

```

|Attribute|Note|
|-|-|
|compute capability 1.3|Level Zero runtime, recommended|
|compute capability 3.0|OpenCL runtime, slower than Level Zero in most cases|
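
When the listing is long, picking out the recommended devices by eye can be error-prone. The following is a minimal sketch, not part of llama.cpp, that parses a listing in the format shown above and returns the IDs of devices reporting compute capability 1.3 (Level Zero, per the table). The function names `parse_devices` and `level_zero_ids` are hypothetical, and the regular expression assumes the log format stays exactly as in the sample; other versions of the tool may print something different.

```python
import re

# Sample listing copied from the startup log above (whitespace simplified).
SAMPLE_LOG = """\
found 4 SYCL devices:
  Device 0: Intel(R) Arc(TM) A770 Graphics, compute capability 1.3,
    max compute_units 512, max work group size 1024, max sub group size 32, global mem size 16225243136
  Device 1: Intel(R) FPGA Emulation Device, compute capability 1.2,
    max compute_units 24, max work group size 67108864, max sub group size 64, global mem size 67065057280
  Device 2: 13th Gen Intel(R) Core(TM) i7-13700K, compute capability 3.0,
    max compute_units 24, max work group size 8192, max sub group size 64, global mem size 67065057280
  Device 3: Intel(R) Arc(TM) A770 Graphics, compute capability 3.0,
    max compute_units 512, max work group size 1024, max sub group size 32, global mem size 16225243136
"""

# Assumed line shape: "Device <id>: <name>, compute capability <x.y>,"
DEVICE_RE = re.compile(
    r"Device\s+(?P<id>\d+):\s*(?P<name>.+?),\s*compute capability\s+(?P<cc>\d+\.\d+)"
)


def parse_devices(log_text):
    """Return a list of (device_id, name, compute_capability) tuples."""
    return [
        (int(m.group("id")), m.group("name").strip(), m.group("cc"))
        for m in DEVICE_RE.finditer(log_text)
    ]


def level_zero_ids(log_text):
    """Return IDs of devices whose compute capability is 1.3 (Level Zero per the table)."""
    return [dev_id for dev_id, _, cc in parse_devices(log_text) if cc == "1.3"]


if __name__ == "__main__":
    for dev_id, name, cc in parse_devices(SAMPLE_LOG):
        print(dev_id, name, cc)
    print("Level Zero device IDs:", level_zero_ids(SAMPLE_LOG))
```

To use it on a real machine, the output of `./build/bin/llama-ls-sycl-device` could be captured to a string and passed to `level_zero_ids` in place of `SAMPLE_LOG`.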