# ConfigFile Format:
# ==================
# A Transfer is defined as a single operation where an Executor reads and adds together
# values from Source (SRC) memory locations, then writes the sum to destination (DST) memory locations.
# This simplifies to a simple copy operation when dealing with single SRC/DST.
#
#                SRC 0                DST 0
#                SRC 1 -> Executor -> DST 1
#                SRC X                DST Y

# Three Executors are supported by TransferBench
#   Executor:        SubExecutor:
#   1) CPU           CPU thread
#   2) GPU           GPU threadblock/Compute Unit (CU)
#   3) DMA           N/A                                  (May only be used for copies (single SRC/DST))

# Each single line in the configuration file defines a set of Transfers (a Test) to run in parallel

# There are two ways to specify a Test:

# 1) Basic
#    The basic specification uses the same number of SubExecutors (SEs) for every Transfer
#    A positive number of Transfers is specified, followed by the number of SEs, then that number of triplets describing each Transfer

#    #Transfers #SEs (srcMem1->Executor1->dstMem1) ... (srcMemL->ExecutorL->dstMemL)
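#
#    As an illustration of this layout (a sketch that assumes a system with at least 2 GPUs),
#    the line "2 4 (G0->G0->G1) (G1->G1->G0)" reads field by field as:
#      2             #Transfers: two Transfers run in parallel
#      4             #SEs: each Transfer uses 4 SubExecutors
#      G0->G0->G1    Transfer 1: source G0, Executor G0, destination G1
#      G1->G1->G0    Transfer 2: source G1, Executor G1, destination G0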

# 2) Advanced
#    A negative number of Transfers is specified, followed by quintuplets describing each Transfer
#    A non-zero number of bytes overrides the size specified on the command line
#    -#Transfers (srcMem1->Executor1->dstMem1 #SEs1 Bytes1) ... (srcMemL->ExecutorL->dstMemL #SEsL BytesL)
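#
#    As an illustration of this layout (a sketch that assumes a system with at least 2 GPUs),
#    the line "-2 (G0->G0->G1 4 1M) (G1->G1->G0 2 2M)" reads field by field as:
#      -2                   Two Transfers, given in the advanced form
#      G0->G0->G1 4 1M      Transfer 1: source G0, Executor G0, destination G1, 4 SEs, byte count 1M
#      G1->G1->G0 2 2M      Transfer 2: source G1, Executor G1, destination G0, 2 SEs, byte count 2M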

# Argument Details:
#   #Transfers:   Number of Transfers to be run in parallel
#   #SEs      :   Number of SubExecutors to use (CPU threads / GPU threadblocks)
#   srcMemL   :   Source memory locations (Where the data is to be read from)
#   Executor  :   Executor is specified by a character indicating type, followed by device index (0-indexed)
#                 - C: CPU-executed  (Indexed from 0 to # NUMA nodes - 1)
#                 - G: GPU-executed  (Indexed from 0 to # GPUs - 1)
#                 - D: DMA-executed  (Indexed from 0 to # GPUs - 1)
#   dstMemL   :   Destination memory locations (Where the data is to be written to)
#   bytesL    :   Number of bytes to copy (0 means use command-line specified size)
#                 Must be a multiple of 4 and may be suffixed with ('K','M', or 'G')
#
#                 Memory locations are specified by one or more (device character / device index) pairs
#                 Character indicating memory type followed by device index (0-indexed)
#                 Supported memory locations are:
#                 - C:    Pinned host memory       (on NUMA node, indexed from 0 to [# NUMA nodes-1])
#                 - U:    Unpinned host memory     (on NUMA node, indexed from 0 to [# NUMA nodes-1])
#                 - B:    Fine-grain host memory   (on NUMA node, indexed from 0 to [# NUMA nodes-1])
#                 - G:    Global device memory     (on GPU device indexed from 0 to [# GPUs - 1])
#                 - F:    Fine-grain device memory (on GPU device indexed from 0 to [# GPUs - 1])
#                 - N:    Null memory              (index ignored)
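#
#                 As an illustration (a sketch that assumes at least 1 NUMA node and 2 GPUs),
#                 a memory location string is read as consecutive (type, index) pairs:
#                 - G0:   One location:  global device memory on GPU 0
#                 - G0G1: Two locations: global device memory on GPU 0 and on GPU 1
#                 - C0F1: Two locations: pinned host memory on NUMA node 0 and fine-grain device memory on GPU 1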

# Examples:
# 1 4 (G0->G0->G1)                   Uses 4 CUs on GPU0 to copy from GPU0 to GPU1
# 1 4 (C1->G2->G0)                   Uses 4 CUs on GPU2 to copy from CPU1 to GPU0
# 2 4 G0->G0->G1 G1->G1->G0          Copies from GPU0 to GPU1, and GPU1 to GPU0, each with 4 SEs
# -2 (G0 G0 G1 4 1M) (G1 G1 G0 2 2M) Copies 1MB from GPU0 to GPU1 with 4 SEs, and 2MB from GPU1 to GPU0 with 2 SEs
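#
# Two further illustrative lines, derived from the rules above rather than taken from a tested
# configuration (device indices assume a system with at least 3 GPUs):
# 1 8 (G0G1->G2->G2)                 Uses 8 CUs on GPU2 to read GPU0 and GPU1, add the values, and write the sum to GPU2
# -1 (G0->G0->G1 4 0)                Copies from GPU0 to GPU1 with 4 SEs; 0 bytes means the command-line size is used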

# Round brackets and arrows ('->') may be included for human clarity, but they are ignored and unnecessary
# Lines starting with # will be ignored. Lines starting with ## will be echoed to output

## Single GPU-executed Transfer between GPUs 0 and 1 using 4 CUs
1 4 (G0->G0->G1)

## Single DMA-executed Transfer between GPUs 0 and 1
1 1 (G0->D0->G1)

## Copy 1MB from GPU0 to GPU1 with 4 CUs, and 2MB from GPU1 to GPU0 with 8 CUs
-2 (G0->G0->G1 4 1M) (G1->G1->G0 8 2M)

## "Memset" by GPU 0 to GPU 0 memory
1 32 (N0->G0->G0)

## "Read-only" by CPU 0
1 4 (C0->C0->N0)

## Broadcast from GPU 0 to GPU 0 and GPU 1
1 16 (G0->G0->G0G1)