# ConfigFile Format:
# ==================
# A Transfer is defined as a uni-directional transfer from src memory location to dst memory location
# executed by either CPU or GPU
# Each line in the configuration file defines a set of Transfers (a Test) to run in parallel

# There are two ways to specify the configuration file:

# 1) Basic
#    The basic specification assumes the same number of threadblocks/CUs is used for each GPU-executed Transfer
#    A positive number of Transfers is specified, followed by that number of triplets describing each Transfer

#    #Transfers #CUs (srcMem1->Executor1->dstMem1) ... (srcMemL->ExecutorL->dstMemL)
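#    e.g. "2 8 (G0->G0->G1) (G1->G1->G0)" runs two GPU-executed Transfers (GPU0 to GPU1 and
#         GPU1 to GPU0) with 8 CUs each (illustrative example; assumes at least 2 GPUs)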

# 2) Advanced
#    The advanced specification allows a different number of threadblocks/CUs to be used per GPU-executed Transfer
#    A negative number of Transfers is specified, followed by quadruples describing each Transfer
#    -#Transfers (srcMem1->Executor1->dstMem1 #CUs1) ... (srcMemL->ExecutorL->dstMemL #CUsL)
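#    e.g. "-2 (G0 G0 G1 8) (G1 G1 G0 4)" runs GPU0 to GPU1 with 8 CUs and GPU1 to GPU0 with 4 CUs
#         in parallel (illustrative example; assumes at least 2 GPUs)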

# Argument Details:
#   #Transfers:   Number of Transfers to be run in parallel
#   #CUs      :   Number of threadblocks/CUs to use for a GPU-executed Transfer
#   srcMemL   :   Source memory location (Where the data is to be read from). Ignored in memset mode
#   Executor  :   Executor is specified by a character indicating type, followed by device index (0-indexed)
#                 - C: CPU-executed  (Indexed from 0 to # NUMA nodes - 1)
#                 - G: GPU-executed  (Indexed from 0 to # GPUs - 1)
#   dstMemL   :   Destination memory location (Where the data is to be written to)

#                 Memory locations are specified by a character indicating memory type,
#                 followed by device index (0-indexed)
#                 Supported memory locations are:
#                 - C:    Pinned host memory       (on NUMA node, indexed from 0 to [# NUMA nodes-1])
#                 - B:    Fine-grain host memory   (on NUMA node, indexed from 0 to [# NUMA nodes-1])
#                 - G:    Global device memory     (on GPU device indexed from 0 to [# GPUs - 1])
#                 - F:    Fine-grain device memory (on GPU device indexed from 0 to [# GPUs - 1])
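#                 For example, a Transfer written as "B0->G0->F1" would be executed by GPU 0,
#                 reading from fine-grain host memory on NUMA node 0 and writing to fine-grain
#                 device memory on GPU 1 (illustrative; assumes at least 2 GPUs)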

# Examples:
# 1 4 (G0->G0->G1)             Single Transfer using 4 CUs on GPU0 to copy from GPU0 to GPU1
# 1 4 (C1->G2->G0)             Single Transfer using 4 CUs on GPU2 to copy from CPU1 to GPU0
# 2 4 G0->G0->G1 G1->G1->G0    Runs 2 Transfers in parallel.  GPU0 to GPU1, and GPU1 to GPU0, each with 4 CUs
# -2 (G0 G0 G1 4) (G1 G1 G0 2) Runs 2 Transfers in parallel.  GPU0 to GPU1 with 4 CUs, and GPU1 to GPU0 with 2 CUs

# Round brackets and arrows '->' may be included for human clarity, but will be ignored and are unnecessary
# Lines starting with # will be ignored. Lines starting with ## will be echoed to output
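# For example, the following three lines would describe the same Test (illustrative):
#   1 4 (G0->G0->G1)
#   1 4 G0->G0->G1
#   1 4 G0 G0 G1
# while a line such as "## Running peer-to-peer tests" would simply be echoed to the output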

# Single GPU-executed Transfer between GPUs 0 and 1 using 4 CUs
1 4 (G0->G0->G1)
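
# Additional illustrative Tests, left commented out (indices assume at least 3 GPUs and
# 2 NUMA nodes; uncomment and adjust for your system):

# Single GPU-executed Transfer using 4 CUs on GPU2 to copy from CPU1 to GPU0
# 1 4 (C1->G2->G0)

# Two Transfers in the advanced format: GPU0 to GPU1 with 4 CUs, and GPU1 to GPU0 with 2 CUs
# -2 (G0 G0 G1 4) (G1 G1 G0 2)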