tianlh / LightGBM-DCU · Commits

Commit 631e0a2a (Unverified)
Authored Mar 18, 2024 by James Lamb; committed by GitHub, Mar 18, 2024

[ci] prevent trailing whitespace, ensure files end with newline (#6373)

Parent: 6a1ec444

Changes: 42. Showing 20 changed files with 137 additions and 138 deletions (+137 / -138).
examples/lambdarank/README.md                    +1   -1
examples/lambdarank/train.conf                   +6   -6
examples/multiclass_classification/train.conf    +3   -3
examples/parallel_learning/train.conf            +5   -5
examples/python-guide/README.md                  +2   -2
examples/regression/train.conf                   +6   -6
examples/xendcg/README.md                        +1   -1
examples/xendcg/train.conf                       +5   -5
include/LightGBM/bin.h                           +1   -1
include/LightGBM/network.h                       +3   -3
include/LightGBM/utils/common.h                  +1   -1
pmml/README.md                                   +1   -1
src/c_api.cpp                                    +1   -1
src/treelearner/kernels/histogram_16_64_256.hu   +10  -11
src/treelearner/leaf_splits.hpp                  +2   -2
src/treelearner/ocl/histogram16.cl               +41  -41
src/treelearner/ocl/histogram256.cl              +19  -19
src/treelearner/ocl/histogram64.cl               +19  -19
swig/StringArray.i                               +7   -7
swig/pointer_manipulation.i                      +3   -3
examples/lambdarank/README.md

@@ -29,5 +29,5 @@ Run the following command in this folder:

Data Format
-----------

To learn more about the query format used in this example, check out the
[query data format](https://lightgbm.readthedocs.io/en/latest/Parameters.html#query-data).
examples/lambdarank/train.conf

@@ -12,10 +12,10 @@ boosting_type = gbdt

objective = lambdarank

# eval metrics, support multi metric, delimited by ',' , support following metrics
# l1
# l2 , default metric for regression
# ndcg , default metric for lambdarank
# auc
# binary_logloss , default metric for binary
# binary_error
metric = ndcg

@@ -32,7 +32,7 @@ is_training_metric = true

# column in data to use as label
label_column = 0

# number of bins for feature bucket, 255 is a recommend setting, it can save memories, and also has good accuracy.
max_bin = 255

# training data

@@ -44,7 +44,7 @@ data = rank.train

# validation data, support multi validation data, separated by ','
# if existing weight file, should name to "rank.test.weight"
# if existing query file, should name to "rank.test.query"
# alias: valid, test, test_data,
valid_data = rank.test

# number of trees(iterations), alias: num_tree, num_iteration, num_iterations, num_round, num_rounds

@@ -64,10 +64,10 @@ num_leaves = 31

# alias: tree
tree_learner = serial

# number of threads for multi-threading. One thread will use one CPU, defalut is setted to #cpu.
# num_threads = 8

# feature sub-sample, will random select 80% feature to train on each iteration
# alias: sub_feature
feature_fraction = 1.0
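The train.conf files touched by this commit all share the same simple `key = value` format with `#` comment lines. As a quick illustration of that format (a minimal sketch, not LightGBM's actual config parser), a few of the keys from the lambdarank config can be read like this:

```python
# Minimal sketch of reading "key = value" lines from a train.conf-style
# config, skipping comments and blank lines. Illustrative only.
def parse_conf(text):
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comments and blank lines carry no settings
        key, _, value = line.partition("=")
        params[key.strip()] = value.strip()
    return params

conf = """
# task-specific settings
objective = lambdarank
metric = ndcg
label_column = 0
max_bin = 255
valid_data = rank.test
feature_fraction = 1.0
"""
print(parse_conf(conf)["objective"])
```

Values stay strings here; the real config machinery also coerces types and resolves aliases such as `num_tree` / `num_iterations`.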
examples/multiclass_classification/train.conf

@@ -13,10 +13,10 @@ boosting_type = gbdt

objective = multiclass

# eval metrics, support multi metric, delimited by ',' , support following metrics
# l1
# l2 , default metric for regression
# ndcg , default metric for lambdarank
# auc
# binary_logloss , default metric for binary
# binary_error
# multi_logloss

@@ -44,7 +44,7 @@ is_training_metric = true

# column in data to use as label
label_column = 0

# number of bins for feature bucket, 255 is a recommend setting, it can save memories, and also has good accuracy.
max_bin = 255

# training data
examples/parallel_learning/train.conf

@@ -12,10 +12,10 @@ boosting_type = gbdt

objective = binary

# eval metrics, support multi metric, delimite by ',' , support following metrics
# l1
# l2 , default metric for regression
# ndcg , default metric for lambdarank
# auc
# binary_logloss , default metric for binary
# binary_error
metric = binary_logloss,auc

@@ -29,7 +29,7 @@ is_training_metric = true

# column in data to use as label
label_column = 0

# number of bins for feature bucket, 255 is a recommend setting, it can save memories, and also has good accuracy.
max_bin = 255

# training data

@@ -39,7 +39,7 @@ data = binary.train

# validation data, support multi validation data, separated by ','
# if existing weight file, should name to "binary.test.weight"
# alias: valid, test, test_data,
valid_data = binary.test

# number of trees(iterations), alias: num_tree, num_iteration, num_iterations, num_round, num_rounds

@@ -62,7 +62,7 @@ tree_learner = feature

# number of threads for multi-threading. One thread will use each CPU. The default is the CPU count.
# num_threads = 8

# feature sub-sample, will random select 80% feature to train on each iteration
# alias: sub_feature
feature_fraction = 0.8
examples/python-guide/README.md

@@ -23,11 +23,11 @@ Examples include:

- [simple_example.py](https://github.com/microsoft/LightGBM/blob/master/examples/python-guide/simple_example.py)
  - Construct Dataset
  - Basic train and predict
  - Eval during training
  - Early stopping
  - Save model to file
- [sklearn_example.py](https://github.com/microsoft/LightGBM/blob/master/examples/python-guide/sklearn_example.py)
  - Create data for learning with sklearn interface
  - Basic train and predict with sklearn interface
  - Feature importances with sklearn interface
  - Self-defined eval metric with sklearn interface
examples/regression/train.conf

@@ -12,10 +12,10 @@ boosting_type = gbdt

objective = regression

# eval metrics, support multi metric, delimite by ',' , support following metrics
# l1
# l2 , default metric for regression
# ndcg , default metric for lambdarank
# auc
# binary_logloss , default metric for binary
# binary_error
metric = l2

@@ -29,7 +29,7 @@ is_training_metric = true

# column in data to use as label
label_column = 0

# number of bins for feature bucket, 255 is a recommend setting, it can save memories, and also has good accuracy.
max_bin = 255

# forced bin thresholds

@@ -42,7 +42,7 @@ data = regression.train

# validation data, support multi validation data, separated by ','
# if exsting weight file, should name to "regression.test.weight"
# alias: valid, test, test_data,
valid_data = regression.test

# number of trees(iterations), alias: num_tree, num_iteration, num_iterations, num_round, num_rounds

@@ -62,10 +62,10 @@ num_leaves = 31

# alias: tree
tree_learner = serial

# number of threads for multi-threading. One thread will use one CPU, default is setted to #cpu.
# num_threads = 8

# feature sub-sample, will random select 80% feature to train on each iteration
# alias: sub_feature
feature_fraction = 0.9
examples/xendcg/README.md

@@ -29,5 +29,5 @@ Run the following command in this folder:

Data Format
-----------

To learn more about the query format used in this example, check out the
[query data format](https://lightgbm.readthedocs.io/en/latest/Parameters.html#query-data).
examples/xendcg/train.conf

@@ -12,10 +12,10 @@ boosting_type = gbdt

objective = rank_xendcg

# eval metrics, support multi metric, delimite by ',' , support following metrics
# l1
# l2 , default metric for regression
# ndcg , default metric for lambdarank
# auc
# binary_logloss , default metric for binary
# binary_error
metric = ndcg

@@ -32,7 +32,7 @@ is_training_metric = true

# column in data to use as label
label_column = 0

# number of bins for feature bucket, 255 is a recommend setting, it can save memories, and also has good accuracy.
max_bin = 255

# training data

@@ -44,7 +44,7 @@ data = rank.train

# validation data, support multi validation data, separated by ','
# if existing weight file, should name to "rank.test.weight"
# if existing query file, should name to "rank.test.query"
# alias: valid, test, test_data,
valid_data = rank.test

# number of trees(iterations), alias: num_tree, num_iteration, num_iterations, num_round, num_rounds

@@ -68,7 +68,7 @@ tree_learner = serial

num_threads = 1

objective_seed = 1025

# feature sub-sample, will random select 80% feature to train on each iteration
# alias: sub_feature
feature_fraction = 1.0
include/LightGBM/bin.h

@@ -144,7 +144,7 @@ class BinMapper {

  /*!
  * \brief Maximum categorical value
  * \return Maximum categorical value for categorical features, 0 for numerical features
  */
  inline int MaxCatValue() const {
    if (bin_2_categorical_.size() == 0) {
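The doc comment in this hunk pins down the contract of `MaxCatValue()`: the maximum categorical value for categorical features, and 0 for numerical features, where an empty `bin_2_categorical_` mapping marks a feature as numerical. A hedged Python sketch of that contract (names illustrative, not the C++ implementation):

```python
# Sketch of the documented MaxCatValue() contract from bin.h:
# an empty bin->category mapping means a numerical feature, so return 0;
# otherwise return the largest category value in the mapping.
def max_cat_value(bin_2_categorical):
    if len(bin_2_categorical) == 0:
        return 0  # numerical feature: no categorical mapping
    return max(bin_2_categorical)

print(max_cat_value([]))
print(max_cat_value([3, 7, 1]))
```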
include/LightGBM/network.h

@@ -128,7 +128,7 @@ class Network {

                        const ReduceFunction& reducer);
  /*!
  * \brief Performing all_gather by using Bruck algorithm.
           Communication times is O(log(n)), and communication cost is O(send_size * number_machine)
  * It can be used when all nodes have same input size.
  * \param input Input data

@@ -138,7 +138,7 @@ class Network {

  static void Allgather(char* input, comm_size_t send_size, char* output);
  /*!
  * \brief Performing all_gather by using Bruck algorithm.
           Communication times is O(log(n)), and communication cost is O(all_size)
  * It can be used when nodes have different input size.
  * \param input Input data

@@ -150,7 +150,7 @@ class Network {

  static void Allgather(char* input, const comm_size_t* block_start, const comm_size_t* block_len, char* output, comm_size_t all_size);
  /*!
  * \brief Perform reduce scatter by using recursive halving algorithm.
           Communication times is O(log(n)), and communication cost is O(input_size)
  * \param input Input data
  * \param input_size The size of input data
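The header comments in these hunks advertise Bruck-style `all_gather` with O(log(n)) communication rounds. A small, self-contained Python simulation of the Bruck all-gather pattern (illustrative only, not the C++ implementation in `network.h`) makes the round count concrete: in each round every node exchanges everything it currently holds with a partner at doubling distance, so n nodes finish in ceil(log2(n)) rounds.

```python
def bruck_allgather(blocks):
    # Simulate Bruck all_gather over n nodes, each starting with one block.
    # Returns (per-node gathered lists, number of communication rounds).
    n = len(blocks)
    data = [[b] for b in blocks]  # data[i]: blocks currently held by node i
    dist, rounds = 1, 0
    while dist < n:
        # In round k, node i receives everything node (i + 2^k) % n holds.
        data = [(data[i] + data[(i + dist) % n])[:n] for i in range(n)]
        dist *= 2
        rounds += 1
    # Node i now holds blocks of nodes i, i+1, ..., i+n-1 (mod n);
    # rotate so every node ends with blocks in origin order 0..n-1.
    result = [[data[i][(j - i) % n] for j in range(n)] for i in range(n)]
    return result, rounds

gathered, rounds = bruck_allgather(["a", "b", "c", "d"])
print(gathered[0], rounds)
```

With 4 nodes this takes 2 rounds and with 5 nodes 3 rounds, matching the O(log(n)) claim in the doc comment.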
include/LightGBM/utils/common.h

@@ -1232,7 +1232,7 @@ struct __TToStringHelper<T, true, true> {

 * Converts an array to a string with with values separated by the space character.
 * This method replaces Common's ``ArrayToString`` and ``ArrayToStringFast`` functionality
 * and is locale-independent.
 *
 * \note If ``high_precision_output`` is set to true,
 * floating point values are output with more digits of precision.
 */
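The comment above describes a space-separated, locale-independent array-to-string conversion with an optional high-precision mode. A rough Python sketch of that described behavior (not the actual C++ template in `common.h`; the exact digit counts here are assumptions for illustration):

```python
# Sketch: join values with single spaces; a high-precision flag widens the
# float formatting, mirroring the high_precision_output note above.
def array_to_string(values, high_precision_output=False):
    fmt = "{:.17g}" if high_precision_output else "{:.6g}"
    return " ".join(fmt.format(v) for v in values)

print(array_to_string([1.0, 2.5, 3.25]))
```

Python's `format` is locale-independent (always a `.` decimal point), which is the property the C++ helper exists to guarantee.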
pmml/README.md

PMML Generator
==============

The old Python convert script is removed due to it cannot support the new format of categorical features.
src/c_api.cpp

@@ -107,7 +107,7 @@ class SingleRowPredictorInner {

/*!
 * \brief Object to store resources meant for single-row Fast Predict methods.
 *
 * For legacy reasons this is called `FastConfig` in the public C API.
 *
 * Meant to be used by the *Fast* predict methods only.
src/treelearner/kernels/histogram_16_64_256.hu

@@ -25,36 +25,36 @@ typedef unsigned char uchar;

template<typename T>
__device__ double as_double(const T t) {
  static_assert(sizeof(T) == sizeof(double), "size mismatch");
  double d;
  memcpy(&d, &t, sizeof(T));
  return d;
}
template<typename T>
__device__ unsigned long long as_ulong_ulong(const T t) {
  static_assert(sizeof(T) == sizeof(unsigned long long), "size mismatch");
  unsigned long long u;
  memcpy(&u, &t, sizeof(T));
  return u;
}
template<typename T>
__device__ float as_float(const T t) {
  static_assert(sizeof(T) == sizeof(float), "size mismatch");
  float f;
  memcpy(&f, &t, sizeof(T));
  return f;
}
template<typename T>
__device__ unsigned int as_uint(const T t) {
  static_assert(sizeof(T) == sizeof(unsigned int), "size_mismatch");
  unsigned int u;
  memcpy(&u, &t, sizeof(T));
  return u;
}
template<typename T>
__device__ uchar4 as_uchar4(const T t) {
  static_assert(sizeof(T) == sizeof(uchar4), "size mismatch");
  uchar4 u;
  memcpy(&u, &t, sizeof(T));
  return u;
}

@@ -158,4 +158,3 @@ DECLARE(histogram256);

}  // namespace LightGBM

#endif  // LIGHTGBM_TREELEARNER_KERNELS_HISTOGRAM_16_64_256_HU_
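The `as_double`/`as_float`/`as_uint`/`as_uchar4` helpers in this file all do the same thing: reinterpret the raw bits of one equally-sized type as another via `memcpy`, the well-defined alternative to pointer-cast type punning. The same bit-level reinterpretation can be sketched in Python with `struct` (illustrative only, not part of the kernel code):

```python
import struct

def as_float(u):
    # Reinterpret a 32-bit unsigned int's bytes as an IEEE-754 float,
    # like the memcpy in the CUDA as_float helper above.
    return struct.unpack("<f", struct.pack("<I", u))[0]

def as_uint(f):
    # Reinterpret a float's bytes as a 32-bit unsigned int.
    return struct.unpack("<I", struct.pack("<f", f))[0]

print(hex(as_uint(1.0)))  # 0x3f800000
```

The `static_assert(sizeof(T) == sizeof(...))` guards in the CUDA versions play the role of `struct`'s fixed 4-byte formats here: both source and destination must be exactly the same size for the reinterpretation to be meaningful.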
src/treelearner/leaf_splits.hpp

@@ -38,7 +38,7 @@ class LeafSplits {

  }
  /*!
  * \brief Init split on current leaf on partial data.
  * \param leaf Index of current leaf
  * \param data_partition current data partition
  * \param sum_gradients

@@ -54,7 +54,7 @@ class LeafSplits {

  }
  /*!
  * \brief Init split on current leaf on partial data.
  * \param leaf Index of current leaf
  * \param data_partition current data partition
  * \param sum_gradients
src/treelearner/ocl/histogram16.cl

@@ -73,12 +73,12 @@ typedef uint acc_int_type;

// local memory size in bytes
#define LOCAL_MEM_SIZE (DWORD_FEATURES * (sizeof(uint) + 2 * sizeof(acc_type)) * NUM_BINS * NUM_BANKS)

// unroll the atomic operation for a few times. Takes more code space,
// but compiler can generate better code for faster atomics.
#define UNROLL_ATOMIC 1

// Options passed by compiler at run time:
// IGNORE_INDICES will be set when the kernel does not
// #define IGNORE_INDICES
// #define POWER_FEATURE_WORKGROUPS 10

@@ -161,7 +161,7 @@ R""()

// this function will be called by histogram16
// we have one sub-histogram of one feature in registers, and need to read others
void within_kernel_reduction16x8(uchar8 feature_mask,
                                 __global const acc_type* restrict feature4_sub_hist,
                                 const uint skip_id,
                                 acc_type stat_val,
                                 const ushort num_sub_hist,

@@ -173,7 +173,7 @@ void within_kernel_reduction16x8(uchar8 feature_mask,

    uchar is_hessian_first = (ltid >> LOG2_DWORD_FEATURES) & 1; // hessian or gradient
    ushort bin_id = ltid >> (LOG2_DWORD_FEATURES + 1); // range 0 - 16
    ushort i;
    #if POWER_FEATURE_WORKGROUPS != 0
    // if there is only 1 work group, no need to do the reduction
    // add all sub-histograms for 4 features
    __global const acc_type* restrict p = feature4_sub_hist + ltid;

@@ -185,7 +185,7 @@ void within_kernel_reduction16x8(uchar8 feature_mask,

    // skip the counters we already have
    p += 2 * DWORD_FEATURES * NUM_BINS;
    for (i = i + 1; i < num_sub_hist; ++i) {
        stat_val += *p;
        p += NUM_BINS * DWORD_FEATURES * 2;
    }
    #endif

@@ -208,12 +208,12 @@ R""()

__attribute__((reqd_work_group_size(LOCAL_SIZE_0, 1, 1)))
#if USE_CONSTANT_BUF == 1
__kernel void histogram16(__global const uchar4* restrict feature_data_base,
                      __constant const uchar8* restrict feature_masks __attribute__((max_constant_size(65536))),
                      const data_size_t feature_size,
                      __constant const data_size_t* restrict data_indices __attribute__((max_constant_size(65536))),
                      const data_size_t num_data,
                      __constant const score_t* restrict ordered_gradients __attribute__((max_constant_size(65536))),
#if CONST_HESSIAN == 0
                      __constant const score_t* restrict ordered_hessians __attribute__((max_constant_size(65536))),
#else

@@ -223,18 +223,18 @@ __kernel void histogram16(__global const uchar4* restrict feature_data_base,

                      __global volatile int * sync_counters,
                      __global acc_type* restrict hist_buf_base) {
#else
__kernel void histogram16(__global const uchar4* feature_data_base,
                      __constant const uchar8* restrict feature_masks __attribute__((max_constant_size(65536))),
                      const data_size_t feature_size,
                      __global const data_size_t* data_indices,
                      const data_size_t num_data,
                      __global const score_t* ordered_gradients,
#if CONST_HESSIAN == 0
                      __global const score_t* ordered_hessians,
#else
                      const score_t const_hessian,
#endif
                      __global char* restrict output_buf,
                      __global volatile int * sync_counters,
                      __global acc_type* restrict hist_buf_base) {
#endif

@@ -260,38 +260,38 @@ __kernel void histogram16(__global const uchar4* feature_data_base,

    // there are 8 banks (sub-histograms) used by 256 threads total 8 KB
    /* memory layout of gh_hist:
       -----------------------------------------------------------------------------------------------
       bk0_g_f0_bin0   bk0_g_f1_bin0   bk0_g_f2_bin0   bk0_g_f3_bin0   bk0_g_f4_bin0   bk0_g_f5_bin0   bk0_g_f6_bin0   bk0_g_f7_bin0
       bk0_h_f0_bin0   bk0_h_f1_bin0   bk0_h_f2_bin0   bk0_h_f3_bin0   bk0_h_f4_bin0   bk0_h_f5_bin0   bk0_h_f6_bin0   bk0_h_f7_bin0
       bk1_g_f0_bin0   bk1_g_f1_bin0   bk1_g_f2_bin0   bk1_g_f3_bin0   bk1_g_f4_bin0   bk1_g_f5_bin0   bk1_g_f6_bin0   bk1_g_f7_bin0
       bk1_h_f0_bin0   bk1_h_f1_bin0   bk1_h_f2_bin0   bk1_h_f3_bin0   bk1_h_f4_bin0   bk1_h_f5_bin0   bk1_h_f6_bin0   bk1_h_f7_bin0
       bk2_g_f0_bin0   bk2_g_f1_bin0   bk2_g_f2_bin0   bk2_g_f3_bin0   bk2_g_f4_bin0   bk2_g_f5_bin0   bk2_g_f6_bin0   bk2_g_f7_bin0
       bk2_h_f0_bin0   bk2_h_f1_bin0   bk2_h_f2_bin0   bk2_h_f3_bin0   bk2_h_f4_bin0   bk2_h_f5_bin0   bk2_h_f6_bin0   bk2_h_f7_bin0
       bk3_g_f0_bin0   bk3_g_f1_bin0   bk3_g_f2_bin0   bk3_g_f3_bin0   bk3_g_f4_bin0   bk3_g_f5_bin0   bk3_g_f6_bin0   bk3_g_f7_bin0
       bk3_h_f0_bin0   bk3_h_f1_bin0   bk3_h_f2_bin0   bk3_h_f3_bin0   bk3_h_f4_bin0   bk3_h_f5_bin0   bk3_h_f6_bin0   bk3_h_f7_bin0
       bk4_g_f0_bin0   bk4_g_f1_bin0   bk4_g_f2_bin0   bk4_g_f3_bin0   bk4_g_f4_bin0   bk4_g_f5_bin0   bk4_g_f6_bin0   bk4_g_f7_bin0
       bk4_h_f0_bin0   bk4_h_f1_bin0   bk4_h_f2_bin0   bk4_h_f3_bin0   bk4_h_f4_bin0   bk4_h_f5_bin0   bk4_h_f6_bin0   bk4_h_f7_bin0
       bk5_g_f0_bin0   bk5_g_f1_bin0   bk5_g_f2_bin0   bk5_g_f3_bin0   bk5_g_f4_bin0   bk5_g_f5_bin0   bk5_g_f6_bin0   bk5_g_f7_bin0
       bk5_h_f0_bin0   bk5_h_f1_bin0   bk5_h_f2_bin0   bk5_h_f3_bin0   bk5_h_f4_bin0   bk5_h_f5_bin0   bk5_h_f6_bin0   bk5_h_f7_bin0
       bk6_g_f0_bin0   bk6_g_f1_bin0   bk6_g_f2_bin0   bk6_g_f3_bin0   bk6_g_f4_bin0   bk6_g_f5_bin0   bk6_g_f6_bin0   bk6_g_f7_bin0
       bk6_h_f0_bin0   bk6_h_f1_bin0   bk6_h_f2_bin0   bk6_h_f3_bin0   bk6_h_f4_bin0   bk6_h_f5_bin0   bk6_h_f6_bin0   bk6_h_f7_bin0
       bk7_g_f0_bin0   bk7_g_f1_bin0   bk7_g_f2_bin0   bk7_g_f3_bin0   bk7_g_f4_bin0   bk7_g_f5_bin0   bk7_g_f6_bin0   bk7_g_f7_bin0
       bk7_h_f0_bin0   bk7_h_f1_bin0   bk7_h_f2_bin0   bk7_h_f3_bin0   bk7_h_f4_bin0   bk7_h_f5_bin0   bk7_h_f6_bin0   bk7_h_f7_bin0
       ...
       bk0_g_f0_bin16  bk0_g_f1_bin16  bk0_g_f2_bin16  bk0_g_f3_bin16  bk0_g_f4_bin16  bk0_g_f5_bin16  bk0_g_f6_bin16  bk0_g_f7_bin16
       bk0_h_f0_bin16  bk0_h_f1_bin16  bk0_h_f2_bin16  bk0_h_f3_bin16  bk0_h_f4_bin16  bk0_h_f5_bin16  bk0_h_f6_bin16  bk0_h_f7_bin16
       bk1_g_f0_bin16  bk1_g_f1_bin16  bk1_g_f2_bin16  bk1_g_f3_bin16  bk1_g_f4_bin16  bk1_g_f5_bin16  bk1_g_f6_bin16  bk1_g_f7_bin16
       bk1_h_f0_bin16  bk1_h_f1_bin16  bk1_h_f2_bin16  ...
bk1_h_f3_bin16
bk1_h_f4_bin16
bk1_h_f5_bin16
bk1_h_f6_bin16
bk1_h_f7_bin16
bk2_g_f0_bin16
bk2_g_f1_bin16
bk2_g_f2_bin16
bk2_g_f3_bin16
bk2_g_f4_bin16
bk2_g_f5_bin16
bk2_g_f6_bin16
bk2_g_f7_bin16
bk2_g_f0_bin16
bk2_g_f1_bin16
bk2_g_f2_bin16
bk2_g_f3_bin16
bk2_g_f4_bin16
bk2_g_f5_bin16
bk2_g_f6_bin16
bk2_g_f7_bin16
bk2_h_f0_bin16
bk2_h_f1_bin16
bk2_h_f2_bin16
bk2_h_f3_bin16
bk2_h_f4_bin16
bk2_h_f5_bin16
bk2_h_f6_bin16
bk2_h_f7_bin16
bk2_h_f0_bin16
bk2_h_f1_bin16
bk2_h_f2_bin16
bk2_h_f3_bin16
bk2_h_f4_bin16
bk2_h_f5_bin16
bk2_h_f6_bin16
bk2_h_f7_bin16
bk3_g_f0_bin16
bk3_g_f1_bin16
bk3_g_f2_bin16
bk3_g_f3_bin16
bk3_g_f4_bin16
bk3_g_f5_bin16
bk3_g_f6_bin16
bk3_g_f7_bin16
bk3_g_f0_bin16
bk3_g_f1_bin16
bk3_g_f2_bin16
bk3_g_f3_bin16
bk3_g_f4_bin16
bk3_g_f5_bin16
bk3_g_f6_bin16
bk3_g_f7_bin16
bk3_h_f0_bin16
bk3_h_f1_bin16
bk3_h_f2_bin16
bk3_h_f3_bin16
bk3_h_f4_bin16
bk3_h_f5_bin16
bk3_h_f6_bin16
bk3_h_f7_bin16
bk3_h_f0_bin16
bk3_h_f1_bin16
bk3_h_f2_bin16
bk3_h_f3_bin16
bk3_h_f4_bin16
bk3_h_f5_bin16
bk3_h_f6_bin16
bk3_h_f7_bin16
bk4_g_f0_bin16
bk4_g_f1_bin16
bk4_g_f2_bin16
bk4_g_f3_bin16
bk4_g_f4_bin16
bk4_g_f5_bin16
bk4_g_f6_bin16
bk4_g_f7_bin16
bk4_g_f0_bin16
bk4_g_f1_bin16
bk4_g_f2_bin16
bk4_g_f3_bin16
bk4_g_f4_bin16
bk4_g_f5_bin16
bk4_g_f6_bin16
bk4_g_f7_bin16
bk4_h_f0_bin16
bk4_h_f1_bin16
bk4_h_f2_bin16
bk4_h_f3_bin16
bk4_h_f4_bin16
bk4_h_f5_bin16
bk4_h_f6_bin16
bk4_h_f7_bin16
bk4_h_f0_bin16
bk4_h_f1_bin16
bk4_h_f2_bin16
bk4_h_f3_bin16
bk4_h_f4_bin16
bk4_h_f5_bin16
bk4_h_f6_bin16
bk4_h_f7_bin16
bk5_g_f0_bin16
bk5_g_f1_bin16
bk5_g_f2_bin16
bk5_g_f3_bin16
bk5_g_f4_bin16
bk5_g_f5_bin16
bk5_g_f6_bin16
bk5_g_f7_bin16
bk5_g_f0_bin16
bk5_g_f1_bin16
bk5_g_f2_bin16
bk5_g_f3_bin16
bk5_g_f4_bin16
bk5_g_f5_bin16
bk5_g_f6_bin16
bk5_g_f7_bin16
bk5_h_f0_bin16
bk5_h_f1_bin16
bk5_h_f2_bin16
bk5_h_f3_bin16
bk5_h_f4_bin16
bk5_h_f5_bin16
bk5_h_f6_bin16
bk5_h_f7_bin16
bk5_h_f0_bin16
bk5_h_f1_bin16
bk5_h_f2_bin16
bk5_h_f3_bin16
bk5_h_f4_bin16
bk5_h_f5_bin16
bk5_h_f6_bin16
bk5_h_f7_bin16
bk6_g_f0_bin16
bk6_g_f1_bin16
bk6_g_f2_bin16
bk6_g_f3_bin16
bk6_g_f4_bin16
bk6_g_f5_bin16
bk6_g_f6_bin16
bk6_g_f7_bin16
bk6_g_f0_bin16
bk6_g_f1_bin16
bk6_g_f2_bin16
bk6_g_f3_bin16
bk6_g_f4_bin16
bk6_g_f5_bin16
bk6_g_f6_bin16
bk6_g_f7_bin16
bk6_h_f0_bin16
bk6_h_f1_bin16
bk6_h_f2_bin16
bk6_h_f3_bin16
bk6_h_f4_bin16
bk6_h_f5_bin16
bk6_h_f6_bin16
bk6_h_f7_bin16
bk6_h_f0_bin16
bk6_h_f1_bin16
bk6_h_f2_bin16
bk6_h_f3_bin16
bk6_h_f4_bin16
bk6_h_f5_bin16
bk6_h_f6_bin16
bk6_h_f7_bin16
bk7_g_f0_bin16
bk7_g_f1_bin16
bk7_g_f2_bin16
bk7_g_f3_bin16
bk7_g_f4_bin16
bk7_g_f5_bin16
bk7_g_f6_bin16
bk7_g_f7_bin16
bk7_g_f0_bin16
bk7_g_f1_bin16
bk7_g_f2_bin16
bk7_g_f3_bin16
bk7_g_f4_bin16
bk7_g_f5_bin16
bk7_g_f6_bin16
bk7_g_f7_bin16
bk7_h_f0_bin16
bk7_h_f1_bin16
bk7_h_f2_bin16
bk7_h_f3_bin16
bk7_h_f4_bin16
bk7_h_f5_bin16
bk7_h_f6_bin16
bk7_h_f7_bin16
bk7_h_f0_bin16
bk7_h_f1_bin16
bk7_h_f2_bin16
bk7_h_f3_bin16
bk7_h_f4_bin16
bk7_h_f5_bin16
bk7_h_f6_bin16
bk7_h_f7_bin16
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
*/
*/
...
@@ -333,7 +333,7 @@ __kernel void histogram16(__global const uchar4* feature_data_base,
    uchar is_hessian_first = (ltid >> LOG2_DWORD_FEATURES) & 1;
    // thread 0-15 write result to bank0, 16-31 to bank1, 32-47 to bank2, 48-63 to bank3, etc
    ushort bank = (ltid >> (LOG2_DWORD_FEATURES + 1)) & BANK_MASK;
    ushort group_feature = group_id >> POWER_FEATURE_WORKGROUPS;
    // each 2^POWER_FEATURE_WORKGROUPS workgroups process on one feature (compile-time constant)
    // feature_size is the number of examples per feature
...
@@ -615,12 +615,12 @@ R""()
)""
R""()
*/
#if ENABLE_ALL_FEATURES == 0
    // restore feature_mask
    feature_mask = feature_masks[group_feature];
#endif
    // now reduce the 4 banks of subhistograms into 1
    acc_type stat_val = 0.0f;
    uint cnt_val = 0;
...
@@ -644,7 +644,7 @@ R""()
        }
    }
#endif
    // now thread 0 - 7 holds feature 0 - 7's gradient for bin 0 and counter bin 0
    // now thread 8 - 15 holds feature 0 - 7's hessian for bin 0 and counter bin 1
    // now thread 16 - 23 holds feature 0 - 7's gradient for bin 1 and counter bin 2
...
@@ -664,7 +664,7 @@ R""()
        // thread 8 - 15 read counters stored by thread 0 - 7
        // thread 24 - 31 read counters stored by thread 8 - 15
        // thread 40 - 47 read counters stored by thread 16 - 23, etc
        stat_val = const_hessian *
                   cnt_hist[((ltid - DWORD_FEATURES) >> (LOG2_DWORD_FEATURES + 1)) * DWORD_FEATURES + (ltid & DWORD_FEATURES_MASK)];
    }
    else {
...
@@ -688,12 +688,12 @@ R""()
   h_f0_bin1    h_f1_bin1    h_f2_bin1    h_f3_bin1    h_f4_bin1    h_f5_bin1    h_f6_bin1    h_f7_bin1
   ...
   g_f0_bin16   g_f1_bin16   g_f2_bin16   g_f3_bin16   g_f4_bin16   g_f5_bin16   g_f6_bin16   g_f7_bin16
   h_f0_bin16   h_f1_bin16   h_f2_bin16   h_f3_bin16   h_f4_bin16   h_f5_bin16   h_f6_bin16   h_f7_bin16
   c_f0_bin0    c_f1_bin0    c_f2_bin0    c_f3_bin0    c_f4_bin0    c_f5_bin0    c_f6_bin0    c_f7_bin0
   c_f0_bin1    c_f1_bin1    c_f2_bin1    c_f3_bin1    c_f4_bin1    c_f5_bin1    c_f6_bin1    c_f7_bin1
   ...
   c_f0_bin16   c_f1_bin16   c_f2_bin16   c_f3_bin16   c_f4_bin16   c_f5_bin16   c_f6_bin16   c_f7_bin16
*/
    // if there is only one workgroup processing this feature4, don't even need to write
    uint feature4_id = (group_id >> POWER_FEATURE_WORKGROUPS);
...
@@ -704,7 +704,7 @@ R""()
    output[0 * DWORD_FEATURES * NUM_BINS + ltid] = stat_val;
    barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE);
    mem_fence(CLK_GLOBAL_MEM_FENCE);
    // To avoid the cost of an extra reducing kernel, we have to deal with some
    // gray area in OpenCL. We want the last work group that process this feature to
    // make the final reduction, and other threads will just quit.
    // This requires that the results written by other workgroups available to the
...
@@ -750,13 +750,13 @@ R""()
#endif
    // locate our feature4's block in output memory
    uint output_offset = (feature4_id << POWER_FEATURE_WORKGROUPS);
    __global acc_type const * restrict feature4_subhists =
                 (__global acc_type *)output_buf + output_offset * DWORD_FEATURES * 2 * NUM_BINS;
    // skip reading the data already in local memory
    uint skip_id = group_id ^ output_offset;
    // locate output histogram location for this feature4
    __global acc_type* restrict hist_buf = hist_buf_base + feature4_id * DWORD_FEATURES * 2 * NUM_BINS;
    within_kernel_reduction16x8(feature_mask, feature4_subhists, skip_id, stat_val,
                                1 << POWER_FEATURE_WORKGROUPS, hist_buf, (__local acc_type *)shared_array);
    }
}
...
src/treelearner/ocl/histogram256.cl
...
@@ -47,12 +47,12 @@ typedef uint acc_int_type;
#endif
#define LOCAL_MEM_SIZE (4 * (sizeof(uint) + 2 * sizeof(acc_type)) * NUM_BINS)
// unroll the atomic operation for a few times. Takes more code space,
// but compiler can generate better code for faster atomics.
#define UNROLL_ATOMIC 1
// Options passed by compiler at run time:
// IGNORE_INDICES will be set when the kernel does not
// #define IGNORE_INDICES
// #define POWER_FEATURE_WORKGROUPS 10
...
@@ -137,7 +137,7 @@ R""()
// this function will be called by histogram256
// we have one sub-histogram of one feature in local memory, and need to read others
void within_kernel_reduction256x4(uchar4 feature_mask,
                           __global const acc_type* restrict feature4_sub_hist,
                           const uint skip_id,
                           const uint old_val_f0_cont_bin0,
                           const ushort num_sub_hist,
...
@@ -314,12 +314,12 @@ R""()
*/
__attribute__((reqd_work_group_size(LOCAL_SIZE_0, 1, 1)))
#if USE_CONSTANT_BUF == 1
__kernel void histogram256(__global const uchar4* restrict feature_data_base,
                      __constant const uchar4* restrict feature_masks __attribute__((max_constant_size(65536))),
                      const data_size_t feature_size,
                      __constant const data_size_t* restrict data_indices __attribute__((max_constant_size(65536))),
                      const data_size_t num_data,
                      __constant const score_t* restrict ordered_gradients __attribute__((max_constant_size(65536))),
#if CONST_HESSIAN == 0
                      __constant const score_t* restrict ordered_hessians __attribute__((max_constant_size(65536))),
#else
...
@@ -329,18 +329,18 @@ __kernel void histogram256(__global const uchar4* restrict feature_data_base,
                      __global volatile int * sync_counters,
                      __global acc_type* restrict hist_buf_base) {
#else
__kernel void histogram256(__global const uchar4* feature_data_base,
                      __constant const uchar4* restrict feature_masks __attribute__((max_constant_size(65536))),
                      const data_size_t feature_size,
                      __global const data_size_t* data_indices,
                      const data_size_t num_data,
                      __global const score_t* ordered_gradients,
#if CONST_HESSIAN == 0
                      __global const score_t* ordered_hessians,
#else
                      const score_t const_hessian,
#endif
                      __global char* restrict output_buf,
                      __global volatile int * sync_counters,
                      __global acc_type* restrict hist_buf_base) {
#endif
...
@@ -363,20 +363,20 @@ __kernel void histogram256(__global const uchar4* feature_data_base,
    // gradient/hessian histograms
    // assume this starts at 32 * 4 = 128-byte boundary
    // total size: 2 * 4 * 256 * size_of(float) = 8 KB
    // organization: each feature/grad/hessian is at a different bank,
    //               as independent of the feature value as possible
    __local acc_type * gh_hist = (__local acc_type *)shared_array;
    // counter histogram
    // total size: 4 * 256 * size_of(uint) = 4 KB
    #if CONST_HESSIAN == 1
    __local uint * cnt_hist = (__local uint *)(gh_hist + 2 * 4 * NUM_BINS);
    #endif
    // thread 0, 1, 2, 3 compute histograms for gradients first
    // thread 4, 5, 6, 7 compute histograms for Hessians first
    // etc.
    uchar is_hessian_first = (ltid >> 2) & 1;
    ushort group_feature = group_id >> POWER_FEATURE_WORKGROUPS;
    // each 2^POWER_FEATURE_WORKGROUPS workgroups process on one feature (compile-time constant)
    // feature_size is the number of examples per feature
...
@@ -725,7 +725,7 @@ R""()
    }
    barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE);
    mem_fence(CLK_GLOBAL_MEM_FENCE);
    // To avoid the cost of an extra reducing kernel, we have to deal with some
    // gray area in OpenCL. We want the last work group that process this feature to
    // make the final reduction, and other threads will just quit.
    // This requires that the results written by other workgroups available to the
...
@@ -773,15 +773,15 @@ R""()
#endif
    // locate our feature4's block in output memory
    uint output_offset = (feature4_id << POWER_FEATURE_WORKGROUPS);
    __global acc_type const * restrict feature4_subhists =
                 (__global acc_type *)output_buf + output_offset * 4 * 2 * NUM_BINS;
    // skip reading the data already in local memory
    uint skip_id = group_id ^ output_offset;
    // locate output histogram location for this feature4
    __global acc_type* restrict hist_buf = hist_buf_base + feature4_id * 4 * 2 * NUM_BINS;
    within_kernel_reduction256x4(feature_mask, feature4_subhists, skip_id, old_val, 1 << POWER_FEATURE_WORKGROUPS,
                                 hist_buf, (__local acc_type *)shared_array);
    // if (ltid == 0)
    //     printf("workgroup %d reduction done, %g %g %g %g %g %g %g %g\n", group_id, hist_buf[0], hist_buf[3*NUM_BINS], hist_buf[2*3*NUM_BINS], hist_buf[3*3*NUM_BINS], hist_buf[1], hist_buf[3*NUM_BINS+1], hist_buf[2*3*NUM_BINS+1], hist_buf[3*3*NUM_BINS+1]);
    }
}
...
src/treelearner/ocl/histogram64.cl
...
@@ -65,12 +65,12 @@ typedef uint acc_int_type;
// local memory size in bytes
#define LOCAL_MEM_SIZE (4 * (sizeof(uint) + 2 * sizeof(acc_type)) * NUM_BINS * NUM_BANKS)
// unroll the atomic operation for a few times. Takes more code space,
// but compiler can generate better code for faster atomics.
#define UNROLL_ATOMIC 1
// Options passed by compiler at run time:
// IGNORE_INDICES will be set when the kernel does not
// #define IGNORE_INDICES
// #define POWER_FEATURE_WORKGROUPS 10
...
@@ -155,7 +155,7 @@ R""()
// this function will be called by histogram64
// we have one sub-histogram of one feature in registers, and need to read others
void within_kernel_reduction64x4(uchar4 feature_mask,
                           __global const acc_type* restrict feature4_sub_hist,
                           const uint skip_id,
                           acc_type g_val, acc_type h_val,
                           const ushort num_sub_hist,
...
@@ -166,7 +166,7 @@ void within_kernel_reduction64x4(uchar4 feature_mask,
    ushort feature_id = ltid & 3; // range 0 - 3
    const ushort bin_id = ltid >> 2; // range 0 - 63
    ushort i;
    #if POWER_FEATURE_WORKGROUPS != 0
    // if there is only 1 work group, no need to do the reduction
    // add all sub-histograms for 4 features
    __global const acc_type* restrict p = feature4_sub_hist + ltid;
...
@@ -212,12 +212,12 @@ R""()
*/
__attribute__((reqd_work_group_size(LOCAL_SIZE_0, 1, 1)))
#if USE_CONSTANT_BUF == 1
__kernel void histogram64(__global const uchar4* restrict feature_data_base,
                      __constant const uchar4* restrict feature_masks __attribute__((max_constant_size(65536))),
                      const data_size_t feature_size,
                      __constant const data_size_t* restrict data_indices __attribute__((max_constant_size(65536))),
                      const data_size_t num_data,
                      __constant const score_t* restrict ordered_gradients __attribute__((max_constant_size(65536))),
#if CONST_HESSIAN == 0
                      __constant const score_t* restrict ordered_hessians __attribute__((max_constant_size(65536))),
#else
...
@@ -227,18 +227,18 @@ __kernel void histogram64(__global const uchar4* restrict feature_data_base,
                      __global volatile int * sync_counters,
                      __global acc_type* restrict hist_buf_base) {
#else
__kernel void histogram64(__global const uchar4* feature_data_base,
                      __constant const uchar4* restrict feature_masks __attribute__((max_constant_size(65536))),
                      const data_size_t feature_size,
                      __global const data_size_t* data_indices,
                      const data_size_t num_data,
                      __global const score_t* ordered_gradients,
#if CONST_HESSIAN == 0
                      __global const score_t* ordered_hessians,
#else
                      const score_t const_hessian,
#endif
                      __global char* restrict output_buf,
                      __global volatile int * sync_counters,
                      __global acc_type* restrict hist_buf_base) {
#endif
...
@@ -313,7 +313,7 @@ __kernel void histogram64(__global const uchar4* feature_data_base,
    uchar is_hessian_first = (ltid >> 2) & 1;
    // thread 0-7 write result to bank0, 8-15 to bank1, 16-23 to bank2, 24-31 to bank3
    ushort bank = (ltid >> 3) & BANK_MASK;
    ushort group_feature = group_id >> POWER_FEATURE_WORKGROUPS;
    // each 2^POWER_FEATURE_WORKGROUPS workgroups process on one feature (compile-time constant)
    // feature_size is the number of examples per feature
...
@@ -582,7 +582,7 @@ R""()
    atomic_local_add_f(gh_hist + addr2, s0_stat2);
    #endif
    barrier(CLK_LOCAL_MEM_FENCE);
/* Makes MSVC happy with long string literal
)""
R""()
...
@@ -591,7 +591,7 @@ R""()
    // restore feature_mask
    feature_mask = feature_masks[group_feature];
    #endif
    // now reduce the 4 banks of subhistograms into 1
    /* memory layout of gh_hist:
       -----------------------------------------------------------------------------------------------
...
@@ -680,7 +680,7 @@ R""()
    output[1 * 4 * NUM_BINS + ltid] = h_val;
    barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE);
    mem_fence(CLK_GLOBAL_MEM_FENCE);
    // To avoid the cost of an extra reducing kernel, we have to deal with some
    // gray area in OpenCL. We want the last work group that process this feature to
    // make the final reduction, and other threads will just quit.
    // This requires that the results written by other workgroups available to the
...
@@ -726,13 +726,13 @@ R""()
#endif
    // locate our feature4's block in output memory
    uint output_offset = (feature4_id << POWER_FEATURE_WORKGROUPS);
    __global acc_type const * restrict feature4_subhists =
                 (__global acc_type *)output_buf + output_offset * 4 * 2 * NUM_BINS;
    // skip reading the data already in local memory
    uint skip_id = group_id ^ output_offset;
    // locate output histogram location for this feature4
    __global acc_type* restrict hist_buf = hist_buf_base + feature4_id * 4 * 2 * NUM_BINS;
    within_kernel_reduction64x4(feature_mask, feature4_subhists, skip_id, g_val, h_val,
                                1 << POWER_FEATURE_WORKGROUPS, hist_buf, (__local acc_type *)shared_array);
    }
}
...
swig/StringArray.i
...
@@ -6,7 +6,7 @@
 */
/**
 * This wraps the StringArray.hpp class for SWIG usage,
 * adding the basic C-style wrappers needed to make it
 * usable for the users of the low-level lightgbmJNI API.
 */
...
@@ -23,7 +23,7 @@
/**
 * @brief Creates a new StringArray and returns its handle.
 *
 * @param num_strings number of strings to store.
 * @param string_size the maximum number of characters that can be stored in each string.
 * @return StringArrayHandle or nullptr in case of allocation failure.
...
@@ -38,7 +38,7 @@
/**
 * @brief Free the StringArray object.
 *
 * @param handle StringArray handle.
 */
void StringArrayHandle_free(StringArrayHandle handle)
...
@@ -49,7 +49,7 @@
/**
 * @brief Return the raw pointer to the array of strings.
 *        Wrapped in Java into String[] automatically.
 *
 * @param handle StringArray handle.
 * @return Raw pointer to the string array which `various.i` maps to String[].
 */
...
@@ -60,7 +60,7 @@
/**
 * For the end user to extract a specific string from the StringArray object.
 *
 * @param handle StringArray handle.
 * @param index index of the string to retrieve from the array.
 * @return raw pointer to string at index, or nullptr if out of bounds.
...
@@ -72,7 +72,7 @@
/**
 * @brief Replaces one string of the array at index with the new content.
 *
 * @param handle StringArray handle.
 * @param index Index of the string to replace
 * @param new_content The content to replace
...
@@ -85,7 +85,7 @@
/**
 * @brief Retrieve the number of strings in the StringArray.
 *
 * @param handle StringArray handle.
 * @return number of strings that the array stores.
 */
...
...
swig/pointer_manipulation.i

@@ -6,11 +6,11 @@
 * This SWIG interface extension provides support to
 * the pointer manipulation methods present in the standard
 * SWIG wrappers, but with support for larger arrays.
 *
 * SWIG provides this in https://github.com/swig/swig/blob/master/Lib/carrays.i
 * but the standard methods only provide arrays with up to
 * max(int32_t) elements.
 *
 * The `long_array_functions` wrappers extend this
 * to arrays of size max(int64_t) instead of max(int32_t).
 */
@@ -103,7 +103,7 @@ void delete_##NAME(TYPE *ary);
TYPE NAME##_getitem(TYPE *ary, int64_t index);
void NAME##_setitem(TYPE *ary, int64_t index, TYPE value);
%enddef

/* Custom template for arrays of pointers */
%define %ptr_array_functions(TYPE, NAME)
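The macro above generates per-type accessor functions whose index parameter is `int64_t`, so wrapped arrays can exceed the `max(int32_t)` element limit of stock carrays.i. A C sketch of what such an expansion looks like, using a hypothetical `ARRAY_FUNCTIONS` macro in place of the SWIG `%define` machinery:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative stand-in for the SWIG %define: token-pastes NAME onto
 * _getitem/_setitem and takes int64_t indices instead of int. */
#define ARRAY_FUNCTIONS(TYPE, NAME)                              \
    TYPE NAME##_getitem(TYPE *ary, int64_t index) {              \
        return ary[index];                                       \
    }                                                            \
    void NAME##_setitem(TYPE *ary, int64_t index, TYPE value) {  \
        ary[index] = value;                                      \
    }

/* Expands to doubleArray_getitem / doubleArray_setitem. */
ARRAY_FUNCTIONS(double, doubleArray)
```

On the Java side, these generated functions are what lets JNI callers poke individual elements of a natively allocated buffer without copying the whole array across the boundary.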