GradientFeatureSelector
-----------------------

The algorithm in GradientFeatureSelector comes from `Feature Gradients: Scalable Feature Selection via Discrete Relaxation <https://arxiv.org/pdf/1908.10382.pdf>`__.

GradientFeatureSelector is a gradient-based search algorithm for feature selection.

1) This approach extends a recent result on the estimation of
learnability in the sublinear data regime by showing that the calculation can be performed iteratively (i.e., in mini-batches) and in **linear time and space** with respect to both the number of features D and the sample size N. 

2) This, along with a discrete-to-continuous relaxation of the search domain, allows for an **efficient, gradient-based** search algorithm among feature subsets for very **large datasets** (a conceptual sketch of this relaxation follows this list).

3) Crucially, this algorithm is capable of finding **higher-order correlations** between features and targets for both the N > D and N < D regimes, as opposed to approaches that do not consider such interactions and/or only consider one regime.

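To build intuition for the discrete-to-continuous relaxation, here is a toy sketch (not the actual NNI implementation; all names and values in it are illustrative): every feature receives a continuous score in (0, 1) instead of a hard 0/1 membership indicator, the scores are updated by gradient descent jointly with a simple model, and the highest-scoring features are kept at the end.

.. code-block:: python

   import numpy as np

   # Toy sketch only: relax the binary feature mask to continuous scores and
   # learn the scores jointly with a linear model by gradient descent.
   rng = np.random.default_rng(0)
   n_samples, n_features, k = 256, 20, 5
   X = rng.normal(size=(n_samples, n_features))
   w_true = np.zeros(n_features)
   w_true[:k] = 1.0
   y = (X @ w_true + 0.1 * rng.normal(size=n_samples) > 0).astype(float)

   scores = np.zeros(n_features)   # unconstrained; a sigmoid maps them to (0, 1)
   w = np.zeros(n_features)        # weights of a simple logistic model
   lr = 0.1
   for _ in range(500):
       mask = 1.0 / (1.0 + np.exp(-scores))        # relaxation of the 0/1 mask
       p = 1.0 / (1.0 + np.exp(-(X * mask) @ w))   # predicted probabilities
       err = (p - y) / n_samples                   # gradient of the logistic loss
       grad_w = (X * mask).T @ err
       grad_scores = ((X * w).T @ err) * mask * (1.0 - mask)
       w -= lr * grad_w
       scores -= lr * grad_scores

   # Keep the k features with the highest relaxed scores.
   print(np.sort(np.argsort(-scores)[:k]))
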
Usage
^^^^^

.. code-block:: python

   from nni.algorithms.feature_engineering.gradient_selector import FeatureGradientSelector
   from sklearn.model_selection import train_test_split

   # load data
   ...
   X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

   # initialize a selector
   fgs = FeatureGradientSelector(n_features=10)
   # fit data
   fgs.fit(X_train, y_train)
   # get important features
   # this returns the indices of the selected features
   print(fgs.get_selected_features())

   ...

You can also refer to the examples in ``/examples/feature_engineering/gradient_feature_selector/``.
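
As a small follow-up to the snippet above (assuming, as the comments there indicate, that ``get_selected_features`` returns column indices), the selected indices can be used to reduce the data before training a downstream model:

.. code-block:: python

   import numpy as np

   # Continuing from the snippet above: keep only the selected columns.
   selected = fgs.get_selected_features()
   X_train_reduced = np.asarray(X_train)[:, selected]
   X_test_reduced = np.asarray(X_test)[:, selected]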

**Parameters of class FeatureGradientSelector constructor**


* 
  **order** (int, optional, default = 4) - What order of interactions to include. Higher orders may be more accurate but increase the run time. 12 is the maximum allowed order.

* 
  **penalty** (int, optional, default = 1) - Constant that multiplies the regularization term.

* 
  **n_features** (int, optional, default = None) - If None, will automatically choose number of features based on search. Otherwise, the number of top features to select.

* 
  **max_features** (int, optional, default = None) - If not None, will use the 'elbow method' to determine the number of features, with max_features as the upper limit (see the constructor sketch after this list).

* 
  **learning_rate** (float, optional, default = 1e-1) - Learning rate for the gradient updates.

* 
  **init** (*zero, on, off, onhigh, offhigh, or sklearn, optional, default = zero*\ ) - How to initialize the vector of scores. 'zero' is the default.

* 
  **n_epochs** (int, optional, default = 1) - Number of epochs to run.

* 
  **shuffle** (bool, optional, default = True) - Shuffle "rows" prior to an epoch.

* 
  **batch_size** (int, optional, default = 1000) - Number of "rows" to process at a time.

* 
  **target_batch_size** (int, optional, default = 1000) - Number of "rows" to accumulate gradients over. Useful when many rows will not fit into memory but are needed for accurate estimation.

* 
  **classification** (bool, optional, default = True) - If True, problem is classification, else regression.

* 
  **ordinal** (bool, optional, default = True) - If True, problem is ordinal classification. Requires classification to be True.

* 
  **balanced** (bool, optional, default = True) - If True, each class is weighted equally in the optimization; otherwise, weighting is done via the support of each class. Requires classification to be True.

* 
  **preprocess** (str, optional, default = 'zscore') - 'zscore' centers the data and normalizes it to unit variance; 'center' only centers the data to zero mean.

* 
  **soft_grouping** (bool, optional, default = True) - If True, groups represent features that come from the same source. Used to encourage sparsity of groups and features within groups.

* 
  **verbose** (int, optional, default = 0) - Controls the verbosity when fitting. Set to 0 for no printing; set to 1 or higher to print every ``verbose`` gradient steps.

* 
  **device** (str, optional, default = 'cpu') - 'cpu' to run on CPU and 'cuda' to run on GPU. Runs much faster on GPU.

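For reference, here is a brief sketch of constructing the selector with some non-default values; the parameter names follow the list above, while the specific values are only illustrative:

.. code-block:: python

   from nni.algorithms.feature_engineering.gradient_selector import FeatureGradientSelector

   # Illustrative configuration: a regression problem where the number of
   # selected features is chosen by the elbow method, capped at 20.
   fgs = FeatureGradientSelector(
       classification=False,  # regression target
       max_features=20,       # upper limit for the elbow method
       n_epochs=5,
       batch_size=512,
       device='cpu',
   )
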
**Parameters of the fit function**


* 
  **X** (array-like, required) - The training input samples, with shape [n_samples, n_features]. ``np.ndarray`` is recommended.

* 
  **y** (array-like, required) - The target values (class labels in classification, real numbers in regression), with shape [n_samples]. ``np.ndarray`` is recommended.

* 
  **groups** (array-like, optional, default = None) - Groups of columns that must be selected as a unit, with shape [n_features]. For example, [0, 0, 1, 2] specifies that the first two columns belong to the same group (see the sketch after this list).

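Below is a hedged sketch of passing ``groups`` to ``fit``, continuing the usage snippet above and assuming ``groups`` is accepted as a keyword argument as the list above suggests; the grouping shown is purely illustrative, and its length must match the number of columns in ``X_train``:

.. code-block:: python

   # Columns 0 and 1 come from the same source and form one group;
   # the remaining columns are singleton groups (illustrative only).
   groups = [0, 0, 1, 2]
   fgs = FeatureGradientSelector(n_features=2)
   fgs.fit(X_train, y_train, groups=groups)
   print(fgs.get_selected_features())
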
**Parameters of the get_selected_features function**

For now, the ``get_selected_features`` function has no parameters.