We are glad to announce the alpha release of the Feature Engineering toolkit on top of NNI. It is still in the experimental phase and might evolve based on user feedback. We'd like to invite you to use it, give feedback, and even contribute.
For now, we support the following feature selectors: `GradientFeatureSelector` and `GBDTSelector`. Here is an example of using `GradientFeatureSelector`:
```python
from nni.feature_engineering.gradient_selector import FeatureGradientSelector

fgs = FeatureGradientSelector(...)
fgs.fit(X_train, y_train)
# will return the index of the important features here
print(fgs.get_selected_features(...))
```
When using a built-in selector, you first need to `import` a feature selector and `initialize` it. You can then call the `fit` function of the selector to pass in the data. After that, you can use `get_selected_features` to get the indices of the important features. The function parameters differ between selectors, so you need to check the docs before using one.
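As a more complete sketch of that flow, assuming a scikit-learn dataset and the `GradientFeatureSelector` built-in (the dataset, the split, and `n_features=10` are purely illustrative choices):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

from nni.feature_engineering.gradient_selector import FeatureGradientSelector

# load a toy dataset and split it; the dataset is an illustrative assumption
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# keep the 10 most important features (n_features is an illustrative choice)
fgs = FeatureGradientSelector(n_features=10)
fgs.fit(X_train, y_train)
print(fgs.get_selected_features())
```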
# How to customize?
NNI provides _state-of-the-art_ feature selector algorithms as built-in selectors. NNI also supports building a feature selector by yourself.
If you want to implement a customized feature selector, you need to:
1. Inherit the base FeatureSelector class
1. Implement the _fit_ and _get_selected_features_ functions
1. Integrate with sklearn (optional, see below)
```python
from nni.feature_engineering.feature_selector import FeatureSelector

class CustomizedSelector(FeatureSelector):
    def fit(self, X, y, **kwargs):
        """
        Fit the training data to FeatureSelector

        Parameters
        ----------
        X : array-like numpy matrix
            The training input samples, whose shape is [n_samples, n_features].
        y : array-like numpy matrix
            The target values (class labels in classification, real numbers in regression), whose shape is [n_samples].
        """
        self.X = X
        self.y = y
        ...

    def get_selected_features(self):
        """
        Get important features.

        Returns
        -------
        list :
            Returns the indices of the important features.
        """
        ...
        return self.selected_features_
    ...
```
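To make the skeleton concrete, here is a toy selector (the name `VarianceSelector` and its behavior are purely illustrative) that keeps the k highest-variance features; only `fit` and `get_selected_features` need to be implemented:

```python
import numpy as np

from nni.feature_engineering.feature_selector import FeatureSelector

class VarianceSelector(FeatureSelector):
    """Toy selector: keep the k features with the highest variance."""

    def __init__(self, k=10):
        self.k = k
        self.selected_features_ = None

    def fit(self, X, y, **kwargs):
        # rank features by variance, ignoring y for this toy example
        variances = np.asarray(X).var(axis=0)
        self.selected_features_ = np.argsort(variances)[::-1][:self.k].tolist()

    def get_selected_features(self):
        return self.selected_features_
```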
**3. Integrate with Sklearn**
`sklearn.pipeline.Pipeline` can connect models in series, such as a feature selector, normalization, and classification/regression, to form a typical machine learning workflow.
The following steps help us integrate better with sklearn, which means we can treat the customized feature selector as a module of the pipeline.
1. Inherit the class _sklearn.base.BaseEstimator_
1. Implement the _get_params_ and _set_params_ functions of _BaseEstimator_
1. Inherit the class _sklearn.feature_selection.base.SelectorMixin_
1. Implement the _get_support_, _transform_ and _inverse_transform_ functions of _SelectorMixin_
Here is an example:
**1. Inherit the BaseEstimator Class and Implement its Functions**
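As a minimal sketch of this step, _get_params_ can expose every public attribute as a parameter and _set_params_ can write them back (the attribute-based implementation below is one reasonable convention, not the only one; `FeatureSelector` comes from the earlier import):

```python
from sklearn.base import BaseEstimator

class CustomizedSelector(FeatureSelector, BaseEstimator):
    def get_params(self, deep=True):
        """Get parameters for this estimator."""
        params = self.__dict__
        # by sklearn convention, fitted attributes end with '_', so drop them
        params = {key: val for (key, val) in params.items()
                  if not key.endswith('_')}
        return params

    def set_params(self, **params):
        """Set the parameters of this estimator."""
        for param in params:
            if hasattr(self, param):
                setattr(self, param, params[param])
        return self
```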
**2. Inherit the SelectorMixin Class and Implement its Functions**

```python
class CustomizedSelector(FeatureSelector, BaseEstimator, SelectorMixin):
    def get_support(self, indices=False):
        """
        Get a mask, or integer index, of the features selected.

        Parameters
        ----------
        indices : bool
            Default False. If True, the return value will be an array of integers rather than a boolean mask.

        Returns
        -------
        list :
            Returns support: an index that selects the retained features from a feature vector.
            If indices is False, this is a boolean array of shape [# input features], in which an element is True iff its corresponding feature is selected for retention.
            If indices is True, this is an integer array of shape [# output features] whose values are indices into the input feature vector.
        """
        ...
        return mask
    def transform(self, X):
        """
        Reduce X to the selected features.

        Parameters
        ----------
        X : array
            whose shape is [n_samples, n_features]

        Returns
        -------
        X_r : array
            whose shape is [n_samples, n_selected_features]
            The input samples with only the selected features.
        """
        ...
        return X_r
    def inverse_transform(self, X):
        """
        Reverse the transformation operation.

        Parameters
        ----------
        X : array
            whose shape is [n_samples, n_selected_features]

        Returns
        -------
        X_r : array
            whose shape is [n_samples, n_original_features]
        """
        ...
        return X_r
```
After integrating with sklearn, we can use the feature selector as follows:
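Here is a minimal runnable sketch of such a pipeline, using sklearn's `SelectKBest` as a stand-in for the customized selector (any selector that implements the interfaces above drops into the same slot; the dataset, `k=32`, and the 10% test split are illustrative assumptions):

```python
from sklearn.datasets import load_digits
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

# feature selection followed by classification in one pipeline
pipeline = make_pipeline(SelectKBest(f_classif, k=32), LogisticRegression(max_iter=1000))
pipeline.fit(X_train, y_train)
print("Pipeline Score:", pipeline.score(X_test, y_test))
```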
`Baseline` means no feature selection: we pass the data directly to LogisticRegression. For this benchmark, we use only 10% of the training data as test data.
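For reference, the baseline in the hypothetical setup above is just the classifier on its own:

```python
# baseline: no feature selection, fit LogisticRegression directly
baseline = LogisticRegression(max_iter=1000)
baseline.fit(X_train, y_train)
print("Baseline Score:", baseline.score(X_test, y_test))
```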