"git@developer.sourcefind.cn:OpenDAS/tilelang.git" did not exist on "95e3b5a7160da6679e9507602f801866c3672e6b"
Unverified commit 8ac61b77 authored by xuehui, committed by GitHub

Gradient Feature Selection (Ready to review) (#1734)

* first update

* update by folder naming

* add gradient feature selection example

* add examples

* delete unused example

* update by pylint

* update by pylint

* update learnability by info from pylint

* fix pylint in fgtrain

* update fginitlize and learnability by pylint

* update by evan's response

* add gbdt_selector

* update gbdt_selector

* refine the example folder structure

* update feature engineering doc

* update docs of feature selector

* update doc of gradientfeature selector

* update docs of GBDTSelector

* update examples of gradientfeature selector

* update folder structure

* update docs by folder structure

* test pylint

* test

* update by pylint

* update by pylint

* update docs and remove some dependency

* remove unused code

* update by comments

* update by comments

* move the feature selection example path

* delete unused dependency
parent ae36373c
## GBDTSelector
GBDTSelector is based on [LightGBM](https://github.com/microsoft/LightGBM), which is a gradient boosting framework that uses tree-based learning algorithms.
When the data is passed to the GBDT model, the model constructs the boosted trees, and the feature importance comes from the scores collected during construction, which indicate how useful or valuable each feature was in building the boosted decision trees within the model.
This method can serve as a strong baseline for feature selection, especially when a GBDT model is used as the classifier or regressor.
For now, the supported `importance_type` values are `split` and `gain`. Customized `importance_type` will be supported in the future, which means users will be able to define how the feature score is calculated themselves.
### Usage
First, install the dependency:
```
pip install lightgbm
```
Then
```python
from nni.feature_engineering.gbdt_selector import GBDTSelector
# load data
...
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
# initialize a selector
fgs = GBDTSelector()
# fit data
fgs.fit(X_train, y_train, ...)
# get important features
# this returns the indices of the important features
print(fgs.get_selected_features(10))
...
```
You can also refer to the examples in `/examples/feature_engineering/gbdt_selector/`.
**Requirement of `fit` FuncArgs**
* **X** (array-like, required) - The training input samples, with shape [n_samples, n_features].
* **y** (array-like, required) - The target values (class labels in classification, real numbers in regression), with shape [n_samples].
* **lgb_params** (dict, required) - The parameters for the LightGBM model. For details, see [here](https://lightgbm.readthedocs.io/en/latest/Parameters.html).
* **eval_ratio** (float, required) - The ratio of the data size. It is used to split the evaluation data and the training data from self.X.
* **early_stopping_rounds** (int, required) - The early stopping setting in LightGBM. For details, see [here](https://lightgbm.readthedocs.io/en/latest/Parameters.html).
* **importance_type** (str, required) - Could be 'split' or 'gain'. 'split' means the result contains the number of times the feature is used in a model, and 'gain' means the result contains the total gains of splits which use the feature. For details, see [here](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.Booster.html#lightgbm.Booster.feature_importance).
* **num_boost_round** (int, required) - Number of boosting rounds. For details, see [here](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.train.html#lightgbm.train).
**Requirement of `get_selected_features` FuncArgs**
* **topk** (int, required) - The number of top-importance features to select.
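Putting these arguments together, a minimal end-to-end sketch might look like the following. The synthetic dataset and the `lgb_params` values are illustrative placeholders, not recommended settings:
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from nni.feature_engineering.gbdt_selector import GBDTSelector

# small synthetic dataset, for illustration only
X, y = make_classification(n_samples=1000, n_features=50, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# illustrative LightGBM parameters; tune them for your own data
lgb_params = {
    'boosting_type': 'gbdt',
    'objective': 'binary',
    'num_leaves': 31,
    'learning_rate': 0.05,
    'verbose': 0
}

selector = GBDTSelector()
selector.fit(X_train, y_train,
             lgb_params=lgb_params,
             eval_ratio=0.1,
             early_stopping_rounds=10,
             importance_type='gain',
             num_boost_round=100)

# indices of the 10 most important features
print(selector.get_selected_features(topk=10))
```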
## GradientFeatureSelector
The algorithm in GradientFeatureSelector comes from ["Feature Gradients: Scalable Feature Selection via Discrete Relaxation"](https://arxiv.org/pdf/1908.10382.pdf).
GradientFeatureSelector is a gradient-based search algorithm for feature selection.
1) This approach extends a recent result on the estimation of
learnability in the sublinear data regime by showing that the calculation can be performed iteratively (i.e., in mini-batches) and in **linear time and space** with respect to both the number of features D and the sample size N.
2) This, along with a discrete-to-continuous relaxation of the search domain, allows for an **efficient, gradient-based** search algorithm among feature subsets for very **large datasets**.
3) Crucially, this algorithm is capable of finding **higher-order correlations** between features and targets for both the N > D and N < D regimes, as opposed to approaches that do not consider such interactions and/or only consider one regime.
### Usage
```python
from nni.feature_engineering.gradient_selector import FeatureGradientSelector
# load data
...
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
# initialize a selector
fgs = FeatureGradientSelector(n_features=10)
# fit data
fgs.fit(X_train, y_train)
# get important features
# this returns the indices of the important features
print(fgs.get_selected_features())
...
```
You can also refer to the examples in `/examples/feature_engineering/gradient_feature_selector/`.
**Parameters of class FeatureGradientSelector constructor**
* **order** (int, optional, default = 4) - What order of interactions to include. Higher orders may be more accurate but increase the run time. 12 is the maximum allowed order.
* **penalty** (int, optional, default = 1) - Constant that multiplies the regularization term.
* **n_features** (int, optional, default = None) - If None, will automatically choose number of features based on search. Otherwise, the number of top features to select.
* **max_features** (int, optional, default = None) - If not None, will use the 'elbow method' to determine the number of features with max_features as the upper limit.
* **learning_rate** (float, optional, default = 1e-1) - Learning rate.
* **init** (*zero, on, off, onhigh, offhigh, or sklearn, optional, default = zero*) - How to initialize the vector of scores. 'zero' is the default.
* **n_epochs** (int, optional, default = 1) - Number of epochs to run.
* **shuffle** (bool, optional, default = True) - Shuffle "rows" prior to an epoch.
* **batch_size** (int, optional, default = 1000) - Number of "rows" to process at a time.
* **target_batch_size** (int, optional, default = 1000) - Number of "rows" to accumulate gradients over. Useful when many rows will not fit into memory but are needed for accurate estimation.
* **classification** (bool, optional, default = True) - If True, problem is classification, else regression.
* **ordinal** (bool, optional, default = True) - If True, problem is ordinal classification. Requires classification to be True.
* **balanced** (bool, optional, default = True) - If True, each class is weighted equally in optimization; otherwise weighting is done via the support of each class. Requires classification to be True.
* **preprocess** (str, optional, default = 'zscore') - 'zscore', which centers the data and scales it to unit variance, or 'center', which only centers the data to zero mean.
* **soft_grouping** (bool, optional, default = True) - If True, groups represent features that come from the same source. Used to encourage sparsity of groups and of features within groups.
* **verbose** (int, optional, default = 0) - Controls the verbosity when fitting. Set to 0 for no printing; 1 or higher prints every `verbose` number of gradient steps.
* **device** (str, optional, default = 'cpu') - 'cpu' to run on CPU and 'cuda' to run on GPU. Runs much faster on GPU.
**Requirement of `fit` FuncArgs**
* **X** (array-like, required) - The training input samples, with shape [n_samples, n_features].
* **y** (array-like, required) - The target values (class labels in classification, real numbers in regression), with shape [n_samples].
* **groups** (array-like, optional, default = None) - Groups of columns that must be selected as a unit, e.g. [0, 0, 1, 2] specifies that the first two columns are part of a group. Its shape is [n_features]. See the sketch below for how it can be passed.
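The following sketch shows how the optional `groups` argument could be passed; the data and the group assignment are made up for illustration:
```python
import numpy as np
from sklearn.datasets import make_classification
from nni.feature_engineering.gradient_selector import FeatureGradientSelector

# synthetic data, for illustration only
X, y = make_classification(n_samples=500, n_features=6, random_state=0)

# hypothetical grouping: the first two columns must be selected together,
# the remaining columns form singleton groups
groups = np.array([0, 0, 1, 2, 3, 4])

fgs = FeatureGradientSelector(n_features=3)
fgs.fit(X, y, groups=groups)
print(fgs.get_selected_features())
```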
**Requirement of `get_selected_features` FuncArgs**
For now, the `get_selected_features` function has no parameters.
# FeatureEngineering
We are glad to announce the alpha release of the Feature Engineering toolkit on top of NNI. It is still in the experimental phase and might evolve based on usage feedback. We would like to invite you to use it, give feedback, and even contribute.
# Copyright (c) Microsoft Corporation
# All rights reserved.
#
# MIT License
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated
# documentation files (the "Software"), to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and
# to permit persons to whom the Software is furnished to do so, subject to the following conditions:
# The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
import bz2
import urllib.request
import numpy as np
from sklearn.datasets import load_svmlight_file
from sklearn.model_selection import train_test_split
from nni.feature_engineering.gbdt_selector import GBDTSelector
url_zip_train = 'https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/rcv1_train.binary.bz2'
urllib.request.urlretrieve(url_zip_train, filename='train.bz2')
with bz2.open('train.bz2', 'rb') as f_zip, open('train.svm', 'wt') as f_svm:
    data = f_zip.read()
    f_svm.write(data.decode('utf-8'))
X, y = load_svmlight_file('train.svm')
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
lgb_params = {
'boosting_type': 'gbdt',
'objective': 'regression',
'metric': {'l2', 'l1'},
'num_leaves': 20,
'learning_rate': 0.05,
'feature_fraction': 0.9,
'bagging_fraction': 0.8,
'bagging_freq': 5,
'verbose': 0}
eval_ratio = 0.1
early_stopping_rounds = 10
importance_type = 'gain'
num_boost_round = 1000
topk = 10
selector = GBDTSelector()
selector.fit(X_train, y_train,
             lgb_params=lgb_params,
             eval_ratio=eval_ratio,
             early_stopping_rounds=early_stopping_rounds,
             importance_type=importance_type,
             num_boost_round=num_boost_round)
print("selected features\t", selector.get_selected_features(topk=topk))
# Copyright (c) Microsoft Corporation
# All rights reserved.
#
# MIT License
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated
# documentation files (the "Software"), to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and
# to permit persons to whom the Software is furnished to do so, subject to the following conditions:
# The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING
# BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
import bz2
import urllib.request
import numpy as np
from sklearn.datasets import load_svmlight_file
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from nni.feature_engineering.gradient_selector import FeatureGradientSelector
url_zip_train = 'https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/rcv1_train.binary.bz2'
urllib.request.urlretrieve(url_zip_train, filename='train.bz2')
with bz2.open('train.bz2', 'rb') as f_zip, open('train.svm', 'wt') as f_svm:
    data = f_zip.read()
    f_svm.write(data.decode('utf-8'))
X, y = load_svmlight_file('train.svm')
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
fgs = FeatureGradientSelector(n_features=10)
fgs.fit(X_train, y_train)
print("selected features\t", fgs.get_selected_features())
pipeline = make_pipeline(FeatureGradientSelector(n_epochs=1, n_features=10), LogisticRegression())
pipeline.fit(X_train, y_train)
print("Pipeline Score: ", pipeline.score(X_train, y_train))

# compare against a tree-based selector baseline
pipeline = make_pipeline(SelectFromModel(ExtraTreesClassifier(n_estimators=50)), LogisticRegression())
pipeline.fit(X_train, y_train)
print("Pipeline Score: ", pipeline.score(X_train, y_train))
# Copyright (c) Microsoft Corporation. All rights reserved.
#
# MIT License
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and
# associated documentation files (the "Software"), to deal in the Software without restriction,
# including without limitation the rights to use, copy, modify, merge, publish, distribute,
# sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all copies or
# substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT
# NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT
# OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
# ==================================================================================================
import logging
_logger = logging.getLogger(__name__)
class FeatureSelector():
def __init__(self, **kwargs):
self.selected_features_ = None
self.X = None
self.y = None
def fit(self, X, y, **kwargs):
"""
Fit the training data to FeatureSelector
Parameters
---------
X : array-like numpy matrix
The training input samples, which shape is [n_samples, n_features].
y: array-like numpy matrix
The target values (class labels in classification, real numbers in
regression). Which shape is [n_samples].
"""
self.X = X
self.y = y
def get_selected_features(self):
"""
Get the indices of the selected important features.
Returns
-------
list :
Return the indices of the important features.
"""
return self.selected_features_
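# Illustrative sketch (not part of NNI): a custom selector would subclass
# FeatureSelector, compute per-feature scores in fit() and expose the ranked
# indices in get_selected_features(). The variance-based ranking below is a
# made-up example that assumes X is a dense numpy array.
class VarianceSelector(FeatureSelector):
    def fit(self, X, y, **kwargs):
        self.X = X
        self.y = y
        # score each column by its variance (illustrative criterion only)
        self._scores = X.var(axis=0)

    def get_selected_features(self):
        # indices sorted by descending variance
        self.selected_features_ = self._scores.argsort()[::-1]
        return self.selected_features_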
from .gbdt_selector import GBDTSelector
# Copyright (c) Microsoft Corporation. All rights reserved.
#
# MIT License
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and
# associated documentation files (the "Software"), to deal in the Software without restriction,
# including without limitation the rights to use, copy, modify, merge, publish, distribute,
# sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all copies or
# substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT
# NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT
# OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
# ==================================================================================================
"""
gbdt_selector.py including:
class GBDTSelector
"""
from sklearn.model_selection import train_test_split
from nni.feature_engineering.feature_selector import FeatureSelector
# pylint: disable=E0401
import lightgbm as lgb
class GBDTSelector(FeatureSelector):
def __init__(self, **kwargs):
self.selected_features_ = None
self.X = None
self.y = None
self.feature_importance = None
self.lgb_params = None
self.eval_ratio = None
self.early_stopping_rounds = None
self.importance_type = None
self.num_boost_round = None
self.model = None
def fit(self, X, y, **kwargs):
"""
Fit the training data to FeatureSelector
Parameters
---------
X : array-like numpy matrix
The training input samples, which shape is [n_samples, n_features].
y : array-like numpy matrix
The target values (class labels in classification, real numbers in
regression). Which shape is [n_samples].
lgb_params : dict
Parameters of lightgbm
eval_ratio : float
The ratio of the data size. It is used to split the evaluation data and the training data from self.X.
early_stopping_rounds : int
The early stopping setting in lightgbm.
importance_type : str
Supported types are 'gain' and 'split'.
num_boost_round : int
num_boost_round in lightgbm.
"""
assert kwargs['lgb_params']
assert kwargs['eval_ratio']
assert kwargs['early_stopping_rounds']
assert kwargs['importance_type']
assert kwargs['num_boost_round']
self.X = X
self.y = y
self.lgb_params = kwargs['lgb_params']
self.eval_ratio = kwargs['eval_ratio']
self.early_stopping_rounds = kwargs['early_stopping_rounds']
self.importance_type = kwargs['importance_type']
self.num_boost_round = kwargs['num_boost_round']
# fixed seed so the train/eval split is reproducible
X_train, X_test, y_train, y_test = train_test_split(self.X,
                                                    self.y,
                                                    test_size=self.eval_ratio,
                                                    random_state=41)
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train)
self.model = lgb.train(self.lgb_params,
lgb_train,
num_boost_round=self.num_boost_round,
valid_sets=lgb_eval,
early_stopping_rounds=self.early_stopping_rounds)
self.feature_importance = self.model.feature_importance(self.importance_type)
def get_selected_features(self, topk):
"""
Get the topk most important features.
Returns
-------
list :
Return the indices of the topk most important features.
"""
assert topk > 0
self.selected_features_ = self.feature_importance.argsort()[-topk:][::-1]
return self.selected_features_
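# Worked example of the top-k selection above, with illustrative values:
#   feature_importance = np.array([5, 1, 9, 3])
#   feature_importance.argsort()            -> [1, 3, 0, 2]  (indices, ascending importance)
#   feature_importance.argsort()[-2:]       -> [0, 2]        (indices of the two largest)
#   feature_importance.argsort()[-2:][::-1] -> [2, 0]        (descending importance)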
from .gradient_selector import FeatureGradientSelector
# Copyright (c) Microsoft Corporation. All rights reserved.
#
# MIT License
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and
# associated documentation files (the "Software"), to deal in the Software without restriction,
# including without limitation the rights to use, copy, modify, merge, publish, distribute,
# sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all copies or
# substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT
# NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT
# OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
# ==================================================================================================
import numpy as np
class StorageLevel:
DISK = 'disk'
SPARSE = 'sparse'
DENSE = 'dense'
class DataFormat:
SVM = 'svm'
NUMPY = 'numpy'
ALL_FORMATS = [SVM, NUMPY]
class Preprocess:
"""
'zscore': center the data to mean 0 and scale to unit variance.
'center': center the data to mean 0 only.
"""
ZSCORE = 'zscore'
CENTER = 'center'
class Device:
CUDA = 'cuda'
CPU = 'cpu'
class Checkpoint:
MODEL = 'model_state_dict'
OPT = 'optimizer_state_dict'
RNG = 'torch_rng_state'
class NanError(ValueError):
pass
class Initialization:
ZERO = 'zero'
ON = 'on'
OFF = 'off'
ON_HIGH = 'onhigh'
OFF_HIGH = 'offhigh'
SKLEARN = 'sklearn'
RANDOM = 'random'
VALUE_DICT = {ZERO: 0,
ON: 1,
OFF: -1,
ON_HIGH: 5,
OFF_HIGH: -1,
SKLEARN: None,
RANDOM: None}
class Coefficients:
""""
coefficients for sublinear estimator were computed running the sublinear
paper's authors' code
"""
SLE = {1: np.array([0.60355337]),
2: np.array([1.52705001, -0.34841729]),
3: np.array([2.90254224, -1.87216745, 0.]),
4: np.array([4.63445685, -5.19936195, 0., 1.50391676]),
5: np.array([6.92948049, -14.12216211, 9.4475009, 0., -1.21093546]),
6: np.array([9.54431082, -28.09414643, 31.84703652, -11.18763791, -1.14175281, 0.]),
7: np.array([12.54505041, -49.64891525, 79.78828031, -46.72250909, 0., 0., 5.02973646]),
8: np.array([16.03550163, -84.286182, 196.86078756, -215.36747071, 92.63961263, 0., 0., -4.86280869]),
9: np.array([19.86409184, -130.76801006, 390.95349861, -570.09210416, 354.77764899, 0., -73.84234865, 0., 10.09148767]),
10: np.array([2.41117752e+01, -1.94946061e+02, 7.34214614e+02, -1.42851995e+03, 1.41567410e+03, \
-5.81738134e+02, 0., 0., 3.11664751e+01, 1.05018365e+00]),
11: np.array([28.75280839, -279.22576729, 1280.46325445, -3104.47148101, 3990.6092248, -2300.29413333, \
0., 427.35289033, 0., 0., -42.17587475]),
12: np.array([33.85141912, -391.4229382, 2184.97827882, -6716.28280208, 11879.75233977, -11739.97267239, \
5384.94542245, 0., -674.23291712, 0., 0., 39.37456439])}
EPSILON = 1e-8
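# How the SLE coefficients above are consumed (mirrors LearnabilityMB.__init__
# in learnability.py): the coefficients for the chosen order are rescaled by
# binomial coefficients of the minibatch size. Illustrative sketch:
#   import numpy as np
#   import scipy.special
#   coeff = Coefficients.SLE[4]   # order-4 coefficients
#   a = coeff / scipy.special.binom(1000, np.arange(coeff.size) + 2)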
# Copyright (c) Microsoft Corporation. All rights reserved.
#
# MIT License
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and
# associated documentation files (the "Software"), to deal in the Software without restriction,
# including without limitation the rights to use, copy, modify, merge, publish, distribute,
# sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all copies or
# substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT
# NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT
# OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
# ==================================================================================================
import time
import numpy as np
import torch
from sklearn.feature_selection import SelectKBest, \
f_classif, mutual_info_classif, f_regression, mutual_info_regression
import nni.feature_engineering.gradient_selector.constants as constants
import nni.feature_engineering.gradient_selector.syssettings as syssettings
from nni.feature_engineering.gradient_selector.learnability import Solver
from nni.feature_engineering.gradient_selector.utils import EMA
torch.set_default_tensor_type(syssettings.torch.tensortype)
def get_optim_f_stop(maxiter, maxtime, dftol_stop, freltol_stop,
minibatch=True):
"""
Build the stopping-condition function used by the training loop.
"""
discount_factor = 1. / 3
total_t = [0.]
df_store = [np.nan]
it_store = [0]
relchange_store = [np.nan]
f_ma = EMA(discount_factor=discount_factor)
df_ma = EMA(discount_factor=discount_factor)
def f_stop(f0, v0, it, t):
flag_stop = False
total_t[-1] += t
g = f0.x.grad.clone().cpu().detach()
df = g.abs().max().numpy().squeeze()
v = v0.clone().cpu().detach()
f = v.numpy().squeeze()
if it >= maxiter:
flag_stop = True
elif total_t[-1] >= maxtime:
flag_stop = True
f_ma.update(f)
df_ma.update(df)
rel_change = f_ma.relchange()
if ((not minibatch) and (df < dftol_stop)) \
or (minibatch and (df_ma() < dftol_stop)):
flag_stop = True
if rel_change < freltol_stop:
flag_stop = True
if not minibatch:
df_store[-1] = df
else:
df_store[-1] = df_ma()
relchange_store[-1] = rel_change
it_store[-1] = it
return flag_stop
return f_stop, {'t': total_t, 'it': it_store, 'df': df_store,
'relchange': relchange_store}
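# Usage sketch (illustrative arguments): the returned closure is later called by
# Solver.train as f_stop(solver, loss, iteration, elapsed_seconds), and
# `stop_conds` holds mutable one-element lists that the closure updates in
# place, so the caller can read the final iteration count, gradient magnitude
# and relative change after training stops.
#   f_stop, stop_conds = get_optim_f_stop(maxiter=100, maxtime=600,
#                                         dftol_stop=1e-4, freltol_stop=1e-4)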
def get_init(data_train, init_type='on', rng=np.random.RandomState(0), prev_score=None):
"""
Initialize the 'x' variable with different settings
"""
D = data_train.n_features
value_off = constants.Initialization.VALUE_DICT[
constants.Initialization.OFF]
value_on = constants.Initialization.VALUE_DICT[
constants.Initialization.ON]
if prev_score is not None:
x0 = prev_score
elif not isinstance(init_type, str):
x0 = value_off * np.ones(D)
x0[init_type] = value_on
elif init_type.startswith(constants.Initialization.RANDOM):
d = int(init_type.replace(constants.Initialization.RANDOM, ''))
x0 = value_off * np.ones(D)
x0[rng.permutation(D)[:d]] = value_on
elif init_type == constants.Initialization.SKLEARN:
B = data_train.return_raw
X, y = data_train.get_dense_data()
data_train.set_return_raw(B)
ix = train_sk_dense(init_type, X, y, data_train.classification)
x0 = value_off * np.ones(D)
x0[ix] = value_on
elif init_type in constants.Initialization.VALUE_DICT:
x0 = constants.Initialization.VALUE_DICT[init_type] * np.ones(D)
else:
raise NotImplementedError(
'init_type {0} not supported yet'.format(init_type))
# pylint: disable=E1102
return torch.tensor(x0.reshape((-1, 1)),
dtype=torch.get_default_dtype())
def get_checkpoint(S, stop_conds, rng=None, get_state=True):
"""
Save the necessary information into a dictionary
"""
m = {}
m['ninitfeats'] = S.ninitfeats
m['x0'] = S.x0
x = S.x.clone().cpu().detach()
m['feats'] = np.where(x.numpy() >= 0)[0]
m.update({k: v[0] for k, v in stop_conds.items()})
if get_state:
m.update({constants.Checkpoint.MODEL: S.state_dict(),
constants.Checkpoint.OPT: S.opt_train.state_dict(),
constants.Checkpoint.RNG: torch.get_rng_state(),
})
if rng:
m.update({'rng_state': rng.get_state()})
return m
def _train(data_train, Nminibatch, order, C, rng, lr_train, debug, maxiter,
maxtime, init, dftol_stop, freltol_stop, dn_log, accum_steps,
path_save, shuffle, device=constants.Device.CPU,
verbose=1,
prev_checkpoint=None,
groups=None,
soft_groups=None):
"""
Main training loop.
"""
t_init = time.time()
x0 = get_init(data_train, init, rng)
if isinstance(init, str) and init == constants.Initialization.ZERO:
ninitfeats = -1
else:
ninitfeats = np.where(x0.detach().numpy() > 0)[0].size
S = Solver(data_train, order,
Nminibatch=Nminibatch, x0=x0, C=C,
ftransform=lambda x: torch.sigmoid(2 * x),
get_train_opt=lambda p: torch.optim.Adam(p, lr_train),
rng=rng,
accum_steps=accum_steps,
shuffle=shuffle,
groups=groups,
soft_groups=soft_groups,
device=device,
verbose=verbose)
S = S.to(device)
S.ninitfeats = ninitfeats
S.x0 = x0
if prev_checkpoint:
S.load_state_dict(prev_checkpoint[constants.Checkpoint.MODEL])
S.opt_train.load_state_dict(prev_checkpoint[constants.Checkpoint.OPT])
torch.set_rng_state(prev_checkpoint[constants.Checkpoint.RNG])
minibatch = S.Ntrain != S.Nminibatch
f_stop, stop_conds = get_optim_f_stop(maxiter, maxtime, dftol_stop,
freltol_stop, minibatch=minibatch)
# the debug flag currently has no effect; always train without a per-iteration callback
f_callback = None
stop_conds['t'][-1] = time.time() - t_init
S.train(f_stop=f_stop, f_callback=f_callback)
return get_checkpoint(S, stop_conds, rng), S
def train_sk_dense(ty, X, y, classification):
if classification:
if ty.startswith('skf'):
d = int(ty.replace('skf', ''))
f_sk = f_classif
elif ty.startswith('skmi'):
d = int(ty.replace('skmi', ''))
f_sk = mutual_info_classif
else:
if ty.startswith('skf'):
d = int(ty.replace('skf', ''))
f_sk = f_regression
elif ty.startswith('skmi'):
d = int(ty.replace('skmi', ''))
f_sk = mutual_info_regression
t = time.time()
clf = SelectKBest(f_sk, k=d)
clf.fit_transform(X, y.squeeze())
ix = np.argsort(-clf.scores_)
ix = ix[np.where(np.invert(np.isnan(clf.scores_[ix])))[0]][:d]
t = time.time() - t
return {'feats': ix, 't': t}
# Copyright (c) Microsoft Corporation. All rights reserved.
#
# MIT License
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and
# associated documentation files (the "Software"), to deal in the Software without restriction,
# including without limitation the rights to use, copy, modify, merge, publish, distribute,
# sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all copies or
# substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT
# NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT
# OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
# ==================================================================================================
import time
import numpy as np
import scipy.special
import torch
import torch.nn as nn
import nni.feature_engineering.gradient_selector.constants as constants
import nni.feature_engineering.gradient_selector.syssettings as syssettings
from nni.feature_engineering.gradient_selector.fginitialize import ChunkDataLoader
torch.set_default_tensor_type(syssettings.torch.tensortype)
sparsetensor = syssettings.torch.sparse.tensortype
def def_train_opt(p):
"""
Return the default optimizer.
"""
return torch.optim.Adam(p, 1e-1, amsgrad=False)
def revcumsum(U):
"""
Reverse cumulative sum for faster performance.
"""
return U.flip(dims=[0]).cumsum(dim=0).flip(dims=[0])
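# e.g. revcumsum(torch.tensor([1., 2., 3.])) -> tensor([6., 5., 3.]): each entry
# becomes the sum of itself and all later entries, computed with a single
# flip/cumsum/flip instead of an explicit Python loop.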
def triudr(X, r):
Zr = torch.zeros_like(X, requires_grad=False)
U = X * r
Zr[:-1] = X[:-1] * revcumsum(U)[1:]
return Zr
def triudl(X, l):
Zl = torch.zeros_like(X, requires_grad=False)
U = X * l
Zl[1:] = X[1:] * (U.cumsum(dim=0)[:-1])
return Zl
class ramp(torch.autograd.Function):
"""
Ensures input is between 0 and 1
"""
@staticmethod
def forward(ctx, input_data):
ctx.save_for_backward(input_data)
return input_data.clamp(min=0, max=1)
@staticmethod
def backward(ctx, grad_output):
input_data, = ctx.saved_tensors
grad_input = grad_output.clone()
grad_input[input_data < 0] = 1e-2
grad_input[input_data > 1] = -1e-2
return grad_input
class safesqrt(torch.autograd.Function):
"""
Square root without dividing by 0.
"""
@staticmethod
def forward(ctx, input_data):
o = input_data.sqrt()
ctx.save_for_backward(input_data, o)
return o
@staticmethod
def backward(ctx, grad_output):
_, o = ctx.saved_tensors
grad_input = grad_output.clone()
grad_input *= 0.5 / (o + constants.EPSILON)
return grad_input
class LearnabilityMB(nn.Module):
"""
Calculates the learnability of a set of features.
mini-batch version w/ "left" and "right" multiplies
"""
def __init__(self, Nminibatch, D, coeff, groups=None, binary=False,
device=constants.Device.CPU):
super(LearnabilityMB, self).__init__()
a = coeff / scipy.special.binom(Nminibatch, np.arange(coeff.size) + 2)
self.order = a.size
# pylint: disable=E1102
self.a = torch.tensor(a, dtype=torch.get_default_dtype(), requires_grad=False)
self.binary = binary
self.a = self.a.to(device)
def ret_val(self, z):
"""
Get the return value based on z.
"""
if not self.binary:
return 1 - z
else:
return 0.5 * (1 - safesqrt.apply(ramp.apply(z)))
def forward(self, s, X, y):
l = y.clone()
r = y.clone()
z = 0
for i in range(self.order):
if i % 2 == 0:
Z = triudr(X, r)
r = torch.mm(Z, s)
else:
Z = triudl(X, l)
l = torch.mm(Z, s)
if self.a[i] != 0:
# skip the computation when a[i] is 0
p = torch.mm(l.t(), r)
z += self.a[i] * p
return self.ret_val(z)
class Solver(nn.Module):
"""
Class that performs the main optimization.
Keeps track of the current x and iterates through data to learn x given the penalty and order.
"""
def __init__(self,
PreparedData,
order,
Nminibatch=None,
groups=None,
soft_groups=None,
x0=None,
C=1,
ftransform=torch.sigmoid,
get_train_opt=def_train_opt,
accum_steps=1,
rng=np.random.RandomState(0),
max_norm_clip=1.,
shuffle=True,
device=constants.Device.CPU,
verbose=1):
"""
Parameters
----------
PreparedData : Dataset of PrepareData class
order : int
What order of interactions to include. Higher orders
may be more accurate but increase the run time. 12 is the maximum allowed order.
Nminibatch : int
Number of rows in a mini batch
groups : array-like
Optional, shape = [n_features]
Groups of columns that must be selected as a unit
e.g. [0, 0, 1, 2] specifies the first two columns are part of a group.
soft_groups : array-like
optional, shape = [n_features]
Groups of columns come from the same source
Used to encourage sparsity of number of sources selected
e.g. [0, 0, 1, 2] specifies the first two columns are part of a group.
x0 : torch.tensor
Optional, initialization of x.
C : float
Penalty parameter.
get_train_opt : function
Function that returns a pytorch optimizer, Adam is the default
accum_steps : int
Number of gradient accumulation steps
rng : random state
max_norm_clip : float
Maximum allowable size of the gradient
shuffle : bool
Whether or not to shuffle data within the dataloader
ftransform : function
Function to transform the x. sigmoid is the default.
device : str
'cpu' to run on CPU and 'cuda' to run on GPU. Runs much faster on GPU
verbose : int
Controls the verbosity when fitting. Set to 0 for no printing
1 or higher for printing every verbose number of gradient steps.
"""
super(Solver, self).__init__()
self.Ntrain, self.D = PreparedData.N, PreparedData.n_features
if groups is not None:
# pylint: disable=E1102
groups = torch.tensor(groups, dtype=torch.long)
self.groups = groups
else:
self.groups = None
if soft_groups is not None:
# pylint: disable=E1102
soft_groups = torch.tensor(soft_groups, dtype=torch.long)
self.soft_D = torch.unique(soft_groups).size()[0]
else:
self.soft_D = None
self.soft_groups = soft_groups
if Nminibatch is None:
Nminibatch = self.Ntrain
else:
if Nminibatch > self.Ntrain:
print('Minibatch larger than sample size.'
+ (' Reducing from %d to %d.'
% (Nminibatch, self.Ntrain)))
Nminibatch = self.Ntrain
if Nminibatch > PreparedData.max_rows:
print('Minibatch larger than mem-allowed.'
+ (' Reducing from %d to %d.' % (Nminibatch,
PreparedData.max_rows)))
Nminibatch = int(np.min([Nminibatch, PreparedData.max_rows]))
self.Nminibatch = Nminibatch
self.accum_steps = accum_steps
if x0 is None:
x0 = torch.zeros(self.D, 1, dtype=torch.get_default_dtype())
self.ftransform = ftransform
self.x = nn.Parameter(x0)
self.max_norm = max_norm_clip
self.device = device
self.verbose = verbose
self.multiclass = PreparedData.classification and PreparedData.n_classes and PreparedData.n_classes > 2
if self.multiclass:
self.n_classes = PreparedData.n_classes
else:
self.n_classes = None
# whether to treat all classes equally
self.balanced = PreparedData.balanced
self.ordinal = PreparedData.ordinal
if (hasattr(PreparedData, 'mappings')
or PreparedData.storage_level == 'disk'):
num_workers = PreparedData.num_workers
elif PreparedData.storage_level == constants.StorageLevel.DENSE:
num_workers = 0
else:
num_workers = 0
if constants.Device.CUDA in device:
pin_memory = False
else:
pin_memory = False
self.ds_train = ChunkDataLoader(
PreparedData,
batch_size=self.Nminibatch,
shuffle=shuffle,
drop_last=True,
num_workers=num_workers,
pin_memory=pin_memory,
timeout=60)
self.f_train = LearnabilityMB(self.Nminibatch, self.D,
constants.Coefficients.SLE[order],
self.groups,
binary=PreparedData.classification,
device=self.device)
self.opt_train = get_train_opt(torch.nn.ParameterList([self.x]))
self.it = 0
self.iters_per_epoch = int(np.ceil(len(self.ds_train.dataset)
/ self.ds_train.batch_size))
self.f_train = self.f_train.to(device)
# pylint: disable=E1102
self.w = torch.tensor(
C / (C + 1),
dtype=torch.get_default_dtype(), requires_grad=False)
self.w = self.w.to(device)
def penalty(self, s):
"""
Calculate L1 Penalty.
"""
to_return = torch.sum(s) / self.D
if self.soft_groups is not None:
# if soft_groups, there is an additional penalty for using more
# groups
s_grouped = torch.zeros(self.soft_D, 1,
dtype=torch.get_default_dtype(),
device=self.device)
for group in torch.unique(self.soft_groups):
# groups should be indexed 0 to n_group - 1
# TODO: consider other functions here
s_grouped[group] = s[self.soft_groups == group].max()
# each component of the penalty contributes .5
# TODO: could make this a user given parameter
to_return = (to_return + torch.sum(s_grouped) / self.soft_D) * .5
return to_return
def forward_and_backward(self, s, xsub, ysub, retain_graph=False):
"""
Completes the forward operation and computes gradients for learnability and penalty.
"""
f_train = self.f_train(s, xsub, ysub)
pen = self.penalty(s)
# pylint: disable=E1102
grad_outputs = torch.tensor([[1]], dtype=torch.get_default_dtype(),
device=self.device)
g1, = torch.autograd.grad([f_train], [self.x], grad_outputs,
retain_graph=True)
# pylint: disable=E1102
grad_outputs = torch.tensor([[1]], dtype=torch.get_default_dtype(),
device=self.device)
g2, = torch.autograd.grad([pen], [self.x], grad_outputs,
retain_graph=retain_graph)
return f_train, pen, g1, g2
def combine_gradient(self, g1, g2):
"""
Combine gradients from learnability and penalty
Parameters
----------
g1 : array-like
gradient from learnability
g2 : array-like
gradient from penalty
"""
to_return = ((1 - self.w) * g1 + self.w * g2) / self.accum_steps
if self.groups is not None:
# each column will get a gradient
# but we can only up or down groups, so the gradient for the group
# should be the average of the gradients of the columns
to_return_grouped = torch.zeros_like(self.x)
for group in torch.unique(self.groups):
to_return_grouped[self.groups ==
group] = to_return[self.groups == group].mean()
to_return = to_return_grouped
return to_return
def combine_loss(self, f_train, pen):
"""
Combine the learnability and L1 penalty.
"""
return ((1 - self.w) * f_train.detach() + self.w * pen.detach()) \
/ self.accum_steps
def transform_y_into_binary(self, ysub, target_class):
"""
Transforms multiclass classification problems into a binary classification problem.
"""
with torch.no_grad():
ysub_binary = torch.zeros_like(ysub)
if self.ordinal:
# turn ordinal problems into n-1 classifications of is this
# example less than rank k
if target_class == 0:
return None
ysub_binary[ysub >= target_class] = 1
ysub_binary[ysub < target_class] = -1
else:
# turn multiclass problems into n binary classifications
ysub_binary[ysub == target_class] = 1
ysub_binary[ysub != target_class] = -1
return ysub_binary
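# Worked example (illustrative): for ordinal labels ysub = [0, 1, 2, 3] and
# target_class = 2, the binary target is [-1, -1, 1, 1] ("is the rank >= 2?");
# in the plain multiclass branch the same ysub and target_class give
# [-1, -1, 1, -1] ("is the class exactly 2?").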
def _get_scaling_value(self, ysub, target_class):
"""
Returns the weight given to a class for multiclass classification.
"""
if self.balanced:
if self.ordinal:
return 1 / (torch.unique(ysub).size()[0] - 1)
return 1 / torch.unique(ysub).size()[0]
else:
if self.ordinal:
this_class_proportion = torch.mean(ysub >= target_class)
normalizing_constant = 0
for i in range(1, self.n_classes):
normalizing_constant += torch.mean(ysub >= i)
return this_class_proportion / normalizing_constant
else:
return torch.mean(ysub == target_class)
def _skip_y_forward(self, y):
"""
Returns a boolean of whether to skip the current y if there is nothing to be learned from it.
"""
if y is None:
return True
elif torch.unique(y).size()[0] < 2:
return True
else:
return False
def train(self, f_callback=None, f_stop=None):
"""
Trains the estimator to determine which features to include.
Parameters
----------
f_callback : function
Function that performs a callback
f_stop: function
Function that tells you when to stop
"""
t = time.time()
h = torch.zeros([1, 1], dtype=torch.get_default_dtype())
h = h.to(self.device)
# h_complete exists so that, when we divide by the number of classes,
# we only do that for the current minibatch while accumulating gradients
h_complete = h.clone()
flag_stop = False
dataloader_iterator = iter(self.ds_train)
self.x.grad = torch.zeros_like(self.x)
while not flag_stop:
try:
xsub, ysub = next(dataloader_iterator)
except StopIteration:
dataloader_iterator = iter(self.ds_train)
xsub, ysub = next(dataloader_iterator)
try:
s = self.ftransform(self.x)
s = s.to(self.device)
if self.multiclass:
# accumulate gradients over each class, classes range from
# 0 to n_classes - 1
#num_classes_batch = torch.unique(ysub).size()[0]
for target_class in range(self.n_classes):
ysub_binary = self.transform_y_into_binary(
ysub, target_class)
if self._skip_y_forward(ysub_binary):
continue
# we could also skip when the target class is not included in the minibatch,
# but that changes what we divide by
scaling_value = self._get_scaling_value(
ysub, target_class)
f_train, pen, g1, g2 = self.forward_and_backward(
s, xsub, ysub_binary, retain_graph=True)
self.x.grad += self.combine_gradient(
g1, g2) * scaling_value
h += self.combine_loss(f_train,
pen) * scaling_value
else:
if not self._skip_y_forward(ysub):
f_train, pen, g1, g2 = self.forward_and_backward(
s, xsub, ysub)
self.x.grad += self.combine_gradient(g1, g2)
h += self.combine_loss(f_train, pen)
else:
continue
h_complete += h
self.it += 1
if torch.isnan(h):
raise constants.NanError(
'Loss is nan, something may be misconfigured')
if self.it % self.accum_steps == 0:
torch.nn.utils.clip_grad_norm_(
torch.nn.ParameterList([self.x]),
max_norm=self.max_norm)
self.opt_train.step()
t = time.time() - t
if f_stop is not None:
flag_stop = f_stop(self, h, self.it, t)
if f_callback is not None:
f_callback(self, h, self.it, t)
elif self.verbose and (self.it // self.accum_steps) % self.verbose == 0:
epoch = int(self.it / self.iters_per_epoch)
print(
'[Minibatch: %6d/ Epoch: %3d/ t: %3.3f s] Loss: %0.3f' %
(self.it, epoch, t, h_complete / self.accum_steps))
if flag_stop:
break
self.opt_train.zero_grad()
h = 0
h_complete = 0
t = time.time()
except KeyboardInterrupt:
flag_stop = True
break
numpy==1.14.3
scikit-learn==0.20.0
scipy==1.1.0
torch==1.1.0
# Copyright (c) Microsoft Corporation. All rights reserved.
#
# MIT License
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and
# associated documentation files (the "Software"), to deal in the Software without restriction,
# including without limitation the rights to use, copy, modify, merge, publish, distribute,
# sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all copies or
# substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT
# NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT
# OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
# ==================================================================================================
import torch
# pytorch
torch.tensortype = torch.FloatTensor
torch.sparse.tensortype = torch.sparse.FloatTensor
# mem
MAXMEMGB = 10
# Copyright (c) Microsoft Corporation. All rights reserved.
#
# MIT License
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and
# associated documentation files (the "Software"), to deal in the Software without restriction,
# including without limitation the rights to use, copy, modify, merge, publish, distribute,
# sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all copies or
# substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT
# NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
# DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT
# OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
# ==================================================================================================
import numpy as np
class EMA():
"""
maintains an exponential moving average
"""
def __init__(self, f=np.nan, discount_factor=0.1, valid_after=None,
n_iters_relchange=3):
self.f_ma = [f]
self.fs = [f]
self.gamma = discount_factor
self.rel_change = [np.nan]
if valid_after is None:
self.valid_after = int(1/discount_factor)
else:
self.valid_after = valid_after
self.n_iters_relchange = n_iters_relchange
self.initialized = False
def reset(self, f):
self.f_ma = [f]
self.fs = [f]
self.rel_change = [np.nan]
self.initialized = True
def relchange(self):
if self.num_updates() > np.max([self.valid_after,
self.n_iters_relchange]):
return np.max(self.rel_change[-self.n_iters_relchange:])
else:
return np.nan
def update(self, f_new):
if not self.initialized:
self.reset(f_new)
else:
self.fs.append(f_new)
self.f_ma.append(self.f_ma[-1]*(1-self.gamma) + self.gamma*f_new)
if self.num_updates() > self.valid_after:
self.rel_change.append(np.abs((self.f_ma[-1]-self.f_ma[-2])
/ self.f_ma[-2]))
def num_updates(self):
return len(self.f_ma)
def __call__(self):
if self.num_updates() > self.valid_after:
return self.f_ma[-1]
else:
return np.nan
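# Usage sketch with illustrative values: feed a noisy scalar into the EMA and
# read back the smoothed value and the recent relative change; both return NaN
# until more than `valid_after` updates have been seen.
#   ema = EMA(discount_factor=0.1)   # valid after int(1 / 0.1) = 10 updates
#   for loss in [1.0, 0.9, 1.1, 0.8] * 5:
#       ema.update(loss)
#   print(ema(), ema.relchange())    # smoothed loss, max recent relative change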