Commit db18367e authored by wxchan's avatar wxchan Committed by Guolin Ke

rewrite test with unittest (#142)

* add troubleshooting

* use unittest

* update unittest version

* fix test_engine.py

* fix test_sklearn.py

* default eval_metric by subclass

* add test grid search

* remove verbose_eval
parent fd28f095
@@ -14,7 +14,7 @@ before_install:
 install:
 - sudo apt-get install -y libopenmpi-dev openmpi-bin build-essential
-- conda install --yes atlas numpy scipy scikit-learn pandas
+- conda install --yes atlas numpy scipy scikit-learn
 script:
@@ -6,7 +6,7 @@
 * [Training API](Python-API.md#training-api)
   - [train](Python-API.md#trainparams-train_set-num_boost_round100-valid_setsnone-valid_namesnone-fobjnone-fevalnone-init_modelnone-feature_namenone-categorical_featurenone-early_stopping_roundsnone-evals_resultnone-verbose_evaltrue-learning_ratesnone-callbacksnone)
-  - [cv](Python-API.md#cvparams-train_set-num_boost_round10-nfold5-stratifiedfalse-metricsnone-fobjnone-fevalnone-init_modelnone-feature_namenone-categorical_featurenone-early_stopping_roundsnone-fpreprocnone-verbose_evalnone-show_stdvtrue-seed0-callbacksnone)
+  - [cv](Python-API.md#cvparams-train_set-num_boost_round10-nfold5-stratifiedfalse-shuffletrue-metricsnone-fobjnone-fevalnone-init_modelnone-feature_namenone-categorical_featurenone-early_stopping_roundsnone-fpreprocnone-verbose_evalnone-show_stdvtrue-seed0-callbacksnone)
 * [Scikit-learn API](Python-API.md#scikit-learn-api)
   - [Common Methods](Python-API.md#common-methods)
@@ -516,7 +516,7 @@ The methods of each Class is in alphabetical order.
     booster : a trained booster model

-####cv(params, train_set, num_boost_round=10, nfold=5, stratified=False, metrics=None, fobj=None, feval=None, init_model=None, feature_name=None, categorical_feature=None, early_stopping_rounds=None, fpreproc=None, verbose_eval=None, show_stdv=True, seed=0, callbacks=None)
+####cv(params, train_set, num_boost_round=10, nfold=5, stratified=False, shuffle=True, metrics=None, fobj=None, feval=None, init_model=None, feature_name=None, categorical_feature=None, early_stopping_rounds=None, fpreproc=None, verbose_eval=None, show_stdv=True, seed=0, callbacks=None)

     Cross-validation with given parameters.

@@ -532,6 +532,8 @@ The methods of each Class is in alphabetical order.
         Number of folds in CV.
     stratified : bool
         Perform stratified sampling.
+    shuffle : bool
+        Whether to shuffle the data before splitting.
     folds : a KFold or StratifiedKFold instance
         Sklearn KFolds or StratifiedKFolds.
     metrics : str or list of str
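The new `shuffle` flag in `cv` controls whether rows are permuted before being assigned to folds. As a minimal, hypothetical sketch of that idea in plain Python (not LightGBM's actual fold-building code; `kfold_indices` is an illustrative helper):

```python
import random

def kfold_indices(n, nfold=5, shuffle=True, seed=0):
    """Split indices 0..n-1 into nfold contiguous folds,
    optionally shuffling the order first (as cv's shuffle flag does)."""
    idx = list(range(n))
    if shuffle:
        random.Random(seed).shuffle(idx)
    fold_size = n // nfold
    folds = [idx[i * fold_size:(i + 1) * fold_size] for i in range(nfold)]
    # put any remainder onto the last fold
    folds[-1].extend(idx[nfold * fold_size:])
    return folds

print(kfold_indices(10, nfold=5, shuffle=False))
# [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
```

Without shuffling, folds follow the row order of the dataset, which matters when the data is sorted by label or time.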
@@ -723,6 +725,7 @@ The methods of each Class is in alphabetical order.
     eval_metric : str, list of str, callable, optional
         If a str, should be a built-in evaluation metric to use.
         If callable, a custom evaluation metric, see note for more details.
+        default: binary_logloss for LGBMClassifier, l2 for LGBMRegressor, ndcg for LGBMRanker
     early_stopping_rounds : int
     verbose : bool
         If `verbose` and an evaluation set is used, writes the evaluation
@@ -806,11 +809,11 @@ The methods of each Class is in alphabetical order.
 ###LGBMRanker

-####fit(X, y, sample_weight=None, init_score=None, group=None, eval_set=None, eval_sample_weight=None, eval_init_score=None, eval_group=None, eval_metric=None, eval_at=None, early_stopping_rounds=None, verbose=True, feature_name=None, categorical_feature=None, other_params=None)
+####fit(X, y, sample_weight=None, init_score=None, group=None, eval_set=None, eval_sample_weight=None, eval_init_score=None, eval_group=None, eval_metric='ndcg', eval_at=1, early_stopping_rounds=None, verbose=True, feature_name=None, categorical_feature=None, other_params=None)

     Most arguments are same as Common Methods except:

-    eval_at : list of int
+    eval_at : int or list of int, default=1
         The evaluation positions of NDCG

 ## Callbacks
@@ -11,9 +11,23 @@ Installation

 Note: Make sure you have `setuptools <https://pypi.python.org/pypi/setuptools>`__

 Examples
 --------

 - Refer also to the walk through examples in `python-guide
   folder <https://github.com/Microsoft/LightGBM/tree/master/examples/python-guide>`__

+Troubleshooting
+---------------
+
+- **Trouble 1**: I see error messages like the following when installing from GitHub using ``python setup.py install``.
+
+      error: Error: setup script specifies an absolute path:
+      /Users/Microsoft/LightGBM/python-package/lightgbm/../../lib_lightgbm.so
+      setup() arguments must *always* be /-separated paths relative to the
+      setup.py directory, *never* absolute paths.
+
+- **Solution 1**: please check `here <http://stackoverflow.com/questions/18085571/pip-install-error-setup-script-specifies-an-absolute-path>`__.
@@ -527,7 +527,7 @@ class _InnerDataset(object):
                                  is_reshape=False)
         if self.predictor.num_class > 1:
             # need re group init score
-            new_init_score = np.zeros(init_score.size(), dtype=np.float32)
+            new_init_score = np.zeros(init_score.size, dtype=np.float32)
             num_data = self.num_data()
             for i in range(num_data):
                 for j in range(self.predictor.num_class):
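The fix above works because NumPy's `ndarray.size` is an attribute, not a method: `init_score.size()` attempts to call an `int` and raises `TypeError`. A quick illustration:

```python
import numpy as np

init_score = np.arange(6, dtype=np.float32)
print(init_score.size)       # 6 -- attribute: total number of elements

try:
    init_score.size()        # the original bug: calling the int
except TypeError as e:
    print(type(e).__name__)  # TypeError

# the corrected pattern from the diff
new_init_score = np.zeros(init_score.size, dtype=np.float32)
print(new_init_score.shape)  # (6,)
```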
@@ -981,8 +981,7 @@ class Dataset(object):
             self._predictor = predictor
             self.inner_dataset = None
         else:
-            raise LightGBMError("Cannot set predictor after freed raw data,\
-                Set free_raw_data=False when construct Dataset to avoid this.")
+            raise LightGBMError("Cannot set predictor after freed raw data, set free_raw_data=False when constructing Dataset to avoid this.")

     def set_reference(self, reference):
         """
@@ -155,7 +155,7 @@ def early_stopping(stopping_rounds, verbose=True):
     def init(env):
         """internal function"""
         if not env.evaluation_result_list:
-            raise ValueError('For early stopping, at least one dataset is required for evaluation')
+            raise ValueError('For early stopping, at least one dataset or eval metric is required for evaluation')
         if verbose:
             msg = "Train until valid scores didn't improve in {} rounds."
@@ -344,13 +344,6 @@ class LGBMModel(LGBMModelBase):
             params["objective"] = "None"
         else:
             params["objective"] = self.objective
-        if eval_metric is None and eval_set is not None:
-            eval_metric = {
-                'regression': 'l2',
-                'binary': 'binary_logloss',
-                'lambdarank': 'ndcg',
-                'multiclass': 'multi_logloss'
-            }.get(self.objective, None)
         if callable(eval_metric):
             feval = _eval_function_wrapper(eval_metric)
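This hunk deletes the base-class objective-to-metric lookup; each subclass instead declares its own default metric in its `fit` signature. A hypothetical, stripped-down sketch of that design choice (the class names here are illustrative, not LightGBM's):

```python
class Model:
    def fit(self, eval_metric):
        # the base class no longer guesses a metric from the objective
        self.eval_metric = eval_metric
        return self

class Regressor(Model):
    def fit(self, eval_metric="l2"):          # default lives in the signature
        return super().fit(eval_metric)

class Classifier(Model):
    def fit(self, eval_metric="binary_logloss"):
        return super().fit(eval_metric)

print(Regressor().fit().eval_metric)         # l2
print(Classifier().fit("auc").eval_metric)   # auc
```

The default is now visible in each subclass's documented signature rather than hidden in a dict, at the cost of a sentinel-style check when the classifier later switches to multiclass (see the `multi_logloss` hunk below).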
@@ -471,7 +464,7 @@ class LGBMRegressor(LGBMModel, LGBMRegressorBase):
             sample_weight=None, init_score=None,
             eval_set=None, eval_sample_weight=None,
             eval_init_score=None,
-            eval_metric=None,
+            eval_metric="l2",
             early_stopping_rounds=None, verbose=True,
             feature_name=None, categorical_feature=None,
             other_params=None):
@@ -504,7 +497,7 @@ class LGBMClassifier(LGBMModel, LGBMClassifierBase):
             sample_weight=None, init_score=None,
             eval_set=None, eval_sample_weight=None,
             eval_init_score=None,
-            eval_metric=None,
+            eval_metric="binary_logloss",
             early_stopping_rounds=None, verbose=True,
             feature_name=None, categorical_feature=None,
             other_params=None):
@@ -517,6 +510,8 @@ class LGBMClassifier(LGBMModel, LGBMClassifierBase):
             # Switch to using a multiclass objective in the underlying LGBM instance
             self.objective = "multiclass"
             other_params['num_class'] = self.n_classes_
+            if eval_set is not None and eval_metric == "binary_logloss":
+                eval_metric = "multi_logloss"

         self._le = LGBMLabelEncoder().fit(y)
         training_labels = self._le.transform(y)
@@ -589,7 +584,7 @@ class LGBMRanker(LGBMModel):
             sample_weight=None, init_score=None, group=None,
             eval_set=None, eval_sample_weight=None,
             eval_init_score=None, eval_group=None,
-            eval_metric=None, eval_at=None,
+            eval_metric='ndcg', eval_at=1,
             early_stopping_rounds=None, verbose=True,
             feature_name=None, categorical_feature=None,
             other_params=None):
@@ -616,6 +611,8 @@ class LGBMRanker(LGBMModel):
         if eval_at is not None:
             other_params = {} if other_params is None else other_params
+            if isinstance(eval_at, int):
+                eval_at = [eval_at]
             other_params['ndcg_eval_at'] = list(eval_at)
         super(LGBMRanker, self).fit(X, y, sample_weight, init_score, group,
                                     eval_set, eval_sample_weight, eval_init_score, eval_group,
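The added `isinstance` check lets callers pass `eval_at` as a bare int (matching the new `eval_at=1` default) as well as an iterable. Normalizing a scalar to a one-element list before calling `list(...)` is a common pattern, sketched here on its own:

```python
def normalize_eval_at(eval_at):
    # accept either a single evaluation position or an iterable of positions
    if isinstance(eval_at, int):
        eval_at = [eval_at]
    return list(eval_at)

print(normalize_eval_at(1))          # [1]
print(normalize_eval_at((1, 3, 5)))  # [1, 3, 5]
```

Without the check, `list(1)` would raise `TypeError: 'int' object is not iterable`.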
test_engine.py — rewritten from scratch as a unittest suite. The previous script-style test (which loaded the examples/regression data with pandas, trained with l2/auc metrics, continued training from an initial model, saved and JSON-dumped the model, printed feature importances, and ran cv) is removed in full. New file:

# coding: utf-8
# pylint: skip-file
import os, unittest, math
import numpy as np
import lightgbm as lgb
from sklearn.metrics import log_loss, mean_squared_error, mean_absolute_error
from sklearn.datasets import load_breast_cancer, load_boston, load_digits, load_iris
from sklearn.model_selection import train_test_split

def multi_logloss(y_true, y_pred):
    return np.mean([-math.log(y_pred[i][y]) for i, y in enumerate(y_true)])

def test_template(params = {'objective' : 'regression', 'metric' : 'l2'},
                  X_y=load_boston(True), feval=mean_squared_error,
                  stratify=None, num_round=100, return_data=False,
                  return_model=False, init_model=None, custom_eval=None):
    X, y = X_y
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1,
                                                        stratify=stratify,
                                                        random_state=42)
    lgb_train = lgb.Dataset(X_train, y_train, free_raw_data=not return_model, params=params)
    lgb_eval = lgb.Dataset(X_test, y_test, reference=lgb_train, free_raw_data=not return_model, params=params)
    if return_data: return lgb_train, lgb_eval
    evals_result = {}
    params['verbose'] = 0
    gbm = lgb.train(params, lgb_train,
                    num_boost_round=num_round,
                    valid_sets=lgb_eval,
                    valid_names='eval',
                    verbose_eval=False,
                    feval=custom_eval,
                    evals_result=evals_result,
                    early_stopping_rounds=10,
                    init_model=init_model)
    if return_model: return gbm
    else: return evals_result, feval(y_test, gbm.predict(X_test, gbm.best_iteration))

class TestBasic(unittest.TestCase):

    def test_binary(self):
        X_y = load_breast_cancer(True)
        params = {
            'objective' : 'binary',
            'metric' : 'binary_logloss'
        }
        evals_result, ret = test_template(params, X_y, log_loss, stratify=X_y[1])
        self.assertLess(ret, 0.15)
        self.assertAlmostEqual(min(evals_result['eval']['logloss']), ret, places=5)

    def test_regreesion(self):
        evals_result, ret = test_template()
        ret **= 0.5
        self.assertLess(ret, 4)
        self.assertAlmostEqual(min(evals_result['eval']['l2']), ret, places=5)

    def test_multiclass(self):
        X_y = load_digits(10, True)
        params = {
            'objective' : 'multiclass',
            'metric' : 'multi_logloss',
            'num_class' : 10
        }
        evals_result, ret = test_template(params, X_y, multi_logloss, stratify=X_y[1])
        self.assertLess(ret, 0.2)
        self.assertAlmostEqual(min(evals_result['eval']['multi_logloss']), ret, places=5)

    def test_continue_train_and_other(self):
        params = {
            'objective' : 'regression',
            'metric' : 'l1'
        }
        model_name = 'model.txt'
        gbm = test_template(params, num_round=20, return_model=True)
        gbm.save_model(model_name)
        evals_result, ret = test_template(params, feval=mean_absolute_error,
                                          num_round=80, init_model=model_name,
                                          custom_eval=(lambda p, d: ('mae', mean_absolute_error(p, d.get_label()), False)))
        self.assertLess(ret, 3)
        self.assertAlmostEqual(min(evals_result['eval']['l1']), ret, places=5)
        for l1, mae in zip(evals_result['eval']['l1'], evals_result['eval']['mae']):
            self.assertAlmostEqual(l1, mae, places=5)
        self.assertIn('tree_info', gbm.dump_model())
        self.assertIsInstance(gbm.feature_importance(), np.ndarray)
        os.remove(model_name)

    def test_continue_train_multiclass(self):
        X_y = load_iris(True)
        params = {
            'objective' : 'multiclass',
            'metric' : 'multi_logloss',
            'num_class' : 3
        }
        gbm = test_template(params, X_y, num_round=20, return_model=True, stratify=X_y[1])
        evals_result, ret = test_template(params, X_y, feval=multi_logloss,
                                          num_round=80, init_model=gbm)
        self.assertLess(ret, 1.5)
        self.assertAlmostEqual(min(evals_result['eval']['multi_logloss']), ret, places=5)

    def test_cv(self):
        lgb_train, lgb_eval = test_template(return_data=True)
        lgb.cv({'verbose': 0}, lgb_train, num_boost_round=200, nfold=5,
               metrics='l1', verbose_eval=False)

print("----------------------------------------------------------------------")
print("running test_engine.py")
unittest.main()
test_sklearn.py — likewise rewritten as a unittest suite. The previous module-level function tests (test_binary_classification, test_multiclass_classification, test_regression, test_lambdarank, test_regression_with_custom_objective, test_binary_classification_with_custom_objective, test_early_stopping, each called at import time) are removed in full. New file:

# coding: utf-8
# pylint: skip-file
import os, unittest
import numpy as np
import lightgbm as lgb
from sklearn.metrics import log_loss, mean_squared_error, mean_absolute_error
from sklearn.datasets import load_breast_cancer, load_boston, load_digits, load_iris, load_svmlight_file
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.base import clone

def test_template(X_y=load_boston(True), model=lgb.LGBMRegressor,
                  feval=mean_squared_error, stratify=None, num_round=100, return_data=False,
                  return_model=False, init_model=None, custom_obj=None, proba=False):
    X, y = X_y
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1,
                                                        stratify=stratify,
                                                        random_state=42)
    if return_data: return X_train, X_test, y_train, y_test
    gbm = model(n_estimators=num_round, objective=custom_obj) if custom_obj else model(n_estimators=num_round)
    gbm.fit(X_train, y_train, eval_set=[(X_test, y_test)], early_stopping_rounds=10, verbose=False)
    if return_model: return gbm
    else: return feval(y_test, gbm.predict_proba(X_test) if proba else gbm.predict(X_test))

class TestSklearn(unittest.TestCase):

    def test_binary(self):
        X_y = load_breast_cancer(True)
        ret = test_template(X_y, lgb.LGBMClassifier, log_loss, stratify=X_y[1], proba=True)
        self.assertLess(ret, 0.15)

    def test_regreesion(self):
        self.assertLess(test_template() ** 0.5, 4)

    def test_multiclass(self):
        X_y = load_digits(10, True)
        def multi_error(y_true, y_pred):
            return np.mean(y_true != y_pred)
        ret = test_template(X_y, lgb.LGBMClassifier, multi_error, stratify=X_y[1])
        self.assertLess(ret, 0.2)

    def test_lambdarank(self):
        X_train, y_train = load_svmlight_file('../../examples/lambdarank/rank.train')
        X_test, y_test = load_svmlight_file('../../examples/lambdarank/rank.test')
        q_train = np.loadtxt('../../examples/lambdarank/rank.train.query')
        lgb_model = lgb.LGBMRanker().fit(X_train, y_train, group=q_train, eval_at=[1])

    def test_regression_with_custom_objective(self):
        def objective_ls(y_true, y_pred):
            grad = (y_pred - y_true)
            hess = np.ones(len(y_true))
            return grad, hess
        ret = test_template(custom_obj=objective_ls)
        self.assertLess(ret, 100)

    def test_binary_classification_with_custom_objective(self):
        def logregobj(y_true, y_pred):
            y_pred = 1.0 / (1.0 + np.exp(-y_pred))
            grad = y_pred - y_true
            hess = y_pred * (1.0 - y_pred)
            return grad, hess
        X_y = load_digits(2, True)
        def binary_error(y_test, y_pred):
            return np.mean([int(p > 0.5) != y for y, p in zip(y_test, y_pred)])
        ret = test_template(X_y, lgb.LGBMClassifier, feval=binary_error, custom_obj=logregobj)
        self.assertLess(ret, 0.1)

    def test_grid_search(self):
        X_train, X_test, y_train, y_test = test_template(return_data=True)
        params = {'n_estimators': [10, 15, 20]}
        gbm = GridSearchCV(lgb.LGBMRegressor(), params, cv=5)
        gbm.fit(X_train, y_train)
        self.assertIn(gbm.best_params_['n_estimators'], [10, 15, 20])

    def test_clone(self):
        gbm = test_template(return_model=True)
        gbm_clone = clone(gbm)

print("----------------------------------------------------------------------")
print("running test_sklearn.py")
unittest.main()