sklearn.model_selection.cross_validate(estimator, X, y=None, groups=None, scoring=None, cv=None, n_jobs=1, verbose=0, fit_params=None, pre_dispatch='2*n_jobs', return_train_score='warn')
Evaluate metric(s) by cross-validation and also record fit/score times.
Read more in the User Guide.
Parameters:

estimator : estimator object implementing 'fit'
    The object to use to fit the data.

X : array-like
    The data to fit. Can be for example a list, or an array.

y : array-like, optional, default: None
    The target variable to try to predict in the case of supervised learning.

groups : array-like, with shape (n_samples,), optional
    Group labels for the samples used while splitting the dataset into train/test set.

scoring : string, callable, list/tuple, dict or None, default: None
    A single string (see The scoring parameter: defining model evaluation rules) or a callable (see Defining your scoring strategy from metric functions) to evaluate the predictions on the test set.
    For evaluating multiple metrics, either give a list of (unique) strings or a dict with names as keys and callables as values.
    NOTE that when using custom scorers, each scorer should return a single value. Metric functions returning a list/array of values can be wrapped into multiple scorers that return one value each. See Specifying multiple metrics for evaluation for an example.
    If None, the estimator's default scorer (if available) is used.

cv : int, cross-validation generator or an iterable, optional
    Determines the cross-validation splitting strategy. Possible inputs for cv are:
    - None, to use the default 3-fold cross-validation,
    - integer, to specify the number of folds in a (Stratified)KFold,
    - an object to be used as a cross-validation generator,
    - an iterable yielding train/test splits.
    For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used. Refer to the User Guide for the various cross-validation strategies that can be used here; a splitter-object sketch follows this parameter list.

n_jobs : integer, optional
    The number of CPUs to use to do the computation. -1 means 'all CPUs'.

verbose : integer, optional
    The verbosity level.

fit_params : dict, optional
    Parameters to pass to the fit method of the estimator.

pre_dispatch : int, or string, optional
    Controls the number of jobs that get dispatched during parallel execution. Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be:
    - None, in which case all the jobs are immediately created and spawned. Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs,
    - an int, giving the exact number of total jobs that are spawned,
    - a string, giving an expression as a function of n_jobs, as in '2*n_jobs'.

return_train_score : boolean, optional
    Whether to include train scores. Current default is 'warn', which behaves as True in addition to raising a warning when a training score is looked up; the default will change to False in a future release.
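As referenced in the cv description above, here is a minimal, self-contained sketch of passing an explicit splitter object instead of an integer. It is not part of the original reference; the 5-fold KFold on the diabetes data is an illustrative choice.

>>> from sklearn.datasets import load_diabetes
>>> from sklearn.linear_model import Lasso
>>> from sklearn.model_selection import KFold, cross_validate
>>> X, y = load_diabetes(return_X_y=True)
>>> splitter = KFold(n_splits=5)          # any CV generator can be passed as cv
>>> res = cross_validate(Lasso(), X, y, cv=splitter, return_train_score=False)
>>> len(res['test_score'])                # one score per split
5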
Returns:

scores : dict of float arrays of shape=(n_splits,)
    Array of scores of the estimator for each run of the cross validation.
    A dict of arrays containing the score/time arrays for each scorer is returned. The possible keys for this dict are:
    test_score
        The score array for test scores on each cv split.
    train_score
        The score array for train scores on each cv split. This is available only if the return_train_score parameter is True.
    fit_time
        The time for fitting the estimator on the train set for each cv split.
    score_time
        The time for scoring the estimator on the test set for each cv split. (Note: time for scoring on the train set is not included even if return_train_score is set to True.)
See also
sklearn.model_selection.cross_val_score
sklearn.metrics.make_scorer
Examples

>>> from sklearn import datasets, linear_model
>>> from sklearn.model_selection import cross_validate
>>> from sklearn.metrics.scorer import make_scorer
>>> from sklearn.metrics import confusion_matrix
>>> from sklearn.svm import LinearSVC
>>> diabetes = datasets.load_diabetes()
>>> X = diabetes.data[:150]
>>> y = diabetes.target[:150]
>>> lasso = linear_model.Lasso()
Single metric evaluation using cross_validate
>>> cv_results = cross_validate(lasso, X, y, return_train_score=False)
>>> sorted(cv_results.keys())
['fit_time', 'score_time', 'test_score']
>>> cv_results['test_score']
array([ 0.33..., 0.08..., 0.03...])
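As a hedged illustration (not part of the original example), enabling return_train_score also adds a train_score entry to the returned dict for single metric evaluation:

>>> cv_results = cross_validate(lasso, X, y, return_train_score=True)
>>> sorted(cv_results.keys())
['fit_time', 'score_time', 'test_score', 'train_score']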
Multiple metric evaluation using cross_validate (please refer to the scoring parameter doc for more information)
>>> scores = cross_validate(lasso, X, y,
...                         scoring=('r2', 'neg_mean_squared_error'))
>>> print(scores['test_neg_mean_squared_error'])
[-3635.5... -3573.3... -6114.7...]
>>> print(scores['train_r2'])
[ 0.28... 0.39... 0.22...]
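The make_scorer import shown above can also be used to build a dict of named custom scorers. The following is a minimal sketch; the scorer name 'mae' and the mean_absolute_error metric are illustrative choices, not part of the original reference.

>>> from sklearn.metrics import mean_absolute_error
>>> # wrap a metric function as a scorer (lower MAE is better, hence greater_is_better=False)
>>> mae_scorer = make_scorer(mean_absolute_error, greater_is_better=False)
>>> scores = cross_validate(lasso, X, y, return_train_score=False,
...                         scoring={'mae': mae_scorer, 'r2': 'r2'})
>>> sorted(scores.keys())
['fit_time', 'score_time', 'test_mae', 'test_r2']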
© 2007–2017 The scikit-learn developers
Licensed under the 3-clause BSD License.
http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_validate.html