class sklearn.ensemble.RandomTreesEmbedding(n_estimators=10, max_depth=5, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, sparse_output=True, n_jobs=1, random_state=None, verbose=0, warm_start=False)
An ensemble of totally random trees.
An unsupervised transformation of a dataset to a high-dimensional sparse representation. A datapoint is coded according to which leaf of each tree it is sorted into. Using a one-hot encoding of the leaves, this leads to a binary coding with as many ones as there are trees in the forest.
The dimensionality of the resulting representation is n_out <= n_estimators * max_leaf_nodes. If max_leaf_nodes == None, the number of leaf nodes is at most n_estimators * 2 ** max_depth.
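The coding described above can be checked on toy data. The following is a minimal sketch, assuming randomly generated input (the array `X` and the chosen hyperparameters are made up for illustration): each row of the transformed matrix should contain exactly one nonzero entry per tree, and the number of columns should respect the `n_estimators * 2 ** max_depth` bound.

```python
import numpy as np
from sklearn.ensemble import RandomTreesEmbedding

# Toy data, invented for this illustration.
rng = np.random.RandomState(0)
X = rng.rand(200, 4)

n_estimators, max_depth = 10, 3
embedder = RandomTreesEmbedding(
    n_estimators=n_estimators, max_depth=max_depth, random_state=0
)
X_sparse = embedder.fit_transform(X)  # sparse one-hot coding of the leaves

n_out = X_sparse.shape[1]
# Number of ones in each row: one active leaf per tree.
ones_per_row = np.asarray(X_sparse.sum(axis=1)).ravel()

print(X_sparse.shape)
print(n_out <= n_estimators * 2 ** max_depth)
print(np.all(ones_per_row == n_estimators))
```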
Read more in the User Guide.
Parameters: |
n_estimators : integer, optional (default=10) Number of trees in the forest.
max_depth : integer, optional (default=5) The maximum depth of each tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
min_samples_split : int, float, optional (default=2) The minimum number of samples required to split an internal node:
Changed in version 0.18: Added float values for percentages.
min_samples_leaf : int, float, optional (default=1) The minimum number of samples required to be at a leaf node:
Changed in version 0.18: Added float values for percentages.
min_weight_fraction_leaf : float, optional (default=0.) The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.
max_leaf_nodes : int or None, optional (default=None) Grow trees with
min_impurity_split : float, Threshold for early stopping in tree growth. A node will split if its impurity is above the threshold, otherwise it is a leaf. Deprecated since version 0.19:
min_impurity_decrease : float, optional (default=0.) A node will be split if this split induces a decrease of the impurity greater than or equal to this value. The weighted impurity decrease equation is the following:
N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity)
where
New in version 0.19.
bootstrap : boolean, optional (default=True) Whether bootstrap samples are used when building trees.
sparse_output : bool, optional (default=True) Whether or not to return a sparse CSR matrix, as default behavior, or to return a dense array compatible with dense pipeline operators.
n_jobs : integer, optional (default=1) The number of jobs to run in parallel for both
random_state : int, RandomState instance or None, optional (default=None) If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by
verbose : int, optional (default=0) Controls the verbosity of the tree building process.
warm_start : bool, optional (default=False) When set to |
---|---|
Attributes: |
estimators_ : list of ExtraTreeRegressor The collection of fitted sub-estimators. |
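The weighted impurity decrease expression given for min_impurity_decrease can be evaluated by hand. The sketch below is a plain-Python transcription of that expression, not library code. It assumes that N is the total number of samples, N_t the number of samples at the current node, and N_t_L and N_t_R the number of samples in the left and right children; the definitions are cut off in the text above, so this reading and the example numbers are assumptions.

```python
def weighted_impurity_decrease(N, N_t, N_t_L, N_t_R,
                               impurity, left_impurity, right_impurity):
    """Evaluate the expression from the min_impurity_decrease description.

    Symbol meanings are assumed (see the lead-in): N = all samples,
    N_t = samples at the node, N_t_L / N_t_R = samples in left / right child.
    """
    return N_t / N * (
        impurity
        - N_t_R / N_t * right_impurity
        - N_t_L / N_t * left_impurity
    )


# Hypothetical node: 40 of 100 samples reach it; the split sends 10 left, 30 right.
decrease = weighted_impurity_decrease(
    N=100, N_t=40, N_t_L=10, N_t_R=30,
    impurity=0.5, left_impurity=0.2, right_impurity=0.4,
)
# 0.4 * (0.5 - 0.75 * 0.4 - 0.25 * 0.2) = 0.4 * 0.15, i.e. about 0.06
print(decrease)
```

Under this reading, the node in the example would be split only if min_impurity_decrease is at most roughly 0.06.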
[R168] | P. Geurts, D. Ernst, and L. Wehenkel, “Extremely randomized trees”, Machine Learning, 63(1), 3-42, 2006. |
[R169] | F. Moosmann, B. Triggs, and F. Jurie, “Fast discriminative visual codebooks using randomized clustering forests”, NIPS 2007. |
apply (X) | Apply trees in the forest to X, return leaf indices. |
decision_path (X) | Return the decision path in the forest |
fit (X[, y, sample_weight]) | Fit estimator. |
fit_transform (X[, y, sample_weight]) | Fit estimator and transform dataset. |
get_params ([deep]) | Get parameters for this estimator. |
set_params (**params) | Set the parameters of this estimator. |
transform (X) | Transform dataset. |
__init__(n_estimators=10, max_depth=5, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, sparse_output=True, n_jobs=1, random_state=None, verbose=0, warm_start=False)
apply(X)
Apply trees in the forest to X, return leaf indices.
Parameters: |
X : array-like or sparse matrix, shape = [n_samples, n_features] The input samples. Internally, its dtype will be converted to |
---|---|
Returns: |
X_leaves : array_like, shape = [n_samples, n_estimators] For each datapoint x in X and for each tree in the forest, return the index of the leaf x ends up in. |
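As a small illustration of the return value, the sketch below fits an embedding on made-up data and inspects the leaf indices. The data and hyperparameters are assumptions chosen for the example; the check against `tree_.children_left` relies on scikit-learn marking leaves with a child index of -1.

```python
import numpy as np
from sklearn.ensemble import RandomTreesEmbedding

# Toy data, invented for this illustration.
rng = np.random.RandomState(0)
X = rng.rand(50, 3)

embedder = RandomTreesEmbedding(n_estimators=7, max_depth=3, random_state=0).fit(X)

# One row per sample, one column per tree.
leaves = embedder.apply(X)
print(leaves.shape)

# Column 0 holds node ids in the first tree; each id should refer to a leaf,
# which the low-level tree structure marks with children_left == -1.
tree0 = embedder.estimators_[0].tree_
all_leaves = bool(np.all(tree0.children_left[leaves[:, 0]] == -1))
print(all_leaves)
```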
decision_path(X)
Return the decision path in the forest.
New in version 0.18.
Parameters: |
X : array-like or sparse matrix, shape = [n_samples, n_features] The input samples. Internally, its dtype will be converted to |
---|---|
Returns: |
indicator : sparse csr array, shape = [n_samples, n_nodes] Return a node indicator matrix where nonzero elements indicate that the sample goes through the corresponding nodes.
n_nodes_ptr : array of size (n_estimators + 1, ) The columns indicator[:, n_nodes_ptr[i]:n_nodes_ptr[i+1]] give the indicator values for the i-th estimator. |
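The relationship between the two return values can be sketched as follows, using made-up data. The example slices out the columns belonging to the first tree and checks that every sample passes through that tree's root node (node 0).

```python
import numpy as np
from sklearn.ensemble import RandomTreesEmbedding

# Toy data, invented for this illustration.
rng = np.random.RandomState(0)
X = rng.rand(30, 3)

embedder = RandomTreesEmbedding(n_estimators=4, max_depth=2, random_state=0).fit(X)
indicator, n_nodes_ptr = embedder.decision_path(X)

# n_nodes_ptr has one boundary per tree plus a final end marker.
print(len(n_nodes_ptr))

# Columns belonging to the first tree.
first_tree = indicator[:, n_nodes_ptr[0]:n_nodes_ptr[1]]
n_nodes_first = embedder.estimators_[0].tree_.node_count
print(first_tree.shape[1] == n_nodes_first)

# Every sample starts at the root, so the root column is all ones.
root_column = first_tree[:, 0].toarray().ravel()
print(bool(np.all(root_column == 1)))
```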
feature_importances_
Returns: | feature_importances_ : array, shape = [n_features] |
---|
fit(X, y=None, sample_weight=None)
Fit estimator.
Parameters: |
X : array-like or sparse matrix, shape=(n_samples, n_features) The input samples. Use
sample_weight : array-like, shape = [n_samples] or None Sample weights. If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. In the case of classification, splits are also ignored if they would result in any single class carrying a negative weight in either child node. |
---|---|
Returns: |
self : object Returns self. |
fit_transform(X, y=None, sample_weight=None)
Fit estimator and transform dataset.
Parameters: |
X : array-like or sparse matrix, shape=(n_samples, n_features) Input data used to build forests. Use
sample_weight : array-like, shape = [n_samples] or None Sample weights. If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. In the case of classification, splits are also ignored if they would result in any single class carrying a negative weight in either child node. |
---|---|
Returns: |
X_transformed : sparse matrix, shape=(n_samples, n_out) Transformed dataset. |
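A quick way to see how fit_transform relates to fit and transform is to compare the two paths with the same seed. This is a sketch on made-up data; it assumes that fixing random_state makes both estimators grow identical trees, which is the expected behavior but is stated here as an assumption rather than a guarantee from this page.

```python
import numpy as np
from sklearn.ensemble import RandomTreesEmbedding

# Toy data, invented for this illustration.
rng = np.random.RandomState(1)
X = rng.rand(40, 5)

params = dict(n_estimators=5, max_depth=3, random_state=42)

# Path 1: fit and transform in one call.
a = RandomTreesEmbedding(**params).fit_transform(X)

# Path 2: fit first (fit returns the estimator itself), then transform.
b = RandomTreesEmbedding(**params).fit(X).transform(X)

same = np.array_equal(a.toarray(), b.toarray())
print(same)
```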
get_params(deep=True)
Get parameters for this estimator.
Parameters: |
deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. |
---|---|
Returns: |
params : mapping of string to any Parameter names mapped to their values. |
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
Returns: | self : |
---|
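The `<component>__<parameter>` form can be illustrated with a small pipeline. In this sketch the step names `embed` and `clf` and the choice of LogisticRegression as the second step are arbitrary, picked only for the example.

```python
from sklearn.ensemble import RandomTreesEmbedding
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Step names "embed" and "clf" are hypothetical labels for this example.
pipe = Pipeline([
    ("embed", RandomTreesEmbedding()),
    ("clf", LogisticRegression()),
])

# On a simple estimator, parameters are addressed by their plain names.
embedder = RandomTreesEmbedding().set_params(n_estimators=20, max_depth=3)

# On a nested object, prefix the parameter with the component name.
returned = pipe.set_params(embed__n_estimators=20, clf__C=0.5)

print(embedder.get_params()["max_depth"])
print(pipe.get_params()["embed__n_estimators"])
print(returned is pipe)  # set_params returns the object it was called on
```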
transform(X)
Transform dataset.
Parameters: |
X : array-like or sparse matrix, shape=(n_samples, n_features) Input data to be transformed. Use |
---|---|
Returns: |
X_transformed : sparse matrix, shape=(n_samples, n_out) Transformed dataset. |
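The sketch below applies a fitted embedding to data not seen during fitting and requests dense output through sparse_output=False. The train and test arrays are made up for the illustration. New samples are coded in the same n_out columns as the training data, again with one active leaf per tree.

```python
import numpy as np
from sklearn.ensemble import RandomTreesEmbedding

# Toy data, invented for this illustration.
rng = np.random.RandomState(0)
X_train = rng.rand(100, 4)
X_new = rng.rand(10, 4)

embedder = RandomTreesEmbedding(
    n_estimators=5, max_depth=3, sparse_output=False, random_state=0
).fit(X_train)

X_train_t = embedder.transform(X_train)
X_new_t = embedder.transform(X_new)

print(type(X_new_t).__name__)                 # dense array instead of a sparse matrix
print(X_new_t.shape[1] == X_train_t.shape[1]) # same output columns as the training data
print(bool(np.all(X_new_t.sum(axis=1) == 5))) # one active leaf per tree
```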
© 2007–2017 The scikit-learn developers
Licensed under the 3-clause BSD License.
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomTreesEmbedding.html