class sklearn.cluster.bicluster.SpectralBiclustering(n_clusters=3, method='bistochastic', n_components=6, n_best=3, svd_method='randomized', n_svd_vecs=None, mini_batch=False, init='k-means++', n_init=10, n_jobs=1, random_state=None)
Spectral biclustering (Kluger, 2003).
Partitions rows and columns under the assumption that the data has an underlying checkerboard structure. For instance, if there are two row partitions and three column partitions, each row will belong to three biclusters, and each column will belong to two biclusters. The outer product of the corresponding row and column label vectors gives this checkerboard structure.
Read more in the User Guide.
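The checkerboard description above can be illustrated with plain NumPy. This is only a sketch: the label vectors are made up, and the numbering of biclusters (row partition first, then column partition) is an assumption chosen for the example, not something stated on this page.

```python
import numpy as np

# Hypothetical labels: 4 rows in 2 row partitions, 6 columns in 3 column partitions.
row_labels = np.array([0, 0, 1, 1])
column_labels = np.array([0, 1, 2, 0, 1, 2])
n_row_clusters, n_col_clusters = 2, 3

# One boolean indicator line per bicluster (assumed ordering: r * n_col_clusters + c).
rows = np.vstack([row_labels == r
                  for r in range(n_row_clusters)
                  for _ in range(n_col_clusters)])
columns = np.vstack([column_labels == c
                     for _ in range(n_row_clusters)
                     for c in range(n_col_clusters)])

# The outer product of matching row and column indicators is the mask of one bicluster;
# weighting each mask by its index and summing gives the checkerboard of bicluster ids.
checkerboard = sum(i * np.outer(rows[i], columns[i]) for i in range(len(rows)))

print(checkerboard)
print(rows.sum(axis=0))     # number of biclusters each row belongs to
print(columns.sum(axis=0))  # number of biclusters each column belongs to
```

With two row partitions and three column partitions, every row appears in three biclusters and every column in two, matching the description.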
Parameters: |
n_clusters : integer or tuple (n_row_clusters, n_column_clusters) The number of row and column clusters in the checkerboard structure.
method : string, optional, default: ‘bistochastic’ Method of normalizing and converting singular vectors into biclusters. May be one of ‘scale’, ‘bistochastic’, or ‘log’. The authors recommend using ‘log’. If the data is sparse, however, log normalization will not work, which is why the default is ‘bistochastic’. CAUTION: if
n_components : integer, optional, default: 6 Number of singular vectors to check.
n_best : integer, optional, default: 3 Number of best singular vectors to which to project the data for clustering.
svd_method : string, optional, default: ‘randomized’ Selects the algorithm for finding singular vectors. May be ‘randomized’ or ‘arpack’. If ‘randomized’, uses
n_svd_vecs : int, optional, default: None Number of vectors to use in calculating the SVD. Corresponds to
mini_batch : bool, optional, default: False Whether to use mini-batch k-means, which is faster but may get different results.
init : {‘k-means++’, ‘random’ or an ndarray} Method for initialization of k-means algorithm; defaults to ‘k-means++’.
n_init : int, optional, default: 10 Number of random initializations that are tried with the k-means algorithm. If mini-batch k-means is used, the best initialization is chosen and the algorithm runs once. Otherwise, the algorithm is run for each initialization and the best solution chosen.
n_jobs : int, optional, default: 1 The number of jobs to use for the computation. This works by breaking down the pairwise matrix into n_jobs even slices and computing them in parallel. If -1 all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used.
random_state : int, RandomState instance or None, optional, default: None If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by |
---|---|
Attributes: |
rows_ : array-like, shape (n_row_clusters, n_rows) Results of the clustering.
columns_ : array-like, shape (n_column_clusters, n_columns) Results of the clustering, like
row_labels_ : array-like, shape (n_rows,) Row partition labels.
column_labels_ : array-like, shape (n_cols,) Column partition labels. |
fit (X[, y]) | Creates a biclustering for X. |
get_indices (i) | Row and column indices of the i’th bicluster. |
get_params ([deep]) | Get parameters for this estimator. |
get_shape (i) | Shape of the i’th bicluster. |
get_submatrix (i, data) | Returns the submatrix corresponding to bicluster i. |
set_params (**params) | Set the parameters of this estimator. |
__init__(n_clusters=3, method='bistochastic', n_components=6, n_best=3, svd_method='randomized', n_svd_vecs=None, mini_batch=False, init='k-means++', n_init=10, n_jobs=1, random_state=None)
biclusters_
Convenient way to get row and column indicators together.
Returns the rows_ and columns_ members.
fit(X, y=None)
Creates a biclustering for X.
Parameters: |
X : array-like, shape (n_samples, n_features)
y : Ignored |
---|
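A minimal usage sketch for fit follows. The data set, its size, and the noise level are illustrative assumptions; `make_checkerboard` from `sklearn.datasets` is used only to produce a matrix with a planted checkerboard, and the class is imported from `sklearn.cluster`, which should also work in releases where the `sklearn.cluster.bicluster` path is no longer available.

```python
from sklearn.cluster import SpectralBiclustering
from sklearn.datasets import make_checkerboard

# Synthetic 30 x 24 matrix with a planted 2 x 3 checkerboard (illustrative sizes).
X, _, _ = make_checkerboard(shape=(30, 24), n_clusters=(2, 3),
                            noise=1.0, shuffle=True, random_state=0)

model = SpectralBiclustering(n_clusters=(2, 3), method="bistochastic",
                             random_state=0)
model.fit(X)

# Fitted attributes: one label per row/column, plus boolean indicator arrays.
print(model.row_labels_.shape)
print(model.column_labels_.shape)

# biclusters_ bundles the row and column indicators together.
rows, columns = model.biclusters_
print(rows.shape, columns.shape)
```

Fixing `random_state` makes the k-means step repeatable, which helps when comparing normalization methods on the same data.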
get_indices(i)
Row and column indices of the i’th bicluster.
Only works if rows_ and columns_ attributes exist.
Parameters: |
i : int The index of the cluster. |
---|---|
Returns: |
row_ind : np.array, dtype=np.intp Indices of rows in the dataset that belong to the bicluster.
col_ind : np.array, dtype=np.intp Indices of columns in the dataset that belong to the bicluster. |
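The relationship between the boolean indicators and the returned index arrays can be sketched with NumPy alone. The indicator vectors below are hypothetical stand-ins for one line of rows_ and one line of columns_; this shows the idea, not the library's code.

```python
import numpy as np

# Hypothetical indicators for a single bicluster over 5 rows and 4 columns.
row_indicator = np.array([True, False, True, True, False])
col_indicator = np.array([False, True, False, True])

# Positions of the True entries are the row and column indices of the bicluster.
row_ind = np.nonzero(row_indicator)[0]
col_ind = np.nonzero(col_indicator)[0]

print(row_ind)  # rows 0, 2 and 3 belong to the bicluster
print(col_ind)  # columns 1 and 3 belong to the bicluster
```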
get_params(deep=True)
Get parameters for this estimator.
Parameters: |
deep : boolean, optional If True, will return the parameters for this estimator and contained subobjects that are estimators. |
---|---|
Returns: |
params : mapping of string to any Parameter names mapped to their values. |
get_shape(i)
Shape of the i’th bicluster.
Parameters: |
i : int The index of the cluster. |
---|---|
Returns: |
shape : (int, int) Number of rows and columns (resp.) in the bicluster. |
get_submatrix(i, data)
Returns the submatrix corresponding to bicluster i.
Parameters: |
i : int The index of the cluster.
data : array The data. |
---|---|
Returns: |
submatrix : array The submatrix corresponding to bicluster i. |
Works with sparse matrices. Only works if rows_ and columns_ attributes exist.
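The three accessors can be checked against each other on a fitted model. The sketch below assumes a small synthetic matrix from `make_checkerboard` (sizes and noise are arbitrary) and compares get_submatrix with manual indexing via `np.ix_`.

```python
import numpy as np
from sklearn.cluster import SpectralBiclustering
from sklearn.datasets import make_checkerboard

# Illustrative data with a planted 2 x 3 checkerboard.
X, _, _ = make_checkerboard(shape=(30, 24), n_clusters=(2, 3),
                            noise=1.0, random_state=0)
model = SpectralBiclustering(n_clusters=(2, 3), random_state=0)
model.fit(X)

i = 0  # index of the bicluster to inspect
row_ind, col_ind = model.get_indices(i)
submatrix = model.get_submatrix(i, X)

print(model.get_shape(i))  # (number of rows, number of columns) in bicluster i
print(submatrix.shape)

# Selecting the same rows and columns by hand should give the same block.
manual = X[np.ix_(row_ind, col_ind)]
print(np.array_equal(submatrix, manual))
```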
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
Returns: | self : |
---|
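A short sketch of get_params and set_params, including the nested `<component>__<parameter>` form. The pipeline, its step names ("scale", "bicluster"), and the choice of `MinMaxScaler` are hypothetical and serve only to show the naming scheme.

```python
from sklearn.cluster import SpectralBiclustering
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

model = SpectralBiclustering(n_clusters=3)

# set_params returns the estimator itself.
same = model.set_params(n_clusters=(2, 3), n_best=2)
print(same is model)
print(model.get_params()["n_clusters"])

# Nested case: parameters of a pipeline step are addressed as <step name>__<parameter>.
pipe = Pipeline([("scale", MinMaxScaler()), ("bicluster", model)])
pipe.set_params(bicluster__method="scale")
print(model.method)
print(pipe.get_params()["bicluster__n_best"])
```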
© 2007–2017 The scikit-learn developers
Licensed under the 3-clause BSD License.
http://scikit-learn.org/stable/modules/generated/sklearn.cluster.bicluster.SpectralBiclustering.html