class sklearn.decomposition.LatentDirichletAllocation(n_components=10, doc_topic_prior=None, topic_word_prior=None, learning_method=None, learning_decay=0.7, learning_offset=10.0, max_iter=10, batch_size=128, evaluate_every=-1, total_samples=1000000.0, perp_tol=0.1, mean_change_tol=0.001, max_doc_update_iter=100, n_jobs=1, verbose=0, random_state=None, n_topics=None)
Latent Dirichlet Allocation with the online variational Bayes algorithm.
New in version 0.17.
Read more in the User Guide.
Parameters:
n_components : int, optional (default=10)
    Number of topics.
doc_topic_prior : float, optional (default=None)
    Prior of the document topic distribution.
topic_word_prior : float, optional (default=None)
    Prior of the topic word distribution.
learning_method : 'batch' | 'online', default='online'
    Method used to update
    'batch': Batch variational Bayes method. Use all training data in each EM update. Old `components_` will be overwritten in each iteration.
    'online': Online variational Bayes method. In each EM update, use a mini-batch of training data to update the `components_` variable incrementally. The learning rate is controlled by the `learning_decay` and `learning_offset` parameters.
learning_decay : float, optional (default=0.7)
    A parameter that controls the learning rate in the online learning method. The value should be in the interval (0.5, 1.0] to guarantee asymptotic convergence. When the value is 0.0 and batch_size is
learning_offset : float, optional (default=10.)
    A (positive) parameter that downweights early iterations in online learning. It should be greater than 1.0. In the literature, this is called tau_0.
max_iter : integer, optional (default=10)
    The maximum number of iterations.
batch_size : int, optional (default=128)
    Number of documents to use in each EM iteration. Only used in online learning.
evaluate_every : int, optional (default=-1)
    How often to evaluate perplexity. Only used in
total_samples : int, optional (default=1e6)
    Total number of documents. Only used in the
perp_tol : float, optional (default=1e-1)
    Perplexity tolerance in batch learning. Only used when
mean_change_tol : float, optional (default=1e-3)
    Stopping tolerance for updating the document topic distribution in the E-step.
max_doc_update_iter : int (default=100)
    Maximum number of iterations for updating the document topic distribution in the E-step.
n_jobs : int, optional (default=1)
    The number of jobs to use in the E-step. If -1, all CPUs are used. For
verbose : int, optional (default=0)
    Verbosity level.
random_state : int, RandomState instance or None, optional (default=None)
    If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by
n_topics : int, optional (default=None)
    This parameter has been renamed to n_components and will be removed in version 0.21.
    Deprecated since version 0.19.
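The interaction between `learning_decay` and `learning_offset` may be easier to see numerically. The sketch below assumes the step-size schedule used in the online LDA literature, rho_t = (tau_0 + t)^(-kappa), with `learning_offset` playing the role of tau_0 and `learning_decay` the role of kappa. The helper name `step_weight` is hypothetical and is not part of scikit-learn.

```python
def step_weight(t, learning_offset=10.0, learning_decay=0.7):
    """Weight given to the mini-batch at update number t (assumed schedule)."""
    return (learning_offset + t) ** (-learning_decay)

# A larger learning_offset shrinks the earliest weights (early iterations count less).
# A larger learning_decay makes the weights fall off faster as t grows.
for t in (1, 10, 100, 1000):
    default_w = step_weight(t)
    large_offset_w = step_weight(t, learning_offset=100.0)
    print(t, round(default_w, 4), round(large_offset_w, 4))
```

Under this assumed schedule the weight decreases monotonically in `t`, which is why later mini-batches move `components_` less than earlier ones.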
Attributes:
components_ : array, [n_components, n_features]
    Variational parameters for topic word distribution. Since the complete conditional for topic word distribution is a Dirichlet,
n_batch_iter_ : int
    Number of iterations of the EM step.
n_iter_ : int
    Number of passes over the dataset.
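As a rough illustration of these attributes, the sketch below fits the model on a small made-up count matrix and inspects the results. The data are invented for the example, and the row normalization at the end is an assumption about how `components_` can be turned into per-topic word distributions, not something stated on this page.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Toy document-word count matrix: 6 documents, 5 vocabulary terms (made-up data).
X = np.array([
    [3, 2, 0, 0, 0],
    [4, 1, 1, 0, 0],
    [2, 3, 0, 0, 1],
    [0, 0, 3, 4, 2],
    [0, 1, 2, 3, 3],
    [0, 0, 4, 2, 3],
])

lda = LatentDirichletAllocation(
    n_components=2, learning_method="batch", max_iter=20, random_state=0
)
lda.fit(X)

print(lda.components_.shape)  # (n_components, n_features)
print(lda.n_iter_)            # number of passes over the dataset

# Assumption: normalizing each row gives a distribution over words for that topic.
topic_word = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
print(topic_word.round(3))
```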
fit (X[, y]) | Learn model for the data X with variational Bayes method. |
fit_transform (X[, y]) | Fit to data, then transform it. |
get_params ([deep]) | Get parameters for this estimator. |
partial_fit (X[, y]) | Online VB with Mini-Batch update. |
perplexity (X[, doc_topic_distr, sub_sampling]) | Calculate approximate perplexity for data X. |
score (X[, y]) | Calculate approximate log-likelihood as score. |
set_params (**params) | Set the parameters of this estimator. |
transform (X) | Transform data X according to the fitted model. |
__init__(n_components=10, doc_topic_prior=None, topic_word_prior=None, learning_method=None, learning_decay=0.7, learning_offset=10.0, max_iter=10, batch_size=128, evaluate_every=-1, total_samples=1000000.0, perp_tol=0.1, mean_change_tol=0.001, max_doc_update_iter=100, n_jobs=1, verbose=0, random_state=None, n_topics=None)
fit(X, y=None)
Learn model for the data X with variational Bayes method.
When learning_method is 'online', use mini-batch update. Otherwise, use batch update.
Parameters:
X : array-like or sparse matrix, shape=(n_samples, n_features)
    Document word matrix.
y : Ignored.
Returns:
self :
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters:
X : numpy array of shape [n_samples, n_features]
    Training set.
y : numpy array of shape [n_samples]
    Target values.
Returns:
X_new : numpy array of shape [n_samples, n_features_new]
    Transformed array.
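A minimal sketch of `fit_transform` on synthetic counts follows. It assumes that, for this estimator, the transformed array has one column per topic, so `n_features_new` equals `n_components`. The random matrix stands in for a real document-word matrix.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.RandomState(0)
X = rng.randint(0, 5, size=(20, 8))  # synthetic counts, for illustration only

lda = LatentDirichletAllocation(
    n_components=3, learning_method="batch", random_state=0
)
doc_topic = lda.fit_transform(X)  # fit the model, then transform the same data

print(doc_topic.shape)  # (n_samples, n_components)
```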
get_params(deep=True)
Get parameters for this estimator.
Parameters:
deep : boolean, optional
    If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:
params : mapping of string to any
    Parameter names mapped to their values.
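For illustration, the returned mapping can be read like a dictionary and passed back to the constructor. This is a small sketch with arbitrarily chosen parameter values.

```python
from sklearn.decomposition import LatentDirichletAllocation

lda = LatentDirichletAllocation(n_components=5, learning_decay=0.6)
params = lda.get_params()

print(params["n_components"])    # 5
print(params["learning_decay"])  # 0.6

# The mapping can be used to build an equivalent, unfitted estimator.
clone = LatentDirichletAllocation(**params)
```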
partial_fit(X, y=None)
Online VB with Mini-Batch update.
Parameters:
X : array-like or sparse matrix, shape=(n_samples, n_features)
    Document word matrix.
y : Ignored.
Returns:
self :
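One way to use `partial_fit` is to feed a corpus in chunks, as if the documents were arriving as a stream. The sketch below uses synthetic counts and a chunk size of 50, both chosen only for illustration, and sets `total_samples` to the size of the full corpus.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.RandomState(0)
n_docs, n_words = 300, 12
X = rng.randint(0, 4, size=(n_docs, n_words))  # synthetic counts, illustration only

lda = LatentDirichletAllocation(
    n_components=3,
    total_samples=n_docs,  # size of the corpus the chunks are drawn from
    random_state=0,
)

# Each call updates the model with one chunk; all chunks share the same vocabulary.
for start in range(0, n_docs, 50):
    lda.partial_fit(X[start:start + 50])

print(lda.components_.shape)  # (n_components, n_features)
```

Every chunk must have the same number of columns, since the vocabulary is fixed by the first call.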
perplexity(X, doc_topic_distr='deprecated', sub_sampling=False)
Calculate approximate perplexity for data X.
Perplexity is defined as exp(-1. * log-likelihood per word)
Changed in version 0.19: The doc_topic_distr argument has been deprecated and is ignored because the user no longer has access to the unnormalized distribution.
Parameters:
X : array-like or sparse matrix, [n_samples, n_features]
    Document word matrix.
doc_topic_distr : None or array, shape=(n_samples, n_components)
    Document topic distribution. This argument is deprecated and is currently being ignored.
    Deprecated since version 0.19.
sub_sampling : bool
    Do sub-sampling or not.
Returns:
score : float
    Perplexity score.
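The definition above can be checked by hand. The sketch below assumes that "per word" means dividing by the total token count in `X`, and that `score` returns the same approximate log-likelihood that `perplexity` uses when `sub_sampling` is False. If those assumptions hold, the two printed values should agree.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.RandomState(0)
X = rng.randint(0, 5, size=(30, 10))  # synthetic counts, illustration only

lda = LatentDirichletAllocation(
    n_components=3, learning_method="batch", random_state=0
).fit(X)

log_likelihood = lda.score(X)  # approximate log-likelihood for X
n_words = X.sum()              # total number of tokens in X
manual = np.exp(-1.0 * log_likelihood / n_words)

print(manual, lda.perplexity(X))
```

Lower perplexity corresponds to a higher per-word log-likelihood.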
score(X, y=None)
Calculate approximate log-likelihood as score.
Parameters:
X : array-like or sparse matrix, shape=(n_samples, n_features)
    Document word matrix.
y : Ignored.
Returns:
score : float
    Use approximate bound as score.
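Because the score is an approximate log-likelihood, a higher value suggests a better fit. One possible use, sketched here on synthetic data with an arbitrary train/test split and an arbitrary list of candidate topic counts, is to compare models on held-out documents.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.RandomState(0)
X = rng.randint(0, 5, size=(60, 10))  # synthetic counts, illustration only
X_train, X_test = X[:40], X[40:]

scores = {}
for k in (2, 4, 8):  # candidate numbers of topics, chosen arbitrarily
    lda = LatentDirichletAllocation(
        n_components=k, learning_method="batch", random_state=0
    ).fit(X_train)
    scores[k] = lda.score(X_test)  # higher means a better approximate fit

best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On random data like this the ranking carries no meaning; the point is only the pattern of fitting on one split and scoring on another.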
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
Returns:
self :
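The nested form can be sketched with a two-step pipeline. The step names "counts" and "lda" are arbitrary labels chosen for this example, and the choice of `CountVectorizer` as the first step is an assumption about a typical setup rather than a requirement.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline

pipe = Pipeline([
    ("counts", CountVectorizer()),
    ("lda", LatentDirichletAllocation(n_components=10)),
])

# <component>__<parameter> addresses a parameter of a named pipeline step.
returned = pipe.set_params(lda__n_components=3, counts__lowercase=False)

print(returned is pipe)                      # True: set_params returns the estimator itself
print(pipe.named_steps["lda"].n_components)  # 3
```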
transform(X)
Transform data X according to the fitted model.
Changed in version 0.18: doc_topic_distr is now normalized.
Parameters:
X : array-like or sparse matrix, shape=(n_samples, n_features)
    Document word matrix.
Returns:
doc_topic_distr : shape=(n_samples, n_components)
    Document topic distribution for X.
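As a small sketch of how the result might be used, the example below fits on synthetic counts and transforms a few unseen documents that share the same vocabulary. Since the output is normalized, each row can be read as a distribution over topics.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.RandomState(0)
X_train = rng.randint(0, 5, size=(40, 10))  # synthetic counts, illustration only
X_new = rng.randint(0, 5, size=(3, 10))     # unseen documents, same vocabulary

lda = LatentDirichletAllocation(
    n_components=4, learning_method="batch", random_state=0
).fit(X_train)

doc_topic = lda.transform(X_new)
print(doc_topic.shape)           # (3, 4)
print(doc_topic.sum(axis=1))     # each row sums to 1, up to rounding
print(doc_topic.argmax(axis=1))  # index of the dominant topic per document
```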
sklearn.decomposition.LatentDirichletAllocation
© 2007–2017 The scikit-learn developers
Licensed under the 3-clause BSD License.
http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.LatentDirichletAllocation.html