sklearn.datasets.fetch_mldata(dataname, target_name='label', data_name='data', transpose_data=True, data_home=None)
Fetch an mldata.org data set
If the file does not exist yet, it is downloaded from mldata.org.
mldata.org does not have an enforced convention for storing data or naming the columns in a data set. The default behavior of this function works well with the most common cases: data values are stored in the column 'data' and target values in the column 'label' (the defaults of data_name and target_name), and the data array is stored as n_features x n_samples, and thus needs to be transposed to match the sklearn standard.
Keyword arguments allow these defaults to be adapted to specific data sets (see parameters target_name, data_name, transpose_data, and the examples below).
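The transposition mentioned above can be illustrated without any download. The following is a minimal sketch, assuming NumPy is available; the array is synthetic (all zeros) and its shape is chosen only to mirror the 'leukemia' shapes shown in the examples. It shows what swapping the axes means, not how fetch_mldata is implemented.

```python
import numpy as np

# Synthetic stand-in for a raw array stored as n_features x n_samples.
# The values are made up; nothing is fetched from mldata.org.
n_features, n_samples = 7129, 72
raw = np.zeros((n_features, n_samples))

# Conceptually, transpose_data=True swaps the two axes so that rows are
# samples and columns are features, the layout scikit-learn expects.
X = raw.T

print(raw.shape)  # (7129, 72)
print(X.shape)    # (72, 7129)
```

With transpose_data=False the array would be kept in its stored orientation, which is the right choice only when the data set is already laid out as n_samples x n_features.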
mldata.org data sets may have multiple columns, which are stored in the Bunch object with their original name.
Parameters:
dataname : str
    Name of the data set on mldata.org, e.g.: “leukemia”, “Whistler Daily Snowfall”, etc. The raw name is automatically converted to an mldata.org URL.
target_name : optional, default: 'label'
    Name or index of the column containing the target values.
data_name : optional, default: 'data'
    Name or index of the column containing the data.
transpose_data : optional, default: True
    If True, transpose the downloaded data array.
data_home : optional, default: None
    Specify another download and cache folder for the data sets. By default all scikit-learn data is stored in '~/scikit_learn_data' subfolders.
Returns:
data : Bunch
    Dictionary-like object, the interesting attributes are: 'data', the data to learn, 'target', the classification labels, 'DESCR', the full description of the dataset, and 'COL_NAMES', the original names of the dataset columns.
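To make "dictionary-like object" with "attributes" concrete, here is a small sketch of how such an object can be read. MiniBunch is a hypothetical stand-in written for this illustration, not scikit-learn's Bunch class, and the contents are made up; only the four attribute names come from the description above.

```python
class MiniBunch(dict):
    """Hypothetical stand-in for a Bunch: a dict that also allows attribute access."""

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key)


# Made-up contents that reuse the attribute names listed above.
dataset = MiniBunch(
    data=[[5.1, 3.5], [4.9, 3.0]],
    target=[0, 1],
    DESCR="toy description",
    COL_NAMES=["label", "data"],
)

# Attribute-style and dictionary-style access reach the same object.
print(dataset.target)
print(dataset["target"])
print(dataset.COL_NAMES)
```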
Load the ‘iris’ dataset from mldata.org:
>>> from sklearn.datasets.mldata import fetch_mldata
>>> import tempfile
>>> test_data_home = tempfile.mkdtemp()

>>> iris = fetch_mldata('iris', data_home=test_data_home)
>>> iris.target.shape
(150,)
>>> iris.data.shape
(150, 4)
Load the ‘leukemia’ dataset from mldata.org, which needs to be transposed to respect the scikit-learn axes convention:
>>> leuk = fetch_mldata('leukemia', transpose_data=True,
...                     data_home=test_data_home)
>>> leuk.data.shape
(72, 7129)
Load an alternative ‘iris’ dataset, which has different names for the columns:
>>> iris2 = fetch_mldata('datasets-UCI iris', target_name=1,
...                      data_name=0, data_home=test_data_home)
>>> iris3 = fetch_mldata('datasets-UCI iris',
...                      target_name='class', data_name='double0',
...                      data_home=test_data_home)
>>> import shutil
>>> shutil.rmtree(test_data_home)
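The 'datasets-UCI iris' example above passes target_name and data_name either as an index or as a column name. The sketch below illustrates that "name or index" idea on an ordinary ordered mapping. pick_column is a hypothetical helper written for this illustration and is not part of scikit-learn; the column order and values are assumptions inferred from the example, not taken from the actual data set.

```python
def pick_column(columns, key):
    """Hypothetical helper: return a column by name (str) or by position (int)."""
    if isinstance(key, int):
        # dicts preserve insertion order (Python 3.7+), so position is well defined
        name = list(columns)[key]
        return columns[name]
    return columns[key]


# Made-up columns, ordered as the example above suggests:
# index 0 is 'double0' (the data), index 1 is 'class' (the targets).
columns = {
    "double0": [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2]],
    "class": [1, 1],
}

# Selecting by index or by name yields the same column.
print(pick_column(columns, 1) is pick_column(columns, "class"))      # True
print(pick_column(columns, 0) is pick_column(columns, "double0"))    # True
```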
© 2007–2017 The scikit-learn developers
Licensed under the 3-clause BSD License.
http://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_mldata.html