Noise filters and prototype selection

TomekLinkRemoval

API

class smote_variants.TomekLinkRemoval(strategy='remove_majority', n_jobs=1)
__init__(strategy='remove_majority', n_jobs=1)

Constructor of the noise filter.

Parameters:
  • strategy (str) – noise removal strategy: 'remove_majority' or 'remove_both'
  • n_jobs (int) – number of jobs
remove_noise(X, y)

Removes noise from dataset

Parameters:
  • X (np.matrix) – features
  • y (np.array) – target labels
Returns:

dataset after noise removal

Return type:

np.matrix, np.array

Example

>>> import smote_variants
>>> # X: feature matrix, y: target labels, both prepared beforehand
>>> noise_filter = smote_variants.TomekLinkRemoval()
>>> X_samp, y_samp = noise_filter.remove_noise(X, y)

Tomek link removal
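
To illustrate what the filter does: a Tomek link is a pair of samples from different classes that are each other's nearest neighbors. The sketch below is not the library's implementation. It is a minimal, dependency-free version that assumes Euclidean distance and plain Python lists. The function names and the `majority_label` argument are hypothetical; only the two `strategy` values are taken from the constructor documented above.

```python
import math


def nearest_neighbor(X, i):
    """Return the index of the sample closest to X[i], excluding i itself."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: math.dist(X[i], X[j]))


def tomek_link_removal(X, y, majority_label, strategy="remove_majority"):
    """Drop samples that take part in Tomek links (hypothetical helper)."""
    if strategy not in ("remove_majority", "remove_both"):
        raise ValueError("unknown strategy: %r" % (strategy,))

    nn = [nearest_neighbor(X, i) for i in range(len(X))]
    to_remove = set()
    for i, j in enumerate(nn):
        # A Tomek link: mutual nearest neighbors with different labels.
        if nn[j] == i and y[i] != y[j]:
            if strategy == "remove_both":
                to_remove.update((i, j))
            else:
                # Keep the minority member, drop the majority member.
                to_remove.add(i if y[i] == majority_label else j)

    keep = [i for i in range(len(X)) if i not in to_remove]
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data: the samples at (5.0, 5.0) and (5.3, 5.0) form a Tomek link.
X_toy = [(0.0, 0.0), (0.0, 1.0), (1.2, 0.0), (5.0, 5.0), (5.3, 5.0), (9.0, 9.0)]
y_toy = [0, 0, 0, 0, 1, 1]
X_clean, y_clean = tomek_link_removal(X_toy, y_toy, majority_label=0)
```

With `strategy="remove_majority"` only the majority sample of each link is dropped, which keeps every minority sample; `"remove_both"` drops both members of the link.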

References:
  • BibTex:

    @article{smoteNoise0,
             author = {Batista, Gustavo E. A. P. A. and Prati,
                        Ronaldo C. and Monard, Maria Carolina},
             title = {A Study of the Behavior of Several Methods for
                        Balancing Machine Learning Training Data},
             journal = {SIGKDD Explor. Newsl.},
             issue_date = {June 2004},
             volume = {6},
             number = {1},
             month = jun,
             year = {2004},
             issn = {1931-0145},
             pages = {20--29},
             numpages = {10},
             url = {http://doi.acm.org/10.1145/1007730.1007735},
             doi = {10.1145/1007730.1007735},
             acmid = {1007735},
             publisher = {ACM},
             address = {New York, NY, USA}
            }
    

CondensedNearestNeighbors

API

class smote_variants.CondensedNearestNeighbors(n_jobs=1)
__init__(n_jobs=1)

Constructor of the noise removing object

Parameters: n_jobs (int) – number of jobs
remove_noise(X, y)

Removes noise from dataset

Parameters:
  • X (np.matrix) – features
  • y (np.array) – target labels
Returns:

dataset after noise removal

Return type:

np.matrix, np.array

Example

>>> import smote_variants
>>> # X: feature matrix, y: target labels, both prepared beforehand
>>> noise_filter = smote_variants.CondensedNearestNeighbors()
>>> X_samp, y_samp = noise_filter.remove_noise(X, y)

Condensed nearest neighbors
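
The condensing rule from the Hart reference below can be sketched in a few lines: grow a "store" of samples until a 1-nearest-neighbor classifier built on the store labels every sample of the full dataset correctly. This is a generic, dependency-free sketch of Hart's rule, not the library's implementation. The function names are hypothetical, Euclidean distance is assumed, and seeding the store with the first sample is an arbitrary choice made for the illustration.

```python
import math


def nearest_label(store_X, store_y, x):
    """Label of the stored sample closest to x (1-NN prediction)."""
    k = min(range(len(store_X)), key=lambda j: math.dist(store_X[j], x))
    return store_y[k]


def condensed_nearest_neighbors(X, y):
    """Return a subset of (X, y) that classifies all of (X, y) with 1-NN."""
    store = [0]  # assumption: seed the store with the first sample
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if i in store:
                continue
            store_X = [X[j] for j in store]
            store_y = [y[j] for j in store]
            # A sample the current store misclassifies is added to the store.
            if nearest_label(store_X, store_y, X[i]) != y[i]:
                store.append(i)
                changed = True
    store.sort()
    return [X[i] for i in store], [y[i] for i in store]


# Toy data: two well-separated clusters.
X_toy = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (5.0, 5.0), (5.1, 5.0)]
y_toy = [0, 0, 0, 1, 1]
X_cond, y_cond = condensed_nearest_neighbors(X_toy, y_toy)
```

On well-separated clusters like the toy data, the store ends up with very few samples per class, which is why the rule acts as a prototype selection method.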

References:
  • BibTex:

    @ARTICLE{condensed_nn,
                author={Hart, P.},
                journal={IEEE Transactions on Information Theory},
                title={The condensed nearest neighbor rule (Corresp.)},
                year={1968},
                volume={14},
                number={3},
                pages={515-516},
                keywords={Pattern classification},
                doi={10.1109/TIT.1968.1054155},
                ISSN={0018-9448},
                month={May}}
    

OneSidedSelection

API

class smote_variants.OneSidedSelection(n_jobs=1)
__init__(n_jobs=1)

Constructor of the noise removal object

Parameters: n_jobs (int) – number of jobs
remove_noise(X, y)

Removes noise

Parameters:
  • X (np.matrix) – features
  • y (np.array) – target labels
Returns:

cleaned features and target labels

Return type:

np.matrix, np.array

Example

>>> import smote_variants
>>> # X: feature matrix, y: target labels, both prepared beforehand
>>> noise_filter = smote_variants.OneSidedSelection()
>>> X_samp, y_samp = noise_filter.remove_noise(X, y)
References:
  • BibTex:

    @article{smoteNoise0,
             author = {Batista, Gustavo E. A. P. A. and Prati,
                        Ronaldo C. and Monard, Maria Carolina},
             title = {A Study of the Behavior of Several Methods
                        for Balancing Machine Learning Training Data},
             journal = {SIGKDD Explor. Newsl.},
             issue_date = {June 2004},
             volume = {6},
             number = {1},
             month = jun,
             year = {2004},
             issn = {1931-0145},
             pages = {20--29},
             numpages = {10},
             url = {http://doi.acm.org/10.1145/1007730.1007735},
             doi = {10.1145/1007730.1007735},
             acmid = {1007735},
             publisher = {ACM},
             address = {New York, NY, USA}
            }
    

NeighborhoodCleaningRule

API

class smote_variants.NeighborhoodCleaningRule(n_jobs=1)
__init__(n_jobs=1)

Constructor of the noise removal object

Parameters: n_jobs (int) – number of parallel jobs
remove_noise(X, y)

Removes noise

Parameters:
  • X (np.matrix) – features
  • y (np.array) – target labels
Returns:

cleaned features and target labels

Return type:

np.matrix, np.array

Example

>>> import smote_variants
>>> # X: feature matrix, y: target labels, both prepared beforehand
>>> noise_filter = smote_variants.NeighborhoodCleaningRule()
>>> X_samp, y_samp = noise_filter.remove_noise(X, y)
References:
  • BibTex:

    @article{smoteNoise0,
             author = {Batista, Gustavo E. A. P. A. and Prati,
                        Ronaldo C. and Monard, Maria Carolina},
             title = {A Study of the Behavior of Several Methods for
                        Balancing Machine Learning Training Data},
             journal = {SIGKDD Explor. Newsl.},
             issue_date = {June 2004},
             volume = {6},
             number = {1},
             month = jun,
             year = {2004},
             issn = {1931-0145},
             pages = {20--29},
             numpages = {10},
             url = {http://doi.acm.org/10.1145/1007730.1007735},
             doi = {10.1145/1007730.1007735},
             acmid = {1007735},
             publisher = {ACM},
             address = {New York, NY, USA}
            }
    

EditedNearestNeighbors

API

class smote_variants.EditedNearestNeighbors(remove='both', n_jobs=1)
__init__(remove='both', n_jobs=1)

Constructor of the noise removal object

Parameters:
  • remove (str) – class to remove from: 'both', 'min' or 'maj'
  • n_jobs (int) – number of parallel jobs
get_params()

Get noise removal parameters

Returns: dictionary of parameters
Return type: dict
remove_noise(X, y)

Removes noise

Parameters:
  • X (np.matrix) – features
  • y (np.array) – target labels
Returns:

cleaned features and target labels

Return type:

np.matrix, np.array

Example

>>> import smote_variants
>>> # X: feature matrix, y: target labels, both prepared beforehand
>>> noise_filter = smote_variants.EditedNearestNeighbors()
>>> X_samp, y_samp = noise_filter.remove_noise(X, y)
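
Assuming the filter follows the usual edited nearest neighbors rule (a sample counts as noise when its label disagrees with the majority label of its k nearest neighbors), the effect of the `remove` parameter can be sketched as follows. This is an illustration, not the library's code: the function name, the `minority_label` argument, the choice `k=3` and the Euclidean distance are all assumptions. Only the values 'both', 'min' and 'maj' come from the parameter description above, and their reading as "drop noisy samples of both classes, of the minority class only, or of the majority class only" is an interpretation of that description.

```python
import math
from collections import Counter


def edited_nearest_neighbors(X, y, minority_label, remove="both", k=3):
    """Drop samples whose k nearest neighbors vote for another class."""
    if remove not in ("both", "min", "maj"):
        raise ValueError("unknown remove option: %r" % (remove,))

    to_remove = set()
    for i in range(len(X)):
        # Indices of the k samples closest to X[i], excluding i itself.
        neighbors = sorted((j for j in range(len(X)) if j != i),
                           key=lambda j: math.dist(X[i], X[j]))[:k]
        predicted = Counter(y[j] for j in neighbors).most_common(1)[0][0]
        if predicted == y[i]:
            continue  # the neighborhood agrees with the label: keep it
        is_minority = (y[i] == minority_label)
        if (remove == "both"
                or (remove == "min" and is_minority)
                or (remove == "maj" and not is_minority)):
            to_remove.add(i)

    keep = [i for i in range(len(X)) if i not in to_remove]
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data: one minority sample sits inside the majority cluster and
# one majority sample sits inside the minority cluster.
X_toy = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5),
         (5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (5.5, 5.5)]
y_toy = [0, 0, 0, 0, 1, 1, 1, 1, 0]
X_clean, y_clean = edited_nearest_neighbors(X_toy, y_toy, minority_label=1)
```

Restricting removal to 'maj' is the conservative choice on imbalanced data, since it never discards a minority sample.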
References:
  • BibTex:

    @article{smoteNoise0,
             author = {Batista, Gustavo E. A. P. A. and Prati,
                        Ronaldo C. and Monard, Maria Carolina},
             title = {A Study of the Behavior of Several Methods for
                        Balancing Machine Learning Training Data},
             journal = {SIGKDD Explor. Newsl.},
             issue_date = {June 2004},
             volume = {6},
             number = {1},
             month = jun,
             year = {2004},
             issn = {1931-0145},
             pages = {20--29},
             numpages = {10},
             url = {http://doi.acm.org/10.1145/1007730.1007735},
             doi = {10.1145/1007730.1007735},
             acmid = {1007735},
             publisher = {ACM},
             address = {New York, NY, USA}
            }