Noise filters and prototype selection

TomekLinkRemoval

API

class smote_variants.noise_removal.TomekLinkRemoval(strategy='remove_majority', nn_params=None, n_jobs=1, **_kwargs)[source]

Tomek link removal

References

  • BibTeX:

    @article{smoteNoise0,
             author = {Batista, Gustavo E. A. P. A. and Prati,
                        Ronaldo C. and Monard, Maria Carolina},
             title = {A Study of the Behavior of Several Methods for
                        Balancing Machine Learning Training Data},
             journal = {SIGKDD Explor. Newsl.},
             issue_date = {June 2004},
             volume = {6},
             number = {1},
             month = jun,
             year = {2004},
             issn = {1931-0145},
             pages = {20--29},
             numpages = {10},
             url = {http://doi.acm.org/10.1145/1007730.1007735},
             doi = {10.1145/1007730.1007735},
             acmid = {1007735},
             publisher = {ACM},
             address = {New York, NY, USA}
            }
    
__init__(strategy='remove_majority', nn_params=None, n_jobs=1, **_kwargs)[source]

Constructor of the noise filter.

Parameters:
  • strategy (str) – noise removal strategy: 'remove_majority' or 'remove_both'

  • nn_params (dict) – additional parameters for the nearest neighbor calculations. Any parameter that NearestNeighbors accepts can be given; in addition, {'metric': 'precomputed', 'metric_learning': '<method>', …} with <method> being 'ITML' or 'LSML' enables learning the metric used for the neighborhood calculations

  • n_jobs (int) – number of parallel jobs
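
The two kinds of nn_params described above could look as follows. This is a sketch, not a tested configuration: the variable names are hypothetical, and the 'manhattan' metric is only an assumed example of an argument that NearestNeighbors accepts.

```python
# Option 1: forward an ordinary nearest-neighbor argument
# (assumption: 'manhattan' is accepted as a metric name).
nn_params_plain = {"metric": "manhattan"}

# Option 2: request metric learning, using the keys named in the
# parameter description ('ITML' could be swapped for 'LSML').
nn_params_learned = {"metric": "precomputed", "metric_learning": "ITML"}

# Either dictionary would then be handed to the constructor, for example:
#   smote_variants.noise_removal.TomekLinkRemoval(nn_params=nn_params_learned)
```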

get_params(deep=False)[source]

Return parameters

Returns:

dictionary of parameters

Return type:

dict

remove_noise(X, y)[source]

Removes noise from dataset

Parameters:
  • X (np.array) – features

  • y (np.array) – target labels

Returns:

dataset after noise removal

Return type:

np.array, np.array
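
The idea behind the filter can be sketched in plain Python. A Tomek link is a pair of samples from different classes that are each other's nearest neighbor; the sketch finds such pairs and applies the two strategies named in the constructor. It is an illustration, not the library's implementation: the helper names and the explicit majority_label argument are hypothetical, Euclidean distance is assumed, and 'remove_majority' is read as dropping only the majority member of each link.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest_neighbor(points, i):
    """Index of the sample closest to points[i], excluding i itself."""
    others = (j for j in range(len(points)) if j != i)
    return min(others, key=lambda j: dist(points[i], points[j]))


def tomek_links(points, labels):
    """Pairs (i, j), i < j, that are mutual nearest neighbors of different classes."""
    nn = [nearest_neighbor(points, i) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]


def remove_tomek_links(points, labels, majority_label, strategy="remove_majority"):
    """Drop the majority member of every link, or both members for 'remove_both'."""
    drop = set()
    for pair in tomek_links(points, labels):
        for k in pair:
            if strategy == "remove_both" or labels[k] == majority_label:
                drop.add(k)
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


# Toy data: samples 2 and 3 sit on the class boundary and form the only link.
X = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.05, 0.0), (2.0, 0.0)]
y = [0, 0, 0, 1, 1]
links = tomek_links(X, y)
X_maj, y_maj = remove_tomek_links(X, y, majority_label=0)
X_both, y_both = remove_tomek_links(X, y, majority_label=0, strategy="remove_both")
```

With 'remove_majority' only sample 2 is dropped; with 'remove_both' samples 2 and 3 are both dropped.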

Example

>>> import smote_variants
>>> # X: feature matrix, y: target labels (np.array)
>>> noise_filter = smote_variants.noise_removal.TomekLinkRemoval()
>>> X_samp, y_samp = noise_filter.remove_noise(X, y)
[Figures: base.png, TomekLinkRemoval.png]


CondensedNearestNeighbors

API

class smote_variants.noise_removal.CondensedNearestNeighbors(n_jobs=1, **_kwargs)[source]

Condensed nearest neighbors

References

  • BibTeX:

    @ARTICLE{condensed_nn,
                author={Hart, P.},
                journal={IEEE Transactions on Information Theory},
                title={The condensed nearest neighbor rule (Corresp.)},
                year={1968},
                volume={14},
                number={3},
                pages={515-516},
                keywords={Pattern classification},
                doi={10.1109/TIT.1968.1054155},
                ISSN={0018-9448},
                month={May}}
    
__init__(n_jobs=1, **_kwargs)[source]

Constructor of the noise removal object

Parameters:
  • n_jobs (int) – number of parallel jobs

get_params(deep=False)[source]

Return parameters

Returns:

dictionary of parameters

Return type:

dict

remove_noise(X, y)[source]

Removes noise from dataset

Parameters:
  • X (np.array) – features

  • y (np.array) – target labels

Returns:

dataset after noise removal

Return type:

np.array, np.array
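
The condensed nearest neighbor rule from the cited reference can be sketched in plain Python: a store of retained samples grows by every sample that 1-nearest-neighbor classification against the current store gets wrong, and passes repeat until nothing is added. This is a generic sketch, not the library's implementation, which may differ (for instance in how the store is seeded). The function name condense, the choice of the first sample as seed and the Euclidean distance are assumptions.

```python
from math import dist  # Euclidean distance, Python 3.8+


def condense(points, labels):
    """Keep only the samples that the growing store misclassifies."""
    store = [0]  # assumption: seed the store with the first sample
    changed = True
    while changed:  # repeat passes until a full pass adds nothing
        changed = False
        for i in range(len(points)):
            if i in store:
                continue
            nearest = min(store, key=lambda j: dist(points[i], points[j]))
            if labels[nearest] != labels[i]:  # 1-NN on the store gets it wrong
                store.append(i)
                changed = True
    store.sort()
    return [points[i] for i in store], [labels[i] for i in store]


# Toy data: two well-separated clusters on a line.
X = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,)]
y = [0, 0, 0, 1, 1]
X_samp, y_samp = condense(X, y)
```

On this toy data one representative per cluster is enough to classify every sample correctly, so only two samples are retained.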

Example

>>> import smote_variants
>>> # X: feature matrix, y: target labels (np.array)
>>> noise_filter = smote_variants.noise_removal.CondensedNearestNeighbors()
>>> X_samp, y_samp = noise_filter.remove_noise(X, y)
[Figures: base.png, CondensedNearestNeighbors.png]


OneSidedSelection

API

class smote_variants.noise_removal.OneSidedSelection(n_jobs=1, **_kwargs)[source]

One-sided selection

References

  • BibTeX:

    @article{smoteNoise0,
             author = {Batista, Gustavo E. A. P. A. and Prati,
                        Ronaldo C. and Monard, Maria Carolina},
             title = {A Study of the Behavior of Several Methods
                        for Balancing Machine Learning Training Data},
             journal = {SIGKDD Explor. Newsl.},
             issue_date = {June 2004},
             volume = {6},
             number = {1},
             month = jun,
             year = {2004},
             issn = {1931-0145},
             pages = {20--29},
             numpages = {10},
             url = {http://doi.acm.org/10.1145/1007730.1007735},
             doi = {10.1145/1007730.1007735},
             acmid = {1007735},
             publisher = {ACM},
             address = {New York, NY, USA}
            }
    
__init__(n_jobs=1, **_kwargs)[source]

Constructor of the noise removal object

Parameters:
  • n_jobs (int) – number of parallel jobs

get_params(deep=False)[source]

Return parameters

Returns:

dictionary of parameters

Return type:

dict

remove_noise(X, y)[source]

Removes noise

Parameters:
  • X (np.array) – features

  • y (np.array) – target labels

Returns:

cleaned features and target labels

Return type:

np.array, np.array

Example

>>> import smote_variants
>>> # X: feature matrix, y: target labels (np.array)
>>> noise_filter = smote_variants.noise_removal.OneSidedSelection()
>>> X_samp, y_samp = noise_filter.remove_noise(X, y)
[Figures: base.png, OneSidedSelection.png]

NeighborhoodCleaningRule

API

class smote_variants.noise_removal.NeighborhoodCleaningRule(nn_params=None, n_jobs=1, **_kwargs)[source]

Neighborhood cleaning rule

References

  • BibTeX:

    @article{smoteNoise0,
             author = {Batista, Gustavo E. A. P. A. and Prati,
                        Ronaldo C. and Monard, Maria Carolina},
             title = {A Study of the Behavior of Several Methods for
                        Balancing Machine Learning Training Data},
             journal = {SIGKDD Explor. Newsl.},
             issue_date = {June 2004},
             volume = {6},
             number = {1},
             month = jun,
             year = {2004},
             issn = {1931-0145},
             pages = {20--29},
             numpages = {10},
             url = {http://doi.acm.org/10.1145/1007730.1007735},
             doi = {10.1145/1007730.1007735},
             acmid = {1007735},
             publisher = {ACM},
             address = {New York, NY, USA}
            }
    
__init__(nn_params=None, n_jobs=1, **_kwargs)[source]

Constructor of the noise removal object

Parameters:
  • nn_params (dict) – additional parameters for the nearest neighbor calculations. Any parameter that NearestNeighbors accepts can be given; in addition, {'metric': 'precomputed', 'metric_learning': '<method>', …} with <method> being 'ITML' or 'LSML' enables learning the metric used for the neighborhood calculations

  • n_jobs (int) – number of parallel jobs

get_params(deep=False)[source]

Return parameters

Returns:

dictionary of parameters

Return type:

dict

remove_noise(X, y)[source]

Removes noise

Parameters:
  • X (np.array) – features

  • y (np.array) – target labels

Returns:

cleaned features and target labels

Return type:

np.array, np.array
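
As commonly described, the neighborhood cleaning rule looks at the 3 nearest neighbors of every sample: a majority sample that its neighbors misclassify is removed, and when a minority sample is misclassified, its majority-class neighbors are removed instead. The plain-Python sketch below follows that description. It is not the library's implementation; the function name, the explicit minority_label argument, k=3 and the Euclidean distance are assumptions.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def neighborhood_cleaning(points, labels, minority_label, k=3):
    """Remove majority samples that conflict with their neighborhood."""
    drop = set()
    for i in range(len(points)):
        others = (j for j in range(len(points)) if j != i)
        neighbors = sorted(others, key=lambda j: dist(points[i], points[j]))[:k]
        vote = Counter(labels[j] for j in neighbors).most_common(1)[0][0]
        if vote == labels[i]:
            continue  # the neighborhood agrees with the sample
        if labels[i] == minority_label:
            # misclassified minority sample: drop its majority neighbors
            drop.update(j for j in neighbors if labels[j] != minority_label)
        else:
            # misclassified majority sample: drop the sample itself
            drop.add(i)
    keep = [i for i in range(len(points)) if i not in drop]
    return [points[i] for i in keep], [labels[i] for i in keep]


# Toy data on a line: sample 4 is a minority point inside the majority
# region, sample 8 is a majority point inside the minority region.
X = [(0.0,), (1.0,), (2.0,), (3.1,), (2.5,), (9.0,), (9.6,), (10.3,),
     (9.9,), (-1.2,), (4.3,), (-2.5,), (-3.7,)]
y = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]
X_samp, y_samp = neighborhood_cleaning(X, y, minority_label=1)
```

Only majority samples are ever removed here: sample 8 because its neighbors are all minority, and samples 1, 2 and 3 because they surround the misclassified minority sample 4.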

Example

>>> import smote_variants
>>> # X: feature matrix, y: target labels (np.array)
>>> noise_filter = smote_variants.noise_removal.NeighborhoodCleaningRule()
>>> X_samp, y_samp = noise_filter.remove_noise(X, y)
[Figures: base.png, NeighborhoodCleaningRule.png]

EditedNearestNeighbors

API

class smote_variants.noise_removal.EditedNearestNeighbors(remove='both', nn_params=None, n_jobs=1, **_kwargs)[source]

Edited nearest neighbors

References

  • BibTeX:

    @article{smoteNoise0,
             author = {Batista, Gustavo E. A. P. A. and Prati,
                        Ronaldo C. and Monard, Maria Carolina},
             title = {A Study of the Behavior of Several Methods for
                        Balancing Machine Learning Training Data},
             journal = {SIGKDD Explor. Newsl.},
             issue_date = {June 2004},
             volume = {6},
             number = {1},
             month = jun,
             year = {2004},
             issn = {1931-0145},
             pages = {20--29},
             numpages = {10},
             url = {http://doi.acm.org/10.1145/1007730.1007735},
             doi = {10.1145/1007730.1007735},
             acmid = {1007735},
             publisher = {ACM},
             address = {New York, NY, USA}
            }
    
__init__(remove='both', nn_params=None, n_jobs=1, **_kwargs)[source]

Constructor of the noise removal object

Parameters:
  • remove (str) – which class to remove samples from: 'both', 'min' or 'maj'

  • nn_params (dict) – additional parameters for the nearest neighbor calculations. Any parameter that NearestNeighbors accepts can be given; in addition, {'metric': 'precomputed', 'metric_learning': '<method>', …} with <method> being 'ITML' or 'LSML' enables learning the metric used for the neighborhood calculations

  • n_jobs (int) – number of parallel jobs

get_params(deep=False)[source]

Return parameters

Returns:

dictionary of parameters

Return type:

dict

remove_noise(X, y)[source]

Removes noise

Parameters:
  • X (np.array) – features

  • y (np.array) – target labels

Returns:

cleaned features and target labels

Return type:

np.array, np.array
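
The usual editing rule behind edited nearest neighbors drops a sample when the majority label among its k nearest neighbors disagrees with its own label. The plain-Python sketch below combines that rule with the remove option of the constructor. It is an illustration, not the library's implementation: the function name, the explicit minority_label argument, k=3, the Euclidean distance and the reading of 'min'/'maj' as "only minority/majority samples may be removed" are assumptions.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def edited_nearest_neighbors(points, labels, minority_label, remove="both", k=3):
    """Drop samples whose k nearest neighbors vote for a different label."""
    keep = []
    for i in range(len(points)):
        others = (j for j in range(len(points)) if j != i)
        neighbors = sorted(others, key=lambda j: dist(points[i], points[j]))[:k]
        vote = Counter(labels[j] for j in neighbors).most_common(1)[0][0]
        is_minority = labels[i] == minority_label
        removable = (remove == "both"
                     or (remove == "min" and is_minority)
                     or (remove == "maj" and not is_minority))
        if vote == labels[i] or not removable:
            keep.append(i)
    return [points[i] for i in keep], [labels[i] for i in keep]


# Toy data: sample 4 is a minority point inside the majority cluster,
# sample 9 is a majority point inside the minority cluster.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
     (5, 5), (5, 6), (6, 5), (6, 6), (5.5, 5.5), (-1, 0)]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0]
_, y_both = edited_nearest_neighbors(X, y, minority_label=1, remove="both")
_, y_min = edited_nearest_neighbors(X, y, minority_label=1, remove="min")
_, y_maj = edited_nearest_neighbors(X, y, minority_label=1, remove="maj")
```

With 'both' the two misplaced samples are dropped; 'min' drops only sample 4 and 'maj' drops only sample 9.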

Example

>>> import smote_variants
>>> # X: feature matrix, y: target labels (np.array)
>>> noise_filter = smote_variants.noise_removal.EditedNearestNeighbors()
>>> X_samp, y_samp = noise_filter.remove_noise(X, y)
[Figures: base.png, EditedNearestNeighbors.png]