Noise filters and prototype selection
TomekLinkRemoval
API
- class smote_variants.noise_removal.TomekLinkRemoval(strategy='remove_majority', nn_params=None, n_jobs=1, **_kwargs)
Tomek link removal
References
BibTex:
@article{smoteNoise0, author = {Batista, Gustavo E. A. P. A. and Prati, Ronaldo C. and Monard, Maria Carolina}, title = {A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data}, journal = {SIGKDD Explor. Newsl.}, issue_date = {June 2004}, volume = {6}, number = {1}, month = jun, year = {2004}, issn = {1931-0145}, pages = {20--29}, numpages = {10}, url = {http://doi.acm.org/10.1145/1007730.1007735}, doi = {10.1145/1007730.1007735}, acmid = {1007735}, publisher = {ACM}, address = {New York, NY, USA} }
- __init__(strategy='remove_majority', nn_params=None, n_jobs=1, **_kwargs)
Constructor of the noise filter.
- Parameters:
strategy (str) – noise removal strategy: 'remove_majority' or 'remove_both'
nn_params (dict) – additional parameters for the nearest neighbor calculations: any parameter that NearestNeighbors accepts. In addition, {'metric': 'precomputed', 'metric_learning': '<method>', …} with <method> being one of 'ITML', 'LSML' enables learning the metric used for the neighborhood calculations
n_jobs (int) – number of parallel jobs
Example
>>> noise_filter = smote_variants.noise_removal.TomekLinkRemoval()
>>> X_samp, y_samp = noise_filter.remove_noise(X, y)
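To illustrate what the two strategies do, here is a minimal pure-Python sketch of Tomek link removal. It is not the library's implementation: the helper names (nearest_neighbor, tomek_links, remove_tomek_links), the explicit majority_label argument and the toy data are assumptions made for this example. A Tomek link is taken to be a pair of samples with different labels that are each other's nearest neighbor.

```python
from math import dist


def nearest_neighbor(X, i):
    """Index of the sample closest to X[i], excluding X[i] itself."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: dist(X[i], X[j]))


def tomek_links(X, y):
    """Pairs (i, j), i < j, of mutual nearest neighbors with different labels."""
    nn = [nearest_neighbor(X, i) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


def remove_tomek_links(X, y, majority_label, strategy="remove_majority"):
    """Drop the majority member of each link, or both members."""
    drop = set()
    for i, j in tomek_links(X, y):
        for k in (i, j):
            if strategy == "remove_both" or y[k] == majority_label:
                drop.add(k)
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]


# Toy data (hypothetical): the minority sample at 1.0 and the majority
# sample at 1.2 are mutual nearest neighbors, so they form a Tomek link.
X = [(0.0,), (0.1,), (1.0,), (1.2,), (3.0,), (3.1,)]
y = [0, 0, 1, 0, 0, 0]
X_samp, y_samp = remove_tomek_links(X, y, majority_label=0)
```

With 'remove_majority' only the majority sample at 1.2 is dropped; with 'remove_both' the minority sample at 1.0 is dropped as well.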
CondensedNearestNeighbors
API
- class smote_variants.noise_removal.CondensedNearestNeighbors(n_jobs=1, **_kwargs)
Condensed nearest neighbors
References
BibTex:
@ARTICLE{condensed_nn, author={Hart, P.}, journal={IEEE Transactions on Information Theory}, title={The condensed nearest neighbor rule (Corresp.)}, year={1968}, volume={14}, number={3}, pages={515-516}, keywords={Pattern classification}, doi={10.1109/TIT.1968.1054155}, ISSN={0018-9448}, month={May}}
- __init__(n_jobs=1, **_kwargs)
Constructor of the noise removal object
- Parameters:
n_jobs (int) – number of parallel jobs
Example
>>> noise_filter = smote_variants.noise_removal.CondensedNearestNeighbors()
>>> X_samp, y_samp = noise_filter.remove_noise(X, y)
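The condensed nearest neighbor rule from the cited Hart (1968) paper can be sketched in a few lines of pure Python. This is an illustration under assumptions, not the library's code: the function name, the choice of the first sample as the seed and the toy data are hypothetical, and the library may seed or order the store differently.

```python
from math import dist


def condensed_nearest_neighbors(X, y):
    """Grow a store until 1-NN on the store classifies every sample correctly."""
    store = [0]                      # assumption: seed the store with the first sample
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if i in store:
                continue
            nearest = min(store, key=lambda j: dist(X[i], X[j]))
            if y[nearest] != y[i]:   # misclassified by the store, so keep it
                store.append(i)
                changed = True
    store.sort()
    return [X[i] for i in store], [y[i] for i in store]


# Toy data (hypothetical): two well-separated clusters.
X = [(0.0,), (0.1,), (0.2,), (1.0,), (1.1,), (1.2,)]
y = [0, 0, 0, 1, 1, 1]
X_samp, y_samp = condensed_nearest_neighbors(X, y)
```

On this data one prototype per cluster is enough, so the condensed set holds only the samples at 0.0 and 1.0.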
OneSidedSelection
API
- class smote_variants.noise_removal.OneSidedSelection(n_jobs=1, **_kwargs)
References
BibTex:
@article{smoteNoise0, author = {Batista, Gustavo E. A. P. A. and Prati, Ronaldo C. and Monard, Maria Carolina}, title = {A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data}, journal = {SIGKDD Explor. Newsl.}, issue_date = {June 2004}, volume = {6}, number = {1}, month = jun, year = {2004}, issn = {1931-0145}, pages = {20--29}, numpages = {10}, url = {http://doi.acm.org/10.1145/1007730.1007735}, doi = {10.1145/1007730.1007735}, acmid = {1007735}, publisher = {ACM}, address = {New York, NY, USA} }
- __init__(n_jobs=1, **_kwargs)
Constructor of the noise removal object
- Parameters:
n_jobs (int) – number of parallel jobs
Example
>>> noise_filter = smote_variants.noise_removal.OneSidedSelection()
>>> X_samp, y_samp = noise_filter.remove_noise(X, y)
CNNTomekLinks
API
- class smote_variants.noise_removal.CNNTomekLinks(n_jobs=1)
References
BibTex:
@article{smoteNoise0, author = {Batista, Gustavo E. A. P. A. and Prati, Ronaldo C. and Monard, Maria Carolina}, title = {A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data}, journal = {SIGKDD Explor. Newsl.}, issue_date = {June 2004}, volume = {6}, number = {1}, month = jun, year = {2004}, issn = {1931-0145}, pages = {20--29}, numpages = {10}, url = {http://doi.acm.org/10.1145/1007730.1007735}, doi = {10.1145/1007730.1007735}, acmid = {1007735}, publisher = {ACM}, address = {New York, NY, USA} }
- __init__(n_jobs=1)
Constructor of the noise removal object
- Parameters:
n_jobs (int) – number of parallel jobs
Example
>>> noise_filter = smote_variants.noise_removal.CNNTomekLinks()
>>> X_samp, y_samp = noise_filter.remove_noise(X, y)
NeighborhoodCleaningRule
API
- class smote_variants.noise_removal.NeighborhoodCleaningRule(nn_params=None, n_jobs=1, **_kwargs)
References
BibTex:
@article{smoteNoise0, author = {Batista, Gustavo E. A. P. A. and Prati, Ronaldo C. and Monard, Maria Carolina}, title = {A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data}, journal = {SIGKDD Explor. Newsl.}, issue_date = {June 2004}, volume = {6}, number = {1}, month = jun, year = {2004}, issn = {1931-0145}, pages = {20--29}, numpages = {10}, url = {http://doi.acm.org/10.1145/1007730.1007735}, doi = {10.1145/1007730.1007735}, acmid = {1007735}, publisher = {ACM}, address = {New York, NY, USA} }
- __init__(nn_params=None, n_jobs=1, **_kwargs)
Constructor of the noise removal object
- Parameters:
nn_params (dict) – additional parameters for the nearest neighbor calculations: any parameter that NearestNeighbors accepts. In addition, {'metric': 'precomputed', 'metric_learning': '<method>', …} with <method> being one of 'ITML', 'LSML' enables learning the metric used for the neighborhood calculations
n_jobs (int) – number of parallel jobs
Example
>>> noise_filter = smote_variants.noise_removal.NeighborhoodCleaningRule()
>>> X_samp, y_samp = noise_filter.remove_noise(X, y)
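The nn_params dictionary described above might be filled in as follows. The two dictionaries are illustrative values only: the first uses keyword arguments of scikit-learn's NearestNeighbors ('metric' and 'p'), the second follows the metric learning pattern from the parameter description. The variable names are hypothetical, and the commented-out call assumes smote_variants is installed and X, y hold a labelled dataset.

```python
# Keyword arguments that scikit-learn's NearestNeighbors accepts:
# Minkowski distance with p=1, i.e. Manhattan distance.
nn_params_manhattan = {"metric": "minkowski", "p": 1}

# Metric learning, following the pattern in the parameter description:
# 'metric' is set to 'precomputed' and 'metric_learning' names the method.
nn_params_itml = {"metric": "precomputed", "metric_learning": "ITML"}

# Hypothetical usage, left commented out because it needs smote_variants
# and a dataset:
# noise_filter = smote_variants.noise_removal.NeighborhoodCleaningRule(
#     nn_params=nn_params_itml)
# X_samp, y_samp = noise_filter.remove_noise(X, y)
```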
EditedNearestNeighbors
API
- class smote_variants.noise_removal.EditedNearestNeighbors(remove='both', nn_params=None, n_jobs=1, **_kwargs)
References
BibTex:
@article{smoteNoise0, author = {Batista, Gustavo E. A. P. A. and Prati, Ronaldo C. and Monard, Maria Carolina}, title = {A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data}, journal = {SIGKDD Explor. Newsl.}, issue_date = {June 2004}, volume = {6}, number = {1}, month = jun, year = {2004}, issn = {1931-0145}, pages = {20--29}, numpages = {10}, url = {http://doi.acm.org/10.1145/1007730.1007735}, doi = {10.1145/1007730.1007735}, acmid = {1007735}, publisher = {ACM}, address = {New York, NY, USA} }
- __init__(remove='both', nn_params=None, n_jobs=1, **_kwargs)
Constructor of the noise removal object
- Parameters:
remove (str) – class to remove samples from: 'both', 'min' or 'maj'
nn_params (dict) – additional parameters for the nearest neighbor calculations: any parameter that NearestNeighbors accepts. In addition, {'metric': 'precomputed', 'metric_learning': '<method>', …} with <method> being one of 'ITML', 'LSML' enables learning the metric used for the neighborhood calculations
n_jobs (int) – number of parallel jobs
Example
>>> noise_filter = smote_variants.noise_removal.EditedNearestNeighbors()
>>> X_samp, y_samp = noise_filter.remove_noise(X, y)
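A rough pure-Python sketch of the edited nearest neighbors idea shows how the remove parameter could restrict which class is filtered. Everything here is an assumption made for illustration rather than the library's implementation: the function name, the explicit minority_label argument, the choice of k=3 neighbors and the toy data. A sample is dropped when its label disagrees with the majority vote of its k nearest neighbors and its class is eligible for removal.

```python
from collections import Counter
from math import dist


def edited_nearest_neighbors(X, y, minority_label, remove="both", k=3):
    """Drop eligible samples whose label differs from the vote of their k neighbors."""
    keep = []
    for i in range(len(X)):
        others = sorted((j for j in range(len(X)) if j != i),
                        key=lambda j: dist(X[i], X[j]))
        vote = Counter(y[j] for j in others[:k]).most_common(1)[0][0]
        is_minority = y[i] == minority_label
        eligible = (remove == "both"
                    or (remove == "min" and is_minority)
                    or (remove == "maj" and not is_minority))
        if vote == y[i] or not eligible:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (hypothetical): a minority sample (label 1) at 0.15 sits inside
# the majority cluster, and a majority sample at 2.05 sits inside the
# minority cluster.
X = [(0.0,), (0.1,), (0.15,), (0.2,), (0.3,), (0.4,),
     (2.0,), (2.05,), (2.1,), (2.2,)]
y = [0, 0, 1, 0, 0, 0, 1, 0, 1, 1]
X_samp, y_samp = edited_nearest_neighbors(X, y, minority_label=1)
```

With remove='both' the two misplaced samples are dropped; 'min' drops only the one at 0.15 and 'maj' only the one at 2.05.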