Multiclass oversampling

Multiclass oversampling is a highly ambiguous task, since each class may be balanced best by a different oversampling technique. The approach implemented here selects the minority classes one by one and oversamples each of them to the cardinality of the original majority class, using the union of the original majority class and all previously oversampled classes as the majority class in the binary oversampling step. This works only with binary oversampling techniques that leave the majority class unchanged and have a proportion parameter to explicitly specify the number of samples to be generated. Suitable oversampling techniques can be queried with the get_all_oversamplers_multiclass function:

smote_variants.get_all_oversamplers_multiclass(strategy='eq_1_vs_many_successive')

Returns all oversampling classes which can be used with the multiclass strategy specified

Parameters:
  • strategy (str) – the multiclass oversampling strategy: ‘eq_1_vs_many_successive’ or ‘equalize_1_vs_many’
Returns:
  list of all oversampling classes which can be used with the multiclass strategy specified
Return type:
  list(OverSampling)

Example:

import smote_variants as sv

oversamplers = sv.get_all_oversamplers_multiclass()

smote_variants.get_n_quickest_oversamplers_multiclass(n, strategy='eq_1_vs_many_successive')

Returns the n quickest oversamplers, as measured on the datasets of the imbalanced_databases package, that are suitable for use with the multiclass strategy specified.

Parameters:
  • n (int) – number of oversamplers to return
  • strategy (str) – the multiclass oversampling strategy: ‘eq_1_vs_many_successive’ or ‘equalize_1_vs_many’
Returns:
  list of the n quickest oversampling classes which can be used with the multiclass strategy specified
Return type:
  list(OverSampling)

Example:

import smote_variants as sv

# n is a required argument; 5 is an arbitrary illustrative value
oversamplers = sv.get_n_quickest_oversamplers_multiclass(5)

class smote_variants.MulticlassOversampling(oversampler=<smote_variants._smote_variants.SMOTE object>, strategy='eq_1_vs_many_successive')

__init__(oversampler=<smote_variants._smote_variants.SMOTE object>, strategy='eq_1_vs_many_successive')

Constructor of the multiclass oversampling object

Parameters:
  • oversampler (obj) – an oversampling object
  • strategy (str/obj) – a multiclass oversampling strategy, currently ‘eq_1_vs_many_successive’ or ‘equalize_1_vs_many’

get_params(deep=False)

Returns:
  the parameters of the multiclass oversampling object
Return type:
  dict

sample(X, y)

Generates samples according to the multiclass oversampling strategy selected in the constructor.

Parameters:
  • X (np.ndarray) – training set
  • y (np.array) – target labels
Returns:
  the extended training set and target labels
Return type:
  (np.ndarray, np.array)

sample_equalize_1_vs_many(X, y)

Generates samples by oversampling each minority class to the cardinality of the majority class, using all original samples (and no previously generated ones) in each run.

Parameters:
  • X (np.ndarray) – training set
  • y (np.array) – target labels
Returns:
  the extended training set and target labels
Return type:
  (np.ndarray, np.array)
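
To illustrate the idea behind this strategy, here is a minimal pure-Python sketch. It is not the library's implementation: the names toy_binary_oversample and equalize_1_vs_many are hypothetical, plain lists replace NumPy arrays, and interpolation between random minority pairs stands in for a real binary oversampler. The point it shows is that every class is oversampled against the original data only, so generated samples never enter the "rest" side of a later binary problem.

```python
import random
from collections import Counter


def toy_binary_oversample(minority, rest, n_new, rng):
    """Hypothetical stand-in for a binary oversampler.

    Interpolates between randomly chosen minority pairs. `rest` (the
    "majority" side of the binary problem) is accepted but unused here;
    a real technique may take it into account.
    """
    new = []
    for _ in range(n_new):
        a, b = rng.choice(minority), rng.choice(minority)
        t = rng.random()
        new.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
    return new


def equalize_1_vs_many(X, y, seed=0):
    """Oversample every smaller class up to the size of the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    n_target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n_min in counts.items():
        if n_min == n_target:
            continue  # already at the majority cardinality
        minority = [x for x, lab in zip(X, y) if lab == label]
        # Only ORIGINAL samples of the other classes form the "rest".
        rest = [x for x, lab in zip(X, y) if lab != label]
        new = toy_binary_oversample(minority, rest, n_target - n_min, rng)
        X_out.extend(new)
        y_out.extend([label] * len(new))
    return X_out, y_out


X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3],  # class 0 (majority)
     [1.0, 1.0], [1.1, 0.9],                          # class 1
     [2.0, 2.0]]                                      # class 2
y = [0, 0, 0, 0, 1, 1, 2]

X_samp, y_samp = equalize_1_vs_many(X, y)
print(sorted(Counter(y_samp).items()))  # [(0, 4), (1, 4), (2, 4)]
```

Because each run is independent of the others, the order in which the classes are processed does not affect the class counts of the result.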

sample_equalize_1_vs_many_successive(X, y)

Generates samples by oversampling the minority classes successively to the cardinality of the majority class, incorporating the results of the previous oversamplings into the majority side of each subsequent run.

Parameters:
  • X (np.ndarray) – training set
  • y (np.array) – target labels
Returns:
  the extended training set and target labels
Return type:
  (np.ndarray, np.array)
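
The successive strategy can be sketched in the same spirit. The code below is a pure-Python illustration under stated assumptions, not the library's implementation: toy_binary_oversample and eq_1_vs_many_successive are hypothetical names, classes are assumed to be processed from largest to smallest, and the proportion parameter is assumed to mean the fraction of the gap between the current majority and minority counts that gets generated. Under that assumption, reaching the original majority cardinality n_target from n_min samples against a merged pool of n_pool samples requires proportion = (n_target - n_min) / (n_pool - n_min).

```python
import random
from collections import Counter


def toy_binary_oversample(minority, rest, n_new, rng):
    """Hypothetical stand-in for a binary oversampler (ignores `rest`)."""
    new = []
    for _ in range(n_new):
        a, b = rng.choice(minority), rng.choice(minority)
        t = rng.random()
        new.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
    return new


def eq_1_vs_many_successive(X, y, seed=0):
    """Oversample classes one by one against a growing 'majority' pool."""
    rng = random.Random(seed)
    counts = Counter(y)
    # Assumption: larger classes are handled first.
    labels = sorted(counts, key=lambda lab: -counts[lab])
    majority_label = labels[0]
    n_target = counts[majority_label]

    # The pool starts as the original majority class and grows each step.
    pool_X = [x for x, lab in zip(X, y) if lab == majority_label]
    pool_y = [majority_label] * n_target
    proportions = {}

    for label in labels[1:]:
        minority = [x for x, lab in zip(X, y) if lab == label]
        n_min, n_pool = len(minority), len(pool_X)
        n_new = n_target - n_min
        new = []
        if n_new > 0:
            # Fraction of the (pool - minority) gap that must be generated.
            proportions[label] = n_new / (n_pool - n_min)
            new = toy_binary_oversample(minority, pool_X, n_new, rng)
        # The balanced class joins the pool used by the next binary run.
        pool_X.extend(minority + new)
        pool_y.extend([label] * (n_min + len(new)))
    return pool_X, pool_y, proportions


X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3],  # class 0 (majority)
     [1.0, 1.0], [1.1, 0.9],                          # class 1
     [2.0, 2.0]]                                      # class 2
y = [0, 0, 0, 0, 1, 1, 2]

X_samp, y_samp, proportions = eq_1_vs_many_successive(X, y)
print(sorted(Counter(y_samp).items()))  # [(0, 4), (1, 4), (2, 4)]
```

In this toy run, class 1 is balanced against a pool of 4 samples (proportion 1.0), while class 2 is balanced against a pool of 8 samples and therefore needs a proportion of 3/7. This is why the strategy requires a binary oversampler with an explicit proportion parameter.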