A Composite Feature Selection Method to Improve the Classification of Imbalanced Big Data
Abstract
Feature selection is one of the methods used to improve the performance of machine learning algorithms, especially when classifying big data. The need for new methods becomes more pressing when dealing with imbalanced big data. An imbalance appears when there is a discrepancy in the distribution of samples between the two data classes in the training set. Several methods are used to solve the imbalance problem: some depend on redistributing the data, and others depend on improving the classification algorithm itself. Feature selection can also improve the classification of imbalanced data when the features are chosen carefully. Therefore, this research proposes a composite feature selection method that combines a filter feature selection technique and permutation-based feature importance with an ensemble learning method. Three classifiers were used with three performance metrics (AUC, G-mean, and F-score) to show the effect of the proposed feature selection method on imbalanced big data. Using the proposed method led to improved classification results on five standard imbalanced data sets.
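To make the three performance metrics concrete, the following minimal Python sketch computes G-mean, F-score, and AUC for a binary problem using their standard textbook definitions. It is an illustration only, not the implementation used in this research: the function names (`confusion_counts`, `g_mean`, `f_score`, `auc`) and the toy labels are hypothetical, and the minority class is assumed to be the positive class (label 1).

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fn, fp, tn), treating `positive` as the minority class (assumption)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fn, fp, tn


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (recall on positives) and specificity."""
    tp, fn, fp, tn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def f_score(y_true, y_pred, positive=1):
    """F1 score: harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    tp, fn, fp, _ = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as one half. This pairwise form is quadratic in the sample
    count, so it is suitable only for small illustrative inputs.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample from each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy imbalanced set: 5 minority (1) and 95 majority (0) samples.
y_true = [1] * 5 + [0] * 95
# A trivial classifier that always predicts the majority class.
always_majority = [0] * 100
accuracy = sum(t == p for t, p in zip(y_true, always_majority)) / len(y_true)
# Accuracy is 0.95 here, while G-mean and F-score are both 0.0.
trivial_g_mean = g_mean(y_true, always_majority)
trivial_f_score = f_score(y_true, always_majority)
```

In this toy case the always-majority classifier reaches high accuracy without identifying a single minority sample, which is why metrics such as G-mean and F-score are commonly preferred over accuracy for imbalanced data.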