Volume 39 Issue 8
Aug. 2017
WU Sen, LIU Lu, LU Dan. Imbalanced data ensemble classification based on cluster-based under-sampling algorithm[J]. Chinese Journal of Engineering, 2017, 39(8): 1244-1253. doi: 10.13374/j.issn2095-9389.2017.08.015

Imbalanced data ensemble classification based on cluster-based under-sampling algorithm

doi: 10.13374/j.issn2095-9389.2017.08.015
  • Received Date: 2016-12-30
  • Most traditional classification algorithms assume that the data set is well balanced and aim to maximize overall classification accuracy. Real data sets, however, are usually imbalanced, so traditional approaches tend to misclassify minority class samples. Two main strategies exist for improving classification performance on imbalanced data. The first modifies the data set, increasing the number of minority class samples by over-sampling or decreasing the number of majority class samples by under-sampling. The second modifies the algorithm itself. This paper proposes an approach for classifying imbalanced data that combines cluster-based under-sampling with ensemble classification. First, cluster-based under-sampling is used to build a balanced data set in the data processing stage; the AdaBoost ensemble algorithm is then trained on the new data set. During ensemble learning, the algorithm uses weights to distinguish minority class data from majority class data when calculating the error rate. This makes the algorithm focus more on the minority class, thereby improving the classification accuracy of minority class data.
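
    The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the clustering algorithm, the per-cluster sampling rule, or the exact weighting formula, so plain k-means, size-proportional sampling, and a fixed `minority_cost` multiplier are all assumptions made here for illustration, as are the function names.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on tuples of floats; returns a list of k clusters."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest center (squared Euclidean distance).
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return clusters


def cluster_undersample(majority, n_keep, k=3, seed=0):
    """Keep at most n_keep majority points, sampling each cluster in
    proportion to its size (assumed rule, not taken from the paper)."""
    rng = random.Random(seed)
    clusters = [c for c in kmeans(majority, k, seed=seed) if c]
    kept = []
    for cl in clusters:
        quota = max(1, round(n_keep * len(cl) / len(majority)))
        kept.extend(rng.sample(cl, min(quota, len(cl))))
    return kept[:n_keep]


def class_weighted_error(y_true, y_pred, sample_w, minority_label=1, minority_cost=2.0):
    """AdaBoost-style weighted error in which a mistake on a minority sample
    counts minority_cost times as much (hypothetical weighting scheme)."""
    cost = [minority_cost if y == minority_label else 1.0 for y in y_true]
    total = sum(w * c for w, c in zip(sample_w, cost))
    wrong = sum(
        w * c for w, c, y, p in zip(sample_w, cost, y_true, y_pred) if y != p
    )
    return wrong / total


# Usage: shrink 30 majority points to about 6, then score a weak learner
# that misclassifies the single minority sample.
majority = [(float(i), 0.0) for i in range(30)]
kept = cluster_undersample(majority, n_keep=6)
err = class_weighted_error([1, 0, 0, 0], [0, 0, 0, 0], [0.25] * 4)
```

    In this sketch a weak learner that misses the one minority sample out of four is charged more than the 0.25 a uniform error rate would give, so later boosting rounds under such a scheme would concentrate on minority samples, which is the effect the abstract describes.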

     
