Citation: WU Sen, WANG Yu-zhi, GAO Xiao-nan. Clustering algorithm for imbalanced data based on nearest neighbor[J]. Chinese Journal of Engineering, 2020, 42(9): 1209-1219. doi: 10.13374/j.issn2095-9389.2019.10.09.003
[1] Wu S, Feng X D, Zhou W J. Spectral clustering of high-dimensional data exploiting sparse representation vectors. <italic>Neurocomputing</italic>, 2014, 135: 229 doi: 10.1016/j.neucom.2013.12.027
[2] Wilson J, Chaudhury S, Lall B. Clustering short temporal behaviour sequences for customer segmentation using LDA. <italic>Expert Syst</italic>, 2018, 35(3): e12250 doi: 10.1111/exsy.12250
[3] Zhao L B, Shi G Y. A trajectory clustering method based on Douglas-Peucker compression and density for marine traffic pattern recognition. <italic>Ocean Eng</italic>, 2019, 172: 456 doi: 10.1016/j.oceaneng.2018.12.019
[4] Al-Shammari A, Zhou R, Naseriparsa M, et al. An effective density-based clustering and dynamic maintenance framework for evolving medical data streams. <italic>Int J Med Inform</italic>, 2019, 126: 176 doi: 10.1016/j.ijmedinf.2019.03.016
[5] 胡圓, 李暉, 陳梅. 基于密度聚類的出租車異常軌跡檢測. 計算機與現代化, 2019(6):49 doi: 10.3969/j.issn.1006-2475.2019.06.008
Hu Y, Li H, Chen M. Taxi abnormal trajectory detection based on density clustering. <italic>Comput Modernization</italic>, 2019(6): 49 doi: 10.3969/j.issn.1006-2475.2019.06.008
[6] Han W H, Huang Z Z, Li S D, et al. Distribution-sensitive unbalanced data oversampling method for medical diagnosis. <italic>J Med Syst</italic>, 2019, 43(2): 39 doi: 10.1007/s10916-018-1154-8
[7] Chen L T, Xu G H, Zhang Q, et al. Learning deep representation of imbalanced SCADA data for fault detection of wind turbines. <italic>Meas</italic>, 2019, 139: 370 doi: 10.1016/j.measurement.2019.03.029
[8] Xiong H, Wu J J, Chen J. K–means clustering versus validation measures: A data-distribution perspective. <italic>IEEE Trans Syst Man Cybern Part B (Cybern)</italic>, 2009, 39(2): 318 doi: 10.1109/TSMCB.2008.2004559
[9] 駱自超, 金隼, 邱雪峰. 考慮類內不平衡的譜聚類過抽樣方法. 計算機工程與應用, 2014, 50(11):120 doi: 10.3778/j.issn.1002-8331.1312-0148
Luo Z C, Jin S, Qiu X F. Spectral clustering based oversampling: oversampling taking within class imbalance into consideration. <italic>Comput Eng Appl</italic>, 2014, 50(11): 120 doi: 10.3778/j.issn.1002-8331.1312-0148
[10] Kumar N S, Rao K N, Govardhan A, et al. Undersampled K–means approach for handling imbalanced distributed data. <italic>Prog Artif Intelligence</italic>, 2014, 3(1): 29 doi: 10.1007/s13748-014-0045-6
[11] 武森, 劉露, 盧丹. 基于聚類欠采樣的集成不均衡數據分類算法. 工程科學學報, 2017, 39(8):1244
Wu S, Liu L, Lu D. Imbalanced data ensemble classification based on cluster-based under-sampling algorithm. <italic>Chin J Eng</italic>, 2017, 39(8): 1244
[12] Lin W C, Tsai C F, Hu Y H, et al. Clustering-based undersampling in class-imbalanced data. <italic>Inform Sci</italic>, 2017, 409-410: 17 doi: 10.1016/j.ins.2017.05.008
[13] Liang J Y, Bai L, Dang C Y, et al. The K–means–type algorithms versus imbalanced data distributions. <italic>IEEE Trans Fuzzy Syst</italic>, 2012, 20(4): 728 doi: 10.1109/TFUZZ.2011.2182354
[14] 亓慧. 多中心的非平衡K–均值聚類方法. 中北大學學報(自然科學版), 2015, 36(4):453
Qi H. Imbalanced K–means clustering method with multiple centers. <italic>J North Univ China Nat Sci</italic>, 2015, 36(4): 453
[15] 楊天鵬, 徐鯤鵬, 陳黎飛. 非均勻數據的變異系數聚類算法. 山東大學學報: 工學版, 2018, 48(3):140
Yang T P, Xu K P, Chen L F. Coefficient of variation clustering algorithm for non-uniform data. <italic>J Shandong Univ Eng Sci</italic>, 2018, 48(3): 140
[16] 劉歡, 胡德敏. 類不平衡數據的卡方聚類算法研究. 軟件, 2019, 40(4):7 doi: 10.3969/j.issn.1003-6970.2019.04.002
Liu H, Hu D M. Research on Chi-square clustering algorithm for unbalanced data. <italic>Comput Eng Software</italic>, 2019, 40(4): 7 doi: 10.3969/j.issn.1003-6970.2019.04.002
[17] 江鵬. 面向非平衡數據集的多簇IB算法研究[學位論文]. 鄭州: 鄭州大學, 2015
Jiang P. The Research of Multi-clusters IB Algorithm for Imbalanced Data Set [Dissertation]. Zhengzhou: Zhengzhou University, 2015
[18] 白亮. 聚類學習的理論分析與高效算法研究[學位論文]. 太原: 山西大學, 2012
Bai L. Theoretical Analysis and Effective Algorithms of Cluster Learning [Dissertation]. Taiyuan: Shanxi University, 2012
[19] Gionis A, Mannila H, Tsaparas P. Clustering aggregation. <italic>ACM Trans Knowledge Discovery Data</italic>, 2007, 1(1): 1 doi: 10.1145/1217299.1217300
[20] Chen M, Li L J, Wang B, et al. Effectively clustering by finding density backbone based-on kNN. <italic>Pattern Recognit</italic>, 2016, 60: 486 doi: 10.1016/j.patcog.2016.04.018
[21] 李濤, 葛洪偉, 蘇樹智. 基于密度自適應距離的密度峰聚類. 小型微型計算機系統, 2017, 38(6):1347 doi: 10.3969/j.issn.1000-1220.2017.06.032
Li T, Ge H W, Su S Z. Density peaks clustering based on density adaptive distance. <italic>J Chin Comput Syst</italic>, 2017, 38(6): 1347 doi: 10.3969/j.issn.1000-1220.2017.06.032
[22] Forina M. Wine Data Set [EB/OL]. UCI Machine Learning (1991-07-01) [2019-10-09]. http://archive.ics.uci.edu/ml/datasets/Wine
[23] Quinlan J R. Thyroid Disease Data Set [EB/OL]. UCI Machine Learning (1987-01-01) [2019-10-09]. http://archive.ics.uci.edu/ml/datasets/Thyroid+Disease
[24] Sigillito V G. Ionosphere Data Set [EB/OL]. UCI Machine Learning (1989-01-01) [2019-10-09]. http://archive.ics.uci.edu/ml/datasets/Ionosphere
[25] Dua D, Graff C. Statlog (Heart) Data Set [EB/OL]. UCI Machine Learning (1993-02-13) [2019-10-09]. http://archive.ics.uci.edu/ml/datasets/Statlog+%28Heart%29
[26] 武鵬鵬. 初始類中心選擇及在非平衡數據中的聚類研究[學位論文]. 太原: 山西大學, 2015
Wu P P. Research on Initial Cluster Centers Choice Algorithm and Clustering for Imbalanced Data [Dissertation]. Taiyuan: Shanxi University, 2015
[27] 傅立偉, 武森. 基于屬性值集中度的分類數據聚類有效性內部評價指標. 工程科學學報, 2019, 41(5):682
Fu L W, Wu S. A new internal clustering validation index for categorical data based on concentration of attribute values. <italic>Chin J Eng</italic>, 2019, 41(5): 682
[28] Hussain S F, Haris M. A K–means based co-clustering (kCC) algorithm for sparse, high dimensional data. <italic>Expert Syst Appl</italic>, 2019, 118: 20 doi: 10.1016/j.eswa.2018.09.006
[29] Yeh C C, Yang M S. Evaluation measures for cluster ensembles based on a fuzzy generalized Rand index. <italic>Appl Soft Comput</italic>, 2017, 57: 225 doi: 10.1016/j.asoc.2017.03.030
[30] Qannari E M, Courcoux P, Faye P. Significance test of the adjusted Rand index. Application to the free sorting task. <italic>Food Qual Preference</italic>, 2014, 32: 93 doi: 10.1016/j.foodqual.2013.05.005