Clustering algorithm based on set dissimilarity for high-dimensional data of categorical attributes
Abstract: A clustering algorithm based on set dissimilarity is proposed. By defining a set dissimilarity measure and a reduced set representation, the algorithm computes the overall dissimilarity of all objects in a set directly, without calculating the distance between every pair of objects. It compresses high-dimensional categorical data substantially without loss of computational accuracy and obtains the clustering result with a single scan of the data. The time complexity of the algorithm is nearly linear. An example on real data shows that the algorithm is effective.
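The abstract does not give the formal definitions of set dissimilarity or set reduction, so the following is only a minimal sketch of the general idea under assumptions of our own. A set is reduced to per-attribute value counts, and its dissimilarity is taken to be the mean Hamming mismatch over all ordered pairs of members; that quantity can be computed from the counts alone as the sum over attributes of 1 - Σ_v (c_v/n)², so no pairwise distances are needed. The class `SetSummary`, the functions `pairwise_mean` and `one_pass_cluster`, and the `threshold` assignment rule are all hypothetical illustrations, not the authors' method.

```python
from collections import Counter
from itertools import product


class SetSummary:
    """Hypothetical reduced representation of a set of categorical objects.

    Stores only the object count n, per-attribute value counts, and the
    per-attribute sum of squared counts, so its size depends on the number
    of distinct attribute values rather than on n.
    """

    def __init__(self, num_attrs):
        self.n = 0
        self.counts = [Counter() for _ in range(num_attrs)]
        self.sq = [0] * num_attrs  # per attribute: sum over values v of count(v)^2

    def add(self, obj):
        self.n += 1
        for j, value in enumerate(obj):
            c = self.counts[j][value]
            self.sq[j] += 2 * c + 1  # (c + 1)^2 - c^2
            self.counts[j][value] = c + 1

    def dissimilarity(self):
        """Mean Hamming mismatch over all n^2 ordered member pairs."""
        if self.n == 0:
            return 0.0
        return sum(1.0 - s / self.n ** 2 for s in self.sq)

    def dissimilarity_with(self, obj):
        """Dissimilarity the set would have after adding obj (no mutation)."""
        n = self.n + 1
        return sum(
            1.0 - (self.sq[j] + 2 * self.counts[j][value] + 1) / n ** 2
            for j, value in enumerate(obj)
        )


def pairwise_mean(objs):
    """Brute-force O(n^2) reference: mean mismatch over ordered pairs."""
    n = len(objs)
    total = sum(
        sum(a != b for a, b in zip(x, y)) for x, y in product(objs, repeat=2)
    )
    return total / n ** 2


def one_pass_cluster(objects, threshold):
    """Single scan: add each object to the cluster whose dissimilarity stays
    lowest and within threshold after adding it; otherwise open a new cluster."""
    summaries, labels = [], []
    for obj in objects:
        best, best_score = None, None
        for idx, summary in enumerate(summaries):
            score = summary.dissimilarity_with(obj)
            if score <= threshold and (best_score is None or score < best_score):
                best, best_score = idx, score
        if best is None:
            summaries.append(SetSummary(len(obj)))
            best = len(summaries) - 1
        summaries[best].add(obj)
        labels.append(best)
    return labels, summaries


data = [("a", "x", "p"), ("a", "x", "q"), ("b", "y", "r"),
        ("b", "y", "r"), ("a", "x", "p")]
labels, summaries = one_pass_cluster(data, threshold=0.5)
print(labels)  # [0, 0, 1, 1, 0]
members = [data[i] for i, lab in enumerate(labels) if lab == 0]
# The count-based value matches the brute-force pairwise computation.
print(abs(summaries[0].dissimilarity() - pairwise_mean(members)) < 1e-12)  # True
```

Because each object is compared only against the cluster summaries, one scan costs roughly O(N·k·m) for N objects, k clusters and m attributes, which is linear in N when k stays small. This is in the spirit of the near-linear complexity stated in the abstract, but the paper's actual definitions may differ.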
Key words:
- clustering
- high-dimensional space
- sets
- dissimilarity
- data mining