Cross-project Defect Prediction Using A Connectivity-based Unsupervised Classifier


Defect prediction on projects with limited historical data has attracted great interest from both researchers and practitioners. Cross-project defect prediction has been the main area of progress, reusing classifiers from other projects. However, existing approaches require some degree of homogeneity (e.g., a similar distribution of metric values) between the training projects and the target project. Satisfying the homogeneity requirement often requires significant effort (currently a very active area of research). An unsupervised classifier does not require any training data, so the heterogeneity challenge no longer arises. In this paper, we examine two types of unsupervised classifiers: a) distance-based classifiers (e.g., k-means); and b) connectivity-based classifiers. While distance-based unsupervised classifiers have previously been used in the defect prediction literature with disappointing performance, connectivity-based classifiers have not previously been explored in our community. We compare the performance of unsupervised classifiers with that of supervised classifiers using data from 26 projects from three publicly available datasets (i.e., AEEEM, NASA, and PROMISE). In the cross-project setting, our proposed connectivity-based classifier (via spectral clustering) ranks as one of the top classifiers among five widely used supervised classifiers (i.e., random forest, naive Bayes, logistic regression, decision tree, and logistic model tree) and five unsupervised classifiers (i.e., k-means, partition around medoids, fuzzy C-means, neural-gas, and spectral clustering). In the within-project setting (i.e., models are built and applied on the same project), our spectral classifier ranks in the second tier, while only random forest ranks in the first tier. Hence, connectivity-based unsupervised classifiers offer a viable solution for both cross-project and within-project defect prediction.
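To make the idea of a connectivity-based unsupervised classifier more concrete, the following is a minimal sketch of two-way spectral clustering applied to a matrix of software metrics. It is an illustration under stated assumptions, not the procedure from the paper: the function name `spectral_defect_labels`, the RBF similarity with its bandwidth, and the rule that the cluster with larger metric values is treated as defect-prone are all hypothetical choices made for this example.

```python
import numpy as np


def spectral_defect_labels(metrics):
    """Split entities into two clusters via spectral clustering.

    metrics: array-like of shape (n_entities, n_metrics), n_entities >= 2.
    Returns a boolean array; True marks the cluster assumed defect-prone.
    """
    X = np.asarray(metrics, dtype=float)
    n_entities, n_metrics = X.shape
    if n_entities < 2:
        raise ValueError("need at least two entities to form two clusters")

    # z-score every metric so that no single metric dominates the similarity.
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - X.mean(axis=0)) / std

    # Assumption: RBF similarity with bandwidth sigma^2 = n_metrics.
    sq_norms = (Z ** 2).sum(axis=1)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (Z @ Z.T)
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    W = np.exp(-sq_dist / (2.0 * n_metrics))
    np.fill_diagonal(W, 0.0)  # no self-loops in the similarity graph

    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    L_sym = np.eye(n_entities) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order; the eigenvector of the
    # second-smallest eigenvalue carries the two-way partition.
    _, eigvecs = np.linalg.eigh(L_sym)
    fiedler = d_inv_sqrt * eigvecs[:, 1]

    in_cluster = fiedler > 0
    if in_cluster.all() or not in_cluster.any():
        # Degenerate split: nothing can be distinguished, flag nothing.
        return np.zeros(n_entities, dtype=bool)

    # Assumption: the cluster with larger standardized metric values
    # (e.g., bigger, more complex entities) is the defect-prone one.
    score = Z.sum(axis=1)
    if score[in_cluster].mean() < score[~in_cluster].mean():
        in_cluster = ~in_cluster
    return in_cluster
```

Because the function needs only the metric values of the target project, it can be called directly on a project with no defect history, e.g. `spectral_defect_labels(target_metrics)`, where `target_metrics` is a placeholder for a numeric matrix with one row per file or class.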



Feng Zhang, Quan Zheng, Ying Zou, and Ahmed E. Hassan, Cross-project Defect Prediction Using A Connectivity-based Unsupervised Classifier, Proceedings of the 38th International Conference on Software Engineering (ICSE'16), 2016, Austin, TX, United States. (Acceptance Rate = 19%)


 title={Cross-project Defect Prediction Using A Connectivity-based Unsupervised Classifier},
 author={Zhang, Feng and Zheng, Quan and Zou, Ying and Hassan, Ahmed E.},
 booktitle={Proceedings of the 38th International Conference on Software Engineering},
 series={ICSE '16},
 keywords={defect prediction, cross-project, within-project, spectral clustering, graph mining},