Towards Building a Universal Defect Prediction Model


To predict files with defects, a suitable prediction model must be built for a software project from either the project itself (within-project) or other projects (cross-project). A universal defect prediction model built from the entire set of diverse projects would remove the need to build a model for each individual project. A universal model could also be interpreted as a basic relationship between software metrics and defects. However, variations in the distribution of predictors pose a formidable obstacle to building a universal model. Such variations exist among projects with different context factors (e.g., size and programming language). To overcome this challenge, we propose context-aware rank transformations for predictors. We cluster projects based on the similarity of the distributions of 26 predictors, and derive the rank transformations using the quantiles of predictors within each cluster. We then fit the universal model on the transformed data of 1,398 open source projects hosted on SourceForge and GoogleCode. Adding context factors to the universal model improves its predictive power. The universal model obtains prediction performance comparable to that of within-project models and yields similar results when applied to five external projects (one Apache and four Eclipse projects). These results suggest that a universal defect prediction model may be an achievable goal.
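The abstract describes the context-aware rank transformation only at a high level. The following is a minimal sketch of the idea, assuming that each predictor is mapped to ten levels using the 10th, 20th, ..., 90th percentiles of that predictor's values pooled across all projects in a cluster, and that the cluster each project belongs to is already known (the clustering step itself is not shown). The function names `quantile_cuts` and `rank_transform`, the cluster labels, and the example data are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def quantile_cuts(pooled_values, n_levels=10):
    """Return the n_levels - 1 cut points Q_1..Q_{n-1} of the pooled values.

    With the default n_levels=10 (an assumption), these are the 10th, 20th, ...,
    90th percentiles of the predictor within one cluster.
    """
    probs = np.arange(1, n_levels) / n_levels
    return np.quantile(np.asarray(pooled_values, dtype=float), probs)


def rank_transform(values, cuts):
    """Map raw metric values to integer ranks 1..len(cuts) + 1.

    A value v gets rank 1 if v <= Q_1, rank k if Q_{k-1} < v <= Q_k,
    and the top rank if v > Q_{n-1}.
    """
    return np.searchsorted(cuts, np.asarray(values, dtype=float), side="left") + 1


# Hypothetical pooled values of one predictor (say, lines of code per file)
# for two hypothetical context clusters.
cuts_by_cluster = {
    "small": quantile_cuts(np.arange(1, 101)),   # values 1..100
    "large": quantile_cuts(np.arange(1, 1001)),  # values 1..1000
}

# The same raw values map to different ranks depending on the project's cluster.
print(rank_transform([1, 11, 55, 5000], cuts_by_cluster["small"]))  # [ 1  2  6 10]
print(rank_transform([1, 11, 55, 5000], cuts_by_cluster["large"]))  # [ 1  1  1 10]
```

Because every predictor ends up on the same 1 to 10 scale within its cluster, projects whose raw metric distributions differ (for example, a file of 55 lines is mid-sized in one cluster but small in another) become comparable, which is what allows a single model to be fit on the pooled, transformed data.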



Feng Zhang, Audris Mockus, Iman Keivanloo, and Ying Zou, Towards Building a Universal Defect Prediction Model, Proceedings of the 11th Working Conference on Mining Software Repositories (MSR'14), May 31-Jun 1, 2014, Hyderabad, India. (Acceptance Rate = 34%) (Distinguished Paper Award)


 title = {Towards building a universal defect prediction model},
 author = {Zhang, Feng and Mockus, Audris and Keivanloo, Iman and Zou, Ying},
 booktitle = {Proceedings of the 11th Working Conference on Mining Software Repositories},
 series = {MSR '14},
 year = {2014},
 isbn = {978-1-4503-2863-0},
 location = {Hyderabad, India},
 pages = {41--50},
 numpages = {10},
 publisher = {IEEE Press},
 address = {Piscataway, NJ, USA},