posts tagged * publication

The Use of Summation to Aggregate Software Metrics Hinders the Performance of Defect Prediction Models

2016-7-24

Defect prediction models help software organizations anticipate where defects will appear in the future. When training a defect prediction model, historical defect data is often mined from a Version Control System (VCS, e.g., Subversion), which records software changes at the file level. Software metrics, on the other hand, are often calculated at the class or method level (e.g., McCabe’s Cyclomatic Complexity). To address this disagreement in granularity, the class- and method-level software metrics are aggregated to the file level, often using summation (i.e., the McCabe value of a file is the sum of the McCabe values of all methods within the file). A recent study shows that summation significantly inflates the correlation between source lines of code (SLOC) and cyclomatic complexity (CC) in Java projects. While there are many other aggregation schemes (e.g., central tendency, dispersion), they have remained unexplored in the scope of defect prediction. In this study, we set out to investigate how different aggregation schemes impact defect prediction models. Through an analysis of 11 aggregation schemes using data collected from 255 open source projects, we find that: (1) aggregation schemes can significantly alter correlations among metrics, as well as the correlations between metrics and the defect count; (2) when constructing models to predict defect proneness, applying only the summation scheme (i.e., the most commonly used aggregation scheme in the literature) achieves the best performance (the best among the 12 studied configurations) in only 11% of the studied projects, while applying all of the studied aggregation schemes achieves the best performance in 40% of the studied projects; (3) when constructing models to predict defect rank or count, applying only summation and applying all of the studied aggregation schemes achieve similar performance, with both achieving the closest to the best performance more often than the other studied aggregation schemes; and (4) when constructing models for effort-aware defect prediction, the mean or median aggregation schemes yield performance values that are significantly closer to the best performance than any of the other studied aggregation schemes. Broadly speaking, the performance of defect prediction models is often underestimated due to our community’s tendency to use only the summation aggregation scheme. Given the potential benefit of applying additional aggregation schemes, we advise that future defect prediction studies explore a variety of aggregation schemes.
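
As a concrete illustration of the aggregation step, here is a minimal sketch in Python, assuming a pandas DataFrame of method-level metrics with illustrative column names ("file", "mccabe", "sloc"); it rolls one table up to the file level under several of the schemes the study compares:

```python
# A minimal sketch (not the study's implementation): aggregate method-level
# metrics to the file level under several schemes at once.
import pandas as pd

methods = pd.DataFrame({
    "file":   ["A.java", "A.java", "A.java", "B.java", "B.java"],
    "mccabe": [1, 4, 9, 2, 2],       # cyclomatic complexity per method
    "sloc":   [10, 35, 80, 15, 20],  # source lines of code per method
})

# Summation (the scheme most commonly used in the literature) alongside
# central-tendency (mean, median) and dispersion (std, var, skew) schemes.
schemes = ["sum", "mean", "median", "min", "max", "std", "var", "skew"]
file_level = methods.groupby("file").agg({"mccabe": schemes, "sloc": schemes})
file_level.columns = ["_".join(c) for c in file_level.columns]
print(file_level)
```

Summation is only one column of this picture; the dispersion schemes capture how unevenly complexity is spread across a file’s methods, which summation hides.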

read more

Cross-project Defect Prediction Using A Connectivity-based Unsupervised Classifier

2016-5-17

Defect prediction on projects with limited historical data has attracted great interest from both researchers and practitioners. Cross-project defect prediction, which reuses classifiers trained on other projects, has been the main avenue of progress. However, existing approaches require some degree of homogeneity (e.g., a similar distribution of metric values) between the training projects and the target project. Satisfying the homogeneity requirement often requires significant effort, and doing so is currently a very active area of research. An unsupervised classifier does not require any training data; therefore, the heterogeneity challenge is no longer an issue. In this paper, we examine two types of unsupervised classifiers: a) distance-based classifiers (e.g., k-means); and b) connectivity-based classifiers. While distance-based unsupervised classifiers have been previously used in the defect prediction literature with disappointing performance, connectivity-based classifiers have never before been explored in our community. We compare the performance of unsupervised classifiers versus supervised classifiers using data from 26 projects from three publicly available datasets (i.e., AEEEM, NASA, and PROMISE). In the cross-project setting, our proposed connectivity-based classifier (via spectral clustering) ranks as one of the top classifiers among five widely-used supervised classifiers (i.e., random forest, naive Bayes, logistic regression, decision tree, and logistic model tree) and five unsupervised classifiers (i.e., k-means, partition around medoids, fuzzy C-means, neural-gas, and spectral clustering). In the within-project setting (i.e., models are built and applied on the same project), our spectral classifier ranks in the second tier, while only random forest ranks in the first tier. Hence, connectivity-based unsupervised classifiers offer a viable solution for both cross-project and within-project defect prediction.
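
The connectivity-based classifier can be pictured in a few lines. The following is an illustrative sketch, not the paper’s exact implementation: it clusters files into two groups with spectral clustering and, as a labeling heuristic that we assume here, flags the cluster with the larger average normalized metric values as defect-prone:

```python
# A minimal sketch of a connectivity-based unsupervised defect classifier
# via spectral clustering; the data and the labeling rule are assumptions.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.preprocessing import scale

X = np.random.rand(200, 20)  # placeholder: one row per file, one column per metric
Xz = scale(X)                # z-score each metric

# Two clusters: candidate defect-prone vs. clean files. No training labels needed.
labels = SpectralClustering(n_clusters=2, affinity="rbf",
                            random_state=0).fit_predict(Xz)

# Assumed heuristic: files with larger metric values tend to be riskier, so
# flag the cluster with the larger mean normalized metric value as defective.
defective_cluster = np.argmax([Xz[labels == c].mean() for c in (0, 1)])
predicted_defective = labels == defective_cluster
```

Because no training data is involved, the same procedure applies unchanged in both the cross-project and within-project settings.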

read more

Towards Building a Universal Defect Prediction Model with Rank Transformed Predictors

2015-7-1

Software defects can lead to undesired results. Correcting defects costs 50% to 75% of the total software development budget. To predict defective files, a prediction model must be built with predictors (e.g., software metrics) obtained from either a project itself (within-project) or from other projects (cross-project). A universal defect prediction model that is built from a large set of diverse projects would relieve the need to build and tailor prediction models for an individual project. A formidable obstacle to building a universal model is the variation in the distribution of predictors among projects of diverse contexts (e.g., size and programming language). Hence, we propose to cluster projects based on the similarity of the distribution of predictors, and to derive rank transformations using the quantiles of predictors for a cluster. We fit the universal model on the transformed data of 1,385 open source projects hosted on SourceForge and GoogleCode. The universal model obtains prediction performance comparable to the within-project models, yields similar results when applied on five external projects (one Apache and four Eclipse projects), and performs similarly among projects with different context factors. Finally, we investigate which predictors should be included in the universal model. We expect that this work could form a basis for future work on building a universal model and could lead to software support tools that incorporate it into a regular development workflow.
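
The rank transformation itself is simple to picture. Below is a minimal sketch under our own assumptions (decile boundaries, a synthetic predictor pool): each raw predictor value is replaced by the rank of the quantile interval it falls into within its cluster of similar projects:

```python
# A minimal sketch of a quantile-based rank transformation (illustrative,
# not the paper's exact implementation).
import numpy as np

def rank_transform(values, cluster_values, n_levels=10):
    """Map each raw value to a rank in 1..n_levels using cluster quantiles."""
    # Interior quantile boundaries estimated from all projects in the cluster.
    cuts = np.quantile(cluster_values, np.linspace(0, 1, n_levels + 1)[1:-1])
    return np.searchsorted(cuts, values, side="right") + 1

# Placeholder pool of one predictor (e.g., file size) across a cluster.
cluster_sloc = np.random.lognormal(4, 1, size=5000)
print(rank_transform([20, 55, 400], cluster_sloc))  # ranks in 1..10
```

After this transformation, a "rank 9" file means the same thing in every project of the cluster, which is what lets one model serve them all.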

read more

Threshold-free Code Clone Detection for a Large-scale Heterogeneous Java Repository

2015-3-1

Code clones are unavoidable entities in software ecosystems. A variety of clone-detection algorithms are available for finding code clones. For Type-3 clone detection at method granularity (i.e., similar methods with changes in statements), the dissimilarity threshold is one of the possible configuration parameters. Existing approaches use a single threshold to detect Type-3 clones across a repository. However, our study shows that detecting Type-3 clones at method granularity in a large-scale heterogeneous repository often requires multiple thresholds. We find that the performance of clone detection improves when different thresholds are selected for different groups of clones in a heterogeneous repository (i.e., different applications). In this paper, we propose a threshold-free approach to detect Type-3 clones at method granularity across a large number of applications. Our approach uses an unsupervised learning algorithm, i.e., k-means, to separate true clones from false ones. We use a clone benchmark with 330,840 tagged clones from 24,824 open source Java projects for our study. We observe that our approach significantly improves performance, by 12% in terms of F-measure. Furthermore, our threshold-free approach eliminates practitioners’ concerns about possible misconfiguration of Type-3 clone detection tools.
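
The threshold-free idea can be sketched as follows, with synthetic dissimilarity scores standing in for real clone-pair data: rather than fixing a cut-off, k-means with k=2 partitions the candidate pairs, and the low-dissimilarity cluster is kept as true clones:

```python
# A minimal sketch (synthetic scores) of threshold-free Type-3 clone filtering:
# cluster candidate pairs by dissimilarity instead of hand-picking a cut-off.
import numpy as np
from sklearn.cluster import KMeans

# Placeholder: one dissimilarity score per candidate method pair.
dissim = np.concatenate([np.random.beta(2, 8, 300),    # mostly-similar pairs
                         np.random.beta(8, 2, 300)])   # mostly-different pairs

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(dissim.reshape(-1, 1))
true_clone_cluster = np.argmin(km.cluster_centers_.ravel())
is_true_clone = km.labels_ == true_clone_cluster
print(f"{is_true_clone.sum()} of {len(dissim)} candidates kept as Type-3 clones")
```

The boundary between the two clusters adapts to each group of applications, which is exactly what a single repository-wide threshold cannot do.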

read more

An Empirical Study of the Effect of File Editing Patterns on Software Quality (extended)

2014-7-18

Developers might follow different file editing patterns when handling change requests. Existing research has warned the community about the potential negative impacts of some file editing patterns on software quality. However, very few studies have provided quantitative evidence to support these claims. In this paper, we propose four metrics to identify four file editing patterns: the concurrent editing pattern, the parallel editing pattern, the extended editing pattern, and the interrupted editing pattern. Our empirical study on three open source projects shows that 90% (i.e., 1,935 out of 2,140) of files exhibit at least one file editing pattern. More specifically: (1) files that are edited concurrently by many developers are 1.8 times more likely to experience future bugs than files that are not concurrently edited; (2) files edited in parallel with too many other files by the same developer are 2.9 times more likely to exhibit future bugs than files individually edited; (3) files edited over an extended period of time are 1.9 times more likely to experience future bugs than other files; and (4) files edited with long interruptions have 2.0 times more future bugs than other files. We also observe that the likelihood of future bugs in files experiencing all four file editing patterns is 3.9 times higher than in files that are never involved in any of the four patterns. We further investigate factors impacting the occurrence of these file editing patterns along three dimensions: the ownership of files, the type of change requests in which the files were involved, and the initial code quality of the files. Results show that files with a major owner are only 0.6 times as likely to exhibit the concurrent editing pattern as files without a major owner. Files with bad code quality (e.g., high McCabe’s complexity, high coupling between objects, and lack of cohesion) are more likely to experience the four editing patterns. By ensuring clear ownership and improving code quality, the negative impact of the four patterns could be reduced. Overall, our findings could be used by software development teams to warn developers about risky file editing patterns.
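
To make the four patterns concrete, here is a minimal sketch with toy data; the column names and all thresholds (gap, span, fan-out) are our own illustrative assumptions, not the paper’s calibrated metrics:

```python
# A minimal sketch: flag the four file editing patterns from an edit log.
# Thresholds and column names are illustrative assumptions.
import pandas as pd

edits = pd.DataFrame({  # one row per edit of a file
    "file":    ["A", "A", "A", "B", "B"],
    "dev":     ["ann", "bob", "ann", "ann", "ann"],
    "request": [101, 102, 103, 104, 104],
    "time":    pd.to_datetime(["2012-01-01", "2012-01-02",
                               "2012-06-01", "2012-03-01", "2012-03-02"]),
    "files_in_request": [2, 2, 2, 30, 30],  # fan-out of the change request
})

def pattern_flags(g, gap_days=14, span_days=90, fanout=10):
    gaps = g.sort_values("time")["time"].diff().dt.days.fillna(0)
    return pd.Series({
        "concurrent":  g["dev"].nunique() > 1,                  # several developers
        "parallel":    (g["files_in_request"] > fanout).any(),  # edited alongside many files
        "extended":    (g["time"].max() - g["time"].min()).days > span_days,
        "interrupted": (gaps > gap_days).any(),                 # long idle gaps
    })

print(edits.groupby("file").apply(pattern_flags))
```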

read more

Towards Building a Universal Defect Prediction Model

2014-6-1

To predict files with defects, a suitable prediction model must be built for a software project from either itself (within-project) or other projects (cross-project). A universal defect prediction model that is built from the entire set of diverse projects would relieve the need for building models for an individual project. A universal model could also be interpreted as a basic relationship between software metrics and defects. However, the variations in the distribution of predictors pose a formidable obstacle to building a universal model. Such variations exist among projects with different context factors (e.g., size and programming language). To overcome this challenge, we propose context-aware rank transformations for predictors. We cluster projects based on the similarity of the distribution of 26 predictors, and derive the rank transformations using quantiles of predictors for a cluster. We then fit the universal model on the transformed data of 1,398 open source projects hosted on SourceForge and GoogleCode. Adding context factors to the universal model improves its predictive power. The universal model obtains prediction performance comparable to the within-project models and yields similar results when applied on five external projects (one Apache and four Eclipse projects). These results suggest that a universal defect prediction model may be an achievable goal.
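
The clustering step can be pictured with a small sketch; the summary statistics and cluster count below are our own illustrative choices, not the paper’s exact procedure:

```python
# A minimal sketch: group projects by the similarity of a predictor's
# distribution by summarizing each project with its deciles, then clustering.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Placeholder: per-project samples of one predictor (e.g., file size).
projects = [rng.lognormal(mean=rng.uniform(2, 6), sigma=1, size=500)
            for _ in range(100)]

# Summarize each project's distribution by nine deciles (log scale).
deciles = np.log([np.quantile(p, np.linspace(0.1, 0.9, 9)) for p in projects])
cluster_of_project = KMeans(n_clusters=5, n_init=10,
                            random_state=0).fit_predict(deciles)
```

Rank transformations derived within each such cluster then put predictors from very different projects onto a common scale.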

read more

How does Context affect the Distribution of Software Maintainability Metrics?

2013-9-25

Software metrics have many uses, e.g., defect prediction, effort estimation, and benchmarking an organization against peers and industry standards. In all these cases, metrics may depend on the context, such as the programming language. Here we aim to investigate whether the distributions of commonly used metrics do, in fact, vary with six context factors: application domain, programming language, age, lifespan, the number of changes, and the number of downloads. For this preliminary study we select 320 nontrivial software systems from SourceForge. These software systems are randomly sampled from nine popular application domains of SourceForge. We calculate 39 metrics commonly used to assess software maintainability for each software system, and use the Kruskal-Wallis test and the Mann-Whitney U test to determine whether there are significant differences among the distributions with respect to each of the six context factors. We use Cliff’s delta to measure the magnitude of the differences and find that all six context factors affect the distribution of 20 metrics, and that the programming language factor alone affects 35 metrics. We also briefly discuss how each context factor may affect the distribution of metric values. We expect our results to help software benchmarking and other software engineering methods that rely on these commonly used metrics to be tailored to a particular context.
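
The statistical machinery here is standard; the following sketch on synthetic per-language metric samples shows the Kruskal-Wallis test across groups, a pairwise Mann-Whitney U follow-up, and a hand-rolled Cliff’s delta (SciPy has no built-in for it):

```python
# A minimal sketch of the tests described, on synthetic metric samples.
import numpy as np
from scipy.stats import kruskal, mannwhitneyu

rng = np.random.default_rng(1)
java, c, python = (rng.lognormal(m, 1, 200) for m in (3.0, 3.5, 2.8))

print(kruskal(java, c, python))  # do distributions differ across the groups?
print(mannwhitneyu(java, c))     # pairwise follow-up for one pair of levels

def cliffs_delta(a, b):
    """Fraction of (a, b) pairs with a > b minus fraction with a < b."""
    a, b = np.asarray(a), np.asarray(b)
    gt = (a[:, None] > b[None, :]).sum()
    lt = (a[:, None] < b[None, :]).sum()
    return (gt - lt) / (len(a) * len(b))

print(cliffs_delta(java, c))     # effect size: magnitude of the difference
```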

read more

An Empirical Study of the Effect of File Editing Patterns on Software Quality

2012-10-18

While some developers like to work on multiple code change requests, others might prefer to handle one change request at a time. This juggling of change requests and the large number of developers working in parallel often lead to files being edited as part of different change requests by one or several developers. Existing research has warned the community about the potential negative impacts of some file editing patterns on software quality. For example, when several developers concurrently edit a file as part of different change requests, they are likely to introduce bugs due to limited awareness of the other changes. However, very few studies have provided quantitative evidence to support these claims. In this paper, we identify four file editing patterns. We perform an empirical study on three open source software systems to investigate the individual and the combined impact of the four patterns on software quality. We find that: (1) files that are edited concurrently by many developers have on average 2.46 times more future bugs than files that are not concurrently edited; (2) files edited in parallel with other files by the same developer have on average 1.67 times more future bugs than files individually edited; (3) files edited over an extended period of time (i.e., above the third quartile) have 2.28 times more future bugs than other files; and (4) files edited with long interruptions (i.e., above the third quartile) have 2.1 times more future bugs than other files. When more than one editing pattern is followed by one or many developers during the editing of a file, we observe that the number of future bugs in the file can be as high as 1.6 times the average number of future bugs in files edited following a single editing pattern. These results can be used by software development teams to warn developers about risky file editing patterns.

read more

An Empirical Study on Factors Impacting Bug Fixing Time

2012-10-17

Fixing bugs is an important activity of the software development process. A typical bug-fixing process consists of the following steps: 1) a user files a bug report; 2) the bug is assigned to a developer; 3) the developer fixes the bug; 4) the changed code is reviewed and verified; and 5) the bug is resolved. Many studies have investigated the process of bug fixing. However, to the best of our knowledge, none has explicitly analyzed the interval between bug assignment and the time when bug fixing starts. After a bug is assigned, some developers will immediately start fixing it, while others will only start after a long delay. Little is known about developers’ delays in starting to fix bugs. This paper explores such delays through an empirical study on three open source software systems. We examine factors affecting bug-fixing time along three dimensions: the bug report, the source code involved in the fix, and the code changes that are required to fix the bug. We further compare the different factors using descriptive logistic regression models. Our results can help development teams better understand the factors behind delays, and then improve their bug-fixing process.
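
The descriptive modeling step can be sketched as follows; the data, feature names, and the "late start" outcome are synthetic placeholders, not the paper’s dataset:

```python
# A minimal sketch of a descriptive logistic regression on bug-fix delay
# factors, with synthetic data standing in for mined bug reports.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 500
X = np.column_stack([
    rng.integers(0, 5, n),    # severity of the bug report
    rng.integers(1, 50, n),   # number of comments on the report
    rng.integers(1, 20, n),   # number of files involved in the fix
])
# Synthetic outcome: 1 if fixing started "late" (placeholder relationship).
p = 1 / (1 + np.exp(-(0.3 * X[:, 1] / 10 - 0.5)))
late = (rng.random(n) < p).astype(int)

model = sm.Logit(late, sm.add_constant(X.astype(float))).fit(disp=0)
print(model.summary())  # each coefficient describes one factor's association
```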

read more