A Systematic Literature Review of Software Defect Prediction: Research Trends, Datasets, Methods and Frameworks
References
Arisholm, E., Briand, L. C., & Fuglerud, M. (2007). Data Mining Techniques for Building Fault-proneness Models in Telecom Java Software. Proceedings of the 18th IEEE International Symposium on Software Reliability, 215–224. http://doi.org/10.1109/ISSRE.2007.22
Arisholm, E., Briand, L. C., & Johannessen, E. B. (2010). A systematic and comprehensive investigation of methods to build and evaluate fault prediction models. Journal of Systems and Software, 83(1), 2–17. http://doi.org/10.1016/j.jss.2009.06.055
Azar, D., & Vybihal, J. (2011). An ant colony optimization algorithm to improve software quality prediction models: Case of class stability. Information and Software Technology, 53(4), 388–393. http://doi.org/10.1016/j.infsof.2010.11.013
Benaddy, M., & Wakrim, M. (2012). Simulated Annealing Neural Network for Software Failure Prediction. International Journal of Software Engineering and Its Applications, 6(4).
Bibi, S., Tsoumakas, G., Stamelos, I., & Vlahavas, I. (2008). Regression via Classification applied on software defect estimation. Expert Systems with Applications, 34(3), 2091–2101. http://doi.org/10.1016/j.eswa.2007.02.012
Bishnu, P. S., & Bhattacherjee, V. (2012). Software Fault Prediction Using Quad Tree-Based K-Means Clustering Algorithm. IEEE Transactions on Knowledge and Data Engineering, 24(6), 1146–1150. http://doi.org/10.1109/TKDE.2011.163
Boehm, B., & Basili, V. R. (2001). Top 10 list [software development]. Computer, 34(1), 135–137.
Buzan, T., & Griffiths, C. (2013). Mind Maps for Business: Using the ultimate thinking tool to revolutionise how you work (2nd Edition). FT Press.
Cano, J. R., Herrera, F., & Lozano, M. (2003). Using evolutionary algorithms as instance selection for data reduction in KDD: an experimental study. IEEE Transactions on Evolutionary Computation, 7(6), 561–575.
Cao, H., Qin, Z., & Feng, T. (2012). A Novel PCA-BP Fuzzy Neural Network Model for Software Defect Prediction. Advanced Science Letters, 9(1), 423–428.
Catal, C. (2011). Software fault prediction: A literature review and current trends. Expert Systems with Applications, 38(4), 4626–4636.
Catal, C., Alan, O., & Balkan, K. (2011). Class noise detection based on software metrics and ROC curves. Information Sciences, 181(21), 4867–4877.
Catal, C., & Diri, B. (2009a). A systematic review of software fault prediction studies. Expert Systems with Applications, 36(4), 7346–7354.
Catal, C., & Diri, B. (2009b). Investigating the effect of dataset size, metrics sets, and feature selection techniques on software fault prediction problem. Information Sciences, 179(8), 1040–1058. http://doi.org/10.1016/j.ins.2008.12.001
Catal, C., Sevim, U., & Diri, B. (2011). Practical development of an Eclipse-based software fault prediction tool using Naive Bayes algorithm. Expert Systems with Applications, 38(3), 2347–2353. http://doi.org/10.1016/j.eswa.2010.08.022
Challagulla, V., Bastani, F., & Yen, I. (2006). A Unified Framework for Defect Data Analysis Using the MBR Technique. 2006 18th IEEE International Conference on Tools with Artificial Intelligence (ICTAI’06), 39–46. http://doi.org/10.1109/ICTAI.2006.23
Challagulla, V. U. B., Bastani, F. B., & Paul, R. A. (2004). Empirical Assessment of Machine Learning based Software Defect Prediction Techniques. In 10th IEEE International Workshop on Object-Oriented Real-Time Dependable Systems (pp. 263–270). IEEE. http://doi.org/10.1109/WORDS.2005.32
Chang, C.-P., Chu, C.-P., & Yeh, Y.-F. (2009). Integrating in-process software defect prediction with association mining to discover defect pattern. Information and Software Technology, 51(2), 375–384. http://doi.org/10.1016/j.infsof.2008.04.008
Chang, R. H., Mu, X. D., & Zhang, L. (2011). Software Defect Prediction Using Non-Negative Matrix Factorization. Journal of Software, 6(11), 2114–2120. http://doi.org/10.4304/jsw.6.11.2114-2120
Cukic, B., & Singh, H. (2004). Robust Prediction of Fault-Proneness by Random Forests. 15th International Symposium on Software Reliability Engineering, 417–428. http://doi.org/10.1109/ISSRE.2004.35
Dejaeger, K., Verbraken, T., & Baesens, B. (2013). Toward Comprehensible Software Fault Prediction Models Using Bayesian Network Classifiers. IEEE Transactions on Software Engineering, 39(2), 237–257. http://doi.org/10.1109/TSE.2012.20
Denaro, G. (2000). Estimating software fault-proneness for tuning testing activities. In Proceedings of the 22nd International Conference on Software engineering - ICSE ’00 (pp. 704–706). New York, New York, USA: ACM Press.
El Emam, K., & Laitenberger, O. (2001). Evaluating capture-recapture models with two inspectors. IEEE Transactions on Software Engineering, 27(9), 851–864. http://doi.org/10.1109/32.950319
El Emam, K., Melo, W., & Machado, J. C. (2001). The prediction of faulty classes using object-oriented design metrics. Journal of Systems and Software, 56(1), 63–75. http://doi.org/10.1016/S0164-1212(00)00086-8
Elish, K. O., & Elish, M. O. (2008). Predicting defect-prone software modules using support vector machines. Journal of Systems and Software, 81(5), 649–660. http://doi.org/10.1016/j.jss.2007.07.040
Fenton, N. E., & Neil, M. (1999). A critique of software defect prediction models. IEEE Transactions on Software Engineering, 25(5), 675–689. http://doi.org/10.1109/32.815326
Fenton, N., Krause, P., & Neil, M. (2001). A Probabilistic Model for Software Defect Prediction. IEEE Transactions on Software Engineering, 44(0), 1–35.
Fenton, N., Neil, M., Marsh, W., Hearty, P., Marquez, D., Krause, P., & Mishra, R. (2007). Predicting software defects in varying development lifecycles using Bayesian nets. Information and Software Technology, 49(1), 32–43. http://doi.org/10.1016/j.infsof.2006.09.001
Gayatri, N., Reddy, S., & Nickolas, A. V. (2010). Feature Selection Using Decision Tree Induction in Class level Metrics Dataset for Software Defect Predictions. Lecture Notes in Engineering and Computer Science, 2186(1), 124–129.
Gondra, I. (2008). Applying machine learning to software fault-proneness prediction. Journal of Systems and Software, 81(2), 186–195. http://doi.org/10.1016/j.jss.2007.05.035
Gray, D., Bowes, D., Davey, N., & Christianson, B. (2011). The misuse of the NASA Metrics Data Program data sets for automated software defect prediction. 15th Annual Conference on Evaluation & Assessment in Software Engineering (EASE 2011), 96–103.
Gray, D., Bowes, D., Davey, N., Sun, Y., & Christianson, B. (2012). Reflections on the NASA MDP data sets. IET Software, 6(6), 549.
Güneş Koru, A., & Liu, H. (2007). Identifying and characterizing change-prone classes in two large-scale open-source products. Journal of Systems and Software, 80(1), 63–73. http://doi.org/10.1016/j.jss.2006.05.017
Güneş Koru, A., & Tian, J. (2003). An empirical comparison and characterization of high defect and high complexity modules. Journal of Systems and Software, 67(3), 153–163. http://doi.org/10.1016/S0164-1212(02)00126-7
Guo, L., Cukic, B., & Singh, H. (2003). Predicting fault prone modules by the Dempster-Shafer belief networks. In Proceedings of the 18th IEEE International Conference on Automated Software Engineering, 2003 (pp. 249–252). IEEE Comput. Soc. http://doi.org/10.1109/ASE.2003.1240314
Guo, X. C., Yang, J. H., Wu, C. G., Wang, C. Y., & Liang, Y. C. (2008). A novel LS-SVMs hyper-parameter selection based on particle swarm optimization. Neurocomputing, 71(16-18), 3211–3215. http://doi.org/10.1016/j.neucom.2008.04.027
Hall, M. A., & Holmes, G. (2003). Benchmarking attribute selection techniques for discrete class data mining. IEEE Transactions on Knowledge and Data Engineering, 15(6), 1437–1447.
Hall, T., Beecham, S., Bowes, D., Gray, D., & Counsell, S. (2012). A Systematic Literature Review on Fault Prediction Performance in Software Engineering. IEEE Transactions on Software Engineering, 38(6), 1276–1304.
IEEE. (1990). IEEE Standard Glossary of Software Engineering Terminology (IEEE Std 610.12-1990). Institute of Electrical and Electronics Engineers.
Pai, G. J., & Dugan, J. B. (2007). Empirical Analysis of Software Fault Content and Fault Proneness Using Bayesian Methods. IEEE Transactions on Software Engineering, 33(10), 675–686. http://doi.org/10.1109/TSE.2007.70722
Jiang, Y., Li, M., & Zhou, Z.-H. (2011). Software Defect Detection with ROCUS. Journal of Computer Science and Technology, 26(2), 328–342. http://doi.org/10.1007/s11390-011-1135-6
Jin, C., Jin, S.-W., & Ye, J.-M. (2012). Artificial neural network-based metric selection for software fault-prone prediction model. IET Software, 6(6), 479. http://doi.org/10.1049/iet-sen.2011.0138
Jones, C., & Bonsignour, O. (2012). The Economics of Software Quality. Pearson Education, Inc.
Jorgensen, M., & Shepperd, M. (2007). A Systematic Review of Software Development Cost Estimation Studies. IEEE Transactions on Software Engineering, 33(1).
Kanmani, S., Uthariaraj, V. R., Sankaranarayanan, V., & Thambidurai, P. (2004). Object oriented software quality prediction using general regression neural networks. ACM SIGSOFT Software Engineering Notes, 29(5), 1. http://doi.org/10.1145/1022494.1022515
Karthik, R., & Manikandan, N. (2010). Defect association and complexity prediction by mining association and clustering rules. 2010 2nd International Conference on Computer Engineering and Technology, V7–569–V7–573. http://doi.org/10.1109/ICCET.2010.5485608
Kenny, G. Q. (1993). Estimating defects in commercial software during operational use. IEEE Transactions on Reliability, 42(1), 107–115.
Khoshgoftaar, T. M., & Allen, E. B. (2000). Prediction of software faults using fuzzy nonlinear regression modeling. Proceedings. Fifth IEEE International Symposium on High Assurance Systems Engineering (HASE 2000), 281–290. http://doi.org/10.1109/HASE.2000.895473
Khoshgoftaar, T. M., Allen, E. B., Hudepohl, J. P., & Aud, S. J. (1997). Application of neural networks to software quality modeling of a very large telecommunications system. IEEE Transactions on Neural Networks, 8(4), 902–909. http://doi.org/10.1109/72.595888
Khoshgoftaar, T. M., Allen, E. B., Jones, W. D., & Hudepohl, J. P. (2000). Classification-tree models of software-quality over multiple releases. IEEE Transactions on Reliability, 49(1), 4–11. http://doi.org/10.1109/24.855532
Khoshgoftaar, T. M., & Gao, K. (2009). Feature Selection with Imbalanced Data for Software Defect Prediction. 2009 International Conference on Machine Learning and Applications, 235–240. http://doi.org/10.1109/ICMLA.2009.18
Khoshgoftaar, T. M., & Seliya, N. (2002). Tree-based software quality estimation models for fault prediction. Proceedings Eighth IEEE Symposium on Software Metrics, 203–214. http://doi.org/10.1109/METRIC.2002.1011339
Khoshgoftaar, T. M., Seliya, N., & Gao, K. (2005). Assessment of a New Three-Group Software Quality Classification Technique: An Empirical Case Study. Empirical Software Engineering, 10(2), 183–218.
Khoshgoftaar, T. M., Seliya, N., & Sundaresh, N. (2006). An empirical study of predicting software faults with case-based reasoning. Software Quality Journal, 14(2), 85–111. http://doi.org/10.1007/s11219-006-7597-z
Khoshgoftaar, T. M., & Van Hulse, J. (2009). Empirical Case Studies in Attribute Noise Detection. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 39(4), 379–388.
Khoshgoftaar, T. M., Van Hulse, J., & Napolitano, A. (2011). Comparing Boosting and Bagging Techniques With Noisy and Imbalanced Data. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 41(3), 552–568.
Kitchenham, B., & Charters, S. (2007). Guidelines for performing Systematic Literature Reviews in Software Engineering. EBSE Technical Report Version 2.3, EBSE-2007-.
Koru, A. G., & Liu, H. (2005). An investigation of the effect of module size on defect prediction using static measures. In Proceedings of the 2005 workshop on Predictor models in software engineering - PROMISE ’05 (Vol. 30, pp. 1–5). New York, New York, USA: ACM Press. http://doi.org/10.1145/1082983.1083172
Lessmann, S., Baesens, B., Mues, C., & Pietsch, S. (2008). Benchmarking Classification Models for Software Defect Prediction: A Proposed Framework and Novel Findings. IEEE Transactions on Software Engineering, 34(4), 485–496.
Li, Z., & Reformat, M. (2007). A practical method for the software fault-prediction. In 2007 IEEE International Conference on Information Reuse and Integration (pp. 659–666). IEEE. http://doi.org/10.1109/IRI.2007.4296695
Lin, S.-W., Ying, K.-C., Chen, S.-C., & Lee, Z.-J. (2008). Particle swarm optimization for parameter determination and feature selection of support vector machines. Expert Systems with Applications, 35(4), 1817–1824. http://doi.org/10.1016/j.eswa.2007.08.088
Liu, Y., Khoshgoftaar, T. M., & Seliya, N. (2010). Evolutionary Optimization of Software Quality Modeling with Multiple Repositories. IEEE Transactions on Software Engineering, 36(6), 852–864.
Lyu, M. R. (2000). Software quality prediction using mixture models with EM algorithm. In Proceedings First Asia-Pacific Conference on Quality Software (pp. 69–78). IEEE Comput. Soc. http://doi.org/10.1109/APAQ.2000.883780
Ma, Y., Guo, L., & Cukic, B. (2007). A Statistical Framework for the Prediction of Fault-Proneness. In Advances in Machine Learning Applications in Software Engineering (pp. 1–26).
Ma, Y., Luo, G., Zeng, X., & Chen, A. (2012). Transfer learning for cross-company software defect prediction. Information and Software Technology, 54(3), 248–256. http://doi.org/10.1016/j.infsof.2011.09.007
Maimon, O., & Rokach, L. (2010). Data Mining and Knowledge Discovery Handbook (2nd Edition). Springer.
McDonald, M., Musson, R., & Smith, R. (2007). The practical guide to defect prevention. Control, 260–272.
Mende, T., & Koschke, R. (2009). Revisiting the evaluation of defect prediction models. Proceedings of the 5th International Conference on Predictor Models in Software Engineering - PROMISE ’09, 1. http://doi.org/10.1145/1540438.1540448
Menzies, T., DiStefano, J., Orrego, A. S., & Chapman, R. (2004). Assessing predictors of software defects. In Proceedings of the Workshop on Predictive Software Models.
Menzies, T., Greenwald, J., & Frank, A. (2007). Data Mining Static Code Attributes to Learn Defect Predictors. IEEE Transactions on Software Engineering, 33(1), 2–13.
Menzies, T., Milton, Z., Turhan, B., Cukic, B., Jiang, Y., & Bener, A. (2010). Defect prediction from static code features: current results, limitations, new approaches. Automated Software Engineering, 17(4), 375–407.
Mısırlı, A. T., Bener, A. B., & Turhan, B. (2011). An industrial case study of classifier ensembles for locating software defects. Software Quality Journal, 19(3), 515–536. http://doi.org/10.1007/s11219-010-9128-1
Myrtveit, I., Stensrud, E., & Shepperd, M. (2005). Reliability and validity in comparative studies of software prediction models. IEEE Transactions on Software Engineering, 31(5), 380–391. http://doi.org/10.1109/TSE.2005.58
Naik, K., & Tripathy, P. (2008). Software Testing and Quality Assurance. John Wiley & Sons, Inc.
Ostrand, T. J., Weyuker, E. J., & Bell, R. M. (2005). Predicting the location and number of faults in large software systems. IEEE Transactions on Software Engineering, 31(4), 340–355. http://doi.org/10.1109/TSE.2005.49
Park, B., Oh, S., & Pedrycz, W. (2013). The design of polynomial function-based neural network predictors for detection of software defects. Information Sciences, 229, 40–57.
Pelayo, L., & Dick, S. (2012). Evaluating Stratification Alternatives to Improve Software Defect Prediction. IEEE Transactions on Reliability, 61(2), 516–525. http://doi.org/10.1109/TR.2012.2183912
Peng, J., & Wang, S. (2010). Parameter Selection of Support Vector Machine based on Chaotic Particle Swarm Optimization Algorithm. Electrical Engineering, 3271–3274.
Peng, Y., Wang, G., & Wang, H. (2012). User preferences based software defect detection algorithms selection using MCDM. Information Sciences, 191, 3–13. http://doi.org/10.1016/j.ins.2010.04.019
Peters, F., Menzies, T., Gong, L., & Zhang, H. (2013). Balancing Privacy and Utility in Cross-Company Defect Prediction. IEEE Transactions on Software Engineering, 39(8), 1054–1068. http://doi.org/10.1109/TSE.2013.6
Pizzi, N. J., Summers, A. R., & Pedrycz, W. (2002). Software quality prediction using median-adjusted class labels. Proceedings of the 2002 International Joint Conference on Neural Networks (IJCNN’02), 2405–2409. http://doi.org/10.1109/IJCNN.2002.1007518
Quah, T.-S., & Thwin, M. M. T. (2003). Application of neural networks for software quality prediction using object-oriented metrics. Proceedings of the International Conference on Software Maintenance (ICSM 2003). IEEE Comput. Soc.
Radjenović, D., Heričko, M., Torkar, R., & Živkovič, A. (2013, August). Software fault prediction metrics: A systematic literature review. Information and Software Technology. http://doi.org/10.1016/j.infsof.2013.02.009
Sammut, C., & Webb, G. I. (2011). Encyclopedia of Machine Learning. Springer.
Sandhu, P. S., Kumar, S., & Singh, H. (2007). Intelligence System for Software Maintenance Severity Prediction. Journal of Computer Science, 3(5), 281–288. http://doi.org/10.3844/jcssp.2007.281.288
Seiffert, C., Khoshgoftaar, T. M., & Van Hulse, J. (2009). Improving Software-Quality Predictions With Data Sampling and Boosting. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 39(6), 1283–1294.
Seliya, N., & Khoshgoftaar, T. M. (2007). Software Quality Analysis of Unlabeled Program Modules With Semisupervised Clustering. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 37(2), 201–211. http://doi.org/10.1109/TSMCA.2006.889473
Shepperd, M., Cartwright, M., & Mair, C. (2006). Software defect association mining and defect correction effort prediction. IEEE Transactions on Software Engineering, 32(2), 69–82. http://doi.org/10.1109/TSE.2006.1599417
Shepperd, M., & Kadoda, G. (2001). Comparing software prediction techniques using simulation. IEEE Transactions on Software Engineering, 27(11), 1014–1022. http://doi.org/10.1109/32.965341
Shepperd, M., Song, Q., Sun, Z., & Mair, C. (2013). Data Quality: Some Comments on the NASA Software Defect Datasets. IEEE Transactions on Software Engineering, 39(9), 1208–1215. http://doi.org/10.1109/TSE.2013.11
Song, Q., Jia, Z., Shepperd, M., Ying, S., & Liu, J. (2011). A General Software Defect-Proneness Prediction Framework. IEEE Transactions on Software Engineering, 37(3), 356–370.
Sun, Z., Song, Q., & Zhu, X. (2012). Using Coding-Based Ensemble Learning to Improve Software Defect Prediction. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 42(6), 1806–1817. http://doi.org/10.1109/TSMCC.2012.2226152
Tosun, A., Turhan, B., & Bener, A. (2008). Ensemble of software defect predictors. In Proceedings of the Second ACM-IEEE international symposium on Empirical software engineering and measurement - ESEM ’08 (p. 318). New York, New York, USA: ACM Press. http://doi.org/10.1145/1414004.1414066
Turhan, B., Kocak, G., & Bener, A. (2009). Data mining source code for locating software bugs: A case study in telecommunication industry. Expert Systems with Applications, 36(6), 9986–9990. http://doi.org/10.1016/j.eswa.2008.12.028
Turhan, B., Menzies, T., Bener, A. B., & Di Stefano, J. (2009). On the relative value of cross-company and within-company data for defect prediction. Empirical Software Engineering, 14(5), 540–578. http://doi.org/10.1007/s10664-008-9103-7
Unterkalmsteiner, M., Gorschek, T., Islam, A. K. M. M., Cheng, C. K., Permadi, R. B., & Feldt, R. (2012). Evaluation and Measurement of Software Process Improvement: A Systematic Literature Review. IEEE Transactions on Software Engineering, 38(2), 398–424. http://doi.org/10.1109/TSE.2011.26
Vandecruys, O., Martens, D., Baesens, B., Mues, C., De Backer, M., & Haesen, R. (2008). Mining software repositories for comprehensible software fault prediction models. Journal of Systems and Software, 81(5), 823–839. http://doi.org/10.1016/j.jss.2007.07.034
Wang, H., Khoshgoftaar, T. M., & Napolitano, A. (2010). A Comparative Study of Ensemble Feature Selection Techniques for Software Defect Prediction. 2010 Ninth International Conference on Machine Learning and Applications, 135–140.
Wang, Q., & Yu, B. (2004). Extract rules from software quality prediction model based on neural network. 16th IEEE International Conference on Tools with Artificial Intelligence (ICTAI), 191–195. http://doi.org/10.1109/ICTAI.2004.62
Wang, S., & Yao, X. (2013). Using Class Imbalance Learning for Software Defect Prediction. IEEE Transactions on Reliability, 62(2), 434–443.
Witten, I. H., Frank, E., & Hall, M. A. (2011). Data Mining Third Edition. Elsevier Inc.
Wong, W. E., Debroy, V., Golden, R., Xu, X., & Thuraisingham, B. (2012). Effective Software Fault Localization Using an RBF Neural Network. IEEE Transactions on Reliability, 61(1), 149–169. http://doi.org/10.1109/TR.2011.2172031
Xing, F., Guo, P., & Lyu, M. R. (2005). A Novel Method for Early Software Quality Prediction Based on Support Vector Machine. 16th IEEE International Symposium on Software Reliability Engineering (ISSRE’05), 213–222. http://doi.org/10.1109/ISSRE.2005.6
Zhang, P., & Chang, Y. (2012). Software fault prediction based on grey neural network. In 2012 8th International Conference on Natural Computation (pp. 466–469). IEEE. http://doi.org/10.1109/ICNC.2012.6234505
Zheng, J. (2010). Cost-sensitive boosting neural networks for software defect prediction. Expert Systems with Applications, 37(6), 4537–4543.
Zhou, Y., & Leung, H. (2006). Empirical Analysis of Object-Oriented Design Metrics for Predicting High and Low Severity Faults. IEEE Transactions on Software Engineering, 32(10), 771–789. http://doi.org/10.1109/TSE.2006.102
Journal of Software Engineering (JSE, ISSN 2356-3974). Copyright 2020 IlmuKomputer.Com. All rights reserved.