Absolute Correlation Weighted Naïve Bayes for Software Defect Prediction

Rizky Tri Asmono, Romi Satria Wahono, Abdul Syukur


The maintenance phase of a software project can be very expensive for the development team and harmful to users because of flawed software modules. This can be avoided by detecting defects as early as possible. Software defect prediction gives the development team an opportunity to focus testing on the modules or files that have a high probability of being defective. Naïve Bayes has been used to predict software defects. However, Naïve Bayes assumes that all attributes are equally important and conditionally independent of each other given the class, and in many cases this assumption does not hold. The absolute value of the correlation coefficient is proposed as an attribute weighting method to relax these assumptions, and the resulting classifier is called Absolute Correlation Weighted Naïve Bayes. Parametric tests on the experimental results show that the proposed method improves the performance of Naïve Bayes in classifying defect-prone modules for software defect prediction.
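The abstract does not spell out the exact formulation, so the following is only a minimal sketch under stated assumptions. It assumes numeric attributes modelled with Gaussian class-conditional densities, and assumes each attribute weight is $w_i = |r_i|$, where $r_i$ is the Pearson correlation between attribute $i$ and the 0/1 defect label. A weighted naïve Bayes then predicts $\hat{c} = \arg\max_c P(c)\prod_i P(a_i \mid c)^{w_i}$, which in log space simply multiplies each attribute's log-likelihood by $w_i$. The names `pearson_r`, `AbsCorrWeightedNB`, and `var_smoothing` are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math
from statistics import mean, pvariance


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # constant attribute or constant labels: treat as uncorrelated
    return cov / (sx * sy)


class AbsCorrWeightedNB:
    """Hypothetical Gaussian naive Bayes with per-attribute weights w_i = |r_i|.

    Assumption: r_i is the Pearson correlation between attribute i and the
    0/1 defect label; weights are used as-is, without normalization.
    """

    def fit(self, X, y, var_smoothing=1e-9):
        n_attrs = len(X[0])
        columns = [[row[i] for row in X] for i in range(n_attrs)]
        # Attribute weights: absolute correlation with the class label.
        self.weights = [abs(pearson_r(col, y)) for col in columns]
        self.classes = sorted(set(y))
        self.priors, self.params = {}, {}
        for c in self.classes:
            rows = [row for row, label in zip(X, y) if label == c]
            self.priors[c] = len(rows) / len(X)
            # Per-class mean and (smoothed) variance of every attribute.
            self.params[c] = [
                (mean(vals), pvariance(vals) + var_smoothing)
                for vals in ([row[i] for row in rows] for i in range(n_attrs))
            ]
        return self

    @staticmethod
    def _log_gauss(x, mu, var):
        """Log of the normal density N(x; mu, var)."""
        return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

    def predict_one(self, row):
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # log P(c) + sum_i w_i * log P(a_i | c), i.e. P(a_i | c) ** w_i.
            score = math.log(self.priors[c])
            for w, x, (mu, var) in zip(self.weights, row, self.params[c]):
                score += w * self._log_gauss(x, mu, var)
            if score > best_score:
                best_class, best_score = c, score
        return best_class

    def predict(self, X):
        return [self.predict_one(row) for row in X]


# Toy data (invented for illustration): [lines_of_code, noise_attribute], 1 = defective.
X = [[10, 3], [12, 1], [11, 4], [50, 2], [55, 3], [60, 1]]
y = [0, 0, 0, 1, 1, 1]
model = AbsCorrWeightedNB().fit(X, y)
print([round(w, 2) for w in model.weights])  # roughly 0.99 for LOC, 0.3 for noise
print(model.predict([[13, 2], [58, 2]]))  # → [0, 1]
```

In this sketch an attribute that is uncorrelated with the label receives a weight near zero and contributes almost nothing to the decision, while a strongly correlated attribute keeps close to its full naïve Bayes influence. This is how a correlation-based weighting relaxes the "all attributes are equally important" assumption.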

Journal of Software Engineering (JSE, ISSN 2356-3974)
Copyright © 2015 IlmuKomputer.Com. All rights reserved.