Applying Naive Bayes to Reduce Noisy Data in Multi-Class Classification with Decision Trees

Al Riza Khadafy, Romi Satria Wahono

Abstract


Over the past few decades, computational intelligence researchers have proposed many data mining algorithms to solve real-world classification problems. Among these methods, the Decision Tree (DT) has several advantages: it is simple to understand, easy to implement, requires little prior knowledge, handles both numerical and categorical data, is robust, and can handle large datasets. However, many real-world datasets that are large and multi-class contain noise or errors. Although the DT classifier performs well on classification problems, noisy data in large, multi-class datasets can reduce its classification accuracy. This study addresses the noise problem by applying a Naive Bayes (NB) classifier to find the instances that contain noise and remove them before the data are processed by the DT classifier. The proposed method was tested on eight datasets from the UCI (University of California, Irvine) machine learning repository and compared with the DT classifier alone. The accuracy results show that the proposed DT+NB algorithm outperforms the DT algorithm, with the following accuracy on each test dataset: Breast Cancer 96.59% (an increase of 21.06%), Diabetes 92.32% (an increase of 18.49%), Glass 87.50% (an increase of 20.68%), Iris 97.22% (an increase of 1.22%), Soybean 95.28% (an increase of 3.77%), Vote 98.98% (an increase of 2.66%), Image Segmentation 99.10% (an increase of 3.36%), and Tic-tac-toe 93.85% (an increase of 9.30%). It can therefore be concluded that applying NB handles noisy data in large, multi-class datasets, so that the accuracy of the DT classification algorithm increases.
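The noise-filtering step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact criterion used to flag an instance as noisy, so this example assumes one plausible rule: an instance is treated as noise when an NB model trained on the full training set predicts a class different from the instance's label. The function names (`train_nb`, `predict_nb`, `remove_noisy_instances`), the Laplace smoothing parameter `alpha`, and the toy data are all hypothetical and are not taken from the paper. Only categorical attributes are handled.

```python
from collections import Counter, defaultdict
import math


def train_nb(rows, labels, alpha=1.0):
    """Fit a categorical Naive Bayes model with Laplace smoothing (alpha)."""
    class_counts = Counter(labels)
    # value_counts[(feature index, class)][value] -> number of occurrences
    value_counts = defaultdict(Counter)
    # domains[j] -> set of distinct values seen for feature j
    domains = [set() for _ in rows[0]]
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(j, label)][value] += 1
            domains[j].add(value)
    return class_counts, value_counts, domains, alpha


def predict_nb(model, row):
    """Return the class with the highest log posterior for one instance."""
    class_counts, value_counts, domains, alpha = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for cls, n_cls in class_counts.items():
        score = math.log(n_cls / total)  # log prior
        for j, value in enumerate(row):
            count = value_counts[(j, cls)][value]
            # Smoothed conditional probability P(value | cls)
            score += math.log((count + alpha) / (n_cls + alpha * len(domains[j])))
        if score > best_score:
            best_class, best_score = cls, score
    return best_class


def remove_noisy_instances(rows, labels):
    """Assumed rule: drop instances whose label disagrees with the NB prediction."""
    model = train_nb(rows, labels)
    kept = [(r, y) for r, y in zip(rows, labels) if predict_nb(model, r) == y]
    return [r for r, _ in kept], [y for _, y in kept]


# Toy data (hypothetical): the last instance carries a flipped label.
rows = [("sunny", "hot")] * 4 + [("rainy", "cool")] * 4 + [("sunny", "hot")]
labels = ["no"] * 4 + ["yes"] * 4 + ["yes"]

clean_rows, clean_labels = remove_noisy_instances(rows, labels)
print(len(rows), "->", len(clean_rows))  # the mislabeled instance is dropped
```

The cleaned `clean_rows` and `clean_labels` would then be passed to any DT learner in place of the original training set. Numerical attributes would need discretization or a Gaussian likelihood, which this sketch leaves out.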

Journal of Intelligent Systems (JIS, ISSN 2356-3982)
Copyright © 2015 IlmuKomputer.Com. All rights reserved.