Integration of Sample Bootstrapping and Weighted Principal Component Analysis to Improve the Performance of k Nearest Neighbor on Large Datasets

Tri Agus Setiawan, Romi Satria Wahono, Abdul Syukur


Abstract: The k Nearest Neighbor (kNN) algorithm classifies a new object based on its k nearest neighbors. kNN is simple and effective, and it has been applied to many classification problems. Its weakness is that it requires considerable computation time on large datasets. This study proposes integrating Sample Bootstrapping and Weighted Principal Component Analysis (PCA) to improve the accuracy and computation time of kNN. Sample Bootstrapping is used to reduce the number of training records to be processed, and Weighted PCA is used to reduce the number of attributes. The study uses two datasets with large training sets: Landsat Satellite, with 4435 records, and Thyroid, with 3772 records. On the Landsat Satellite dataset, kNN integrated with Sample Bootstrapping and Weighted PCA improved accuracy by 0.77 percentage points (91.40% versus 90.63%), with a 9-second difference in computation time (1 versus 10 seconds), compared with standard kNN. On the Thyroid dataset, the integrated method improved accuracy by 3.10 percentage points (89.31% versus 86.21%), with an 11-second difference in computation time (1 versus 12 seconds), compared with standard kNN. These results indicate that integrating kNN with Sample Bootstrapping and Weighted PCA yields better accuracy and computation time than standard kNN.


Keywords: kNN algorithm, Sample Bootstrapping, Weighted PCA
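
The instance-reduction idea in the abstract can be illustrated with a minimal Python sketch: draw a bootstrap sample (uniformly at random, with replacement) from the training set, then run a plain majority-vote kNN on the smaller sample. This is an illustration under stated assumptions, not the authors' implementation. The function names `bootstrap_sample` and `knn_predict`, the sample size of 100, k = 3, the Euclidean distance, and the synthetic two-class data are all hypothetical choices made here. The Weighted PCA step is left out because the abstract does not say how the weights are computed.

```python
import random
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def bootstrap_sample(rows, labels, n_samples, seed=0):
    """Draw n_samples training rows uniformly at random, with replacement."""
    rng = random.Random(seed)
    picks = [rng.randrange(len(rows)) for _ in range(n_samples)]
    return [rows[i] for i in picks], [labels[i] for i in picks]


def knn_predict(train_rows, train_labels, query, k=3):
    """Return the majority label among the k training rows closest to query."""
    by_distance = sorted(
        range(len(train_rows)), key=lambda i: dist(train_rows[i], query)
    )
    votes = Counter(train_labels[i] for i in by_distance[:k])
    return votes.most_common(1)[0][0]


# Synthetic two-class data, invented for illustration only
# (not the Landsat Satellite or Thyroid datasets).
data_rng = random.Random(42)
rows = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
rows += [(data_rng.gauss(5, 1), data_rng.gauss(5, 1)) for _ in range(200)]
labels = ["a"] * 200 + ["b"] * 200

# Classify against 100 bootstrapped rows instead of all 400.
small_rows, small_labels = bootstrap_sample(rows, labels, n_samples=100)
print(knn_predict(small_rows, small_labels, (0.0, 0.0)))
print(knn_predict(small_rows, small_labels, (5.0, 5.0)))
```

Because each query is compared with every retained training row, shrinking the training set reduces the number of distance computations per query proportionally, which is the source of the speed-up the abstract describes.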







Journal of Intelligent Systems (JIS, ISSN 2356-3982)
Copyright 2020 IlmuKomputer.Com. All rights reserved.