Hybrid Keyword Extraction Algorithm and Cosine Similarity for Improving Sentences Cohesion in Text Summarization

Rizki Darmawan; Romi Satria Wahono

Abstract

As the amount of online information grows, systems that can automatically summarize the text of a document become increasingly desirable. The main goal of text summarization is to present the main ideas of a document in less space. There are two approaches to creating a summary: extraction and abstraction. One common extraction approach uses a keyword extraction algorithm, which is simple to apply but tends to produce summaries that lack cohesion, that is, correlation between sentences. Cohesion between sentences can be improved by using the cosine similarity method. This study proposes a hybrid of a keyword extraction algorithm and cosine similarity to improve sentence cohesion in text summarization. The proposed method is applied at various compression ratios to create candidate summaries. The results show that the proposed method yields a significant increase in the degree of cohesion, as evaluated with a t-test. The results also show that a 50% compression ratio obtains the best result, with recall, precision, and F-measure of 0.761, 0.43, and 0.54 respectively, since the summary at a 50% compression ratio has a larger intersection with the human summary than the summaries at other compression ratios.

Keywords: text summarization, keyword extraction, cosine similarity, cohesion
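
The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `alpha` weight, the greedy selection rule, and the reading of "compression ratio" as the fraction of sentences kept are all assumptions made here for illustration. Sentences are scored by keyword frequency, candidates are then preferred when they are cosine-similar to sentences already chosen, and the result is compared with a human summary by sentence overlap.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(sentences, compression_ratio=0.5, alpha=0.5):
    """Hypothetical hybrid: keyword score plus similarity to chosen sentences.

    Assumption: compression_ratio is the fraction of sentences to keep,
    and alpha balances keyword score against cohesion.
    """
    if not sentences:
        return []
    vectors = [Counter(tokenize(s)) for s in sentences]
    doc_freq = Counter()
    for v in vectors:
        doc_freq.update(v)

    def keyword_score(v):
        # Mean document-level frequency of the words in one sentence.
        n = sum(v.values())
        return sum(doc_freq[t] * c for t, c in v.items()) / n if n else 0.0

    raw = [keyword_score(v) for v in vectors]
    top = max(raw) or 1.0
    kw = [r / top for r in raw]  # normalize keyword scores to [0, 1]

    keep = max(1, int(len(sentences) * compression_ratio))
    # Start from the sentence with the highest keyword score.
    chosen = [max(range(len(sentences)), key=lambda i: kw[i])]
    while len(chosen) < keep:
        remaining = [i for i in range(len(sentences)) if i not in chosen]
        # Prefer sentences that are keyword-rich and close to the summary so far.
        best = max(
            remaining,
            key=lambda i: alpha * kw[i] + (1 - alpha) * max(
                cosine_similarity(vectors[i], vectors[j]) for j in chosen
            ),
        )
        chosen.append(best)
    return [sentences[i] for i in sorted(chosen)]  # restore document order


def precision_recall_f(system, reference):
    """Sentence-overlap precision, recall, and F-measure against a human summary."""
    overlap = len(set(system) & set(reference))
    p = overlap / len(system) if system else 0.0
    r = overlap / len(reference) if reference else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

Calling `summarize(sentences, compression_ratio=0.5)` on a list of sentence strings returns roughly half of them in document order, and `precision_recall_f` can then compare that output with a human-written extract. A real system would likely also need stop-word removal and stemming, which this sketch omits.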
