A Study of the Effect of Summarization Techniques on Persian Texts Classification

Arabahmadi, F.Z.; Karbasi, S.

doi:10.30484/nastinfo.2019.2331

Document Type: Research Article

Authors

F.Z. Arabahmadi ¹
S. Karbasi ²

¹ MA in Computer Science, Golestan University, Gorgan

² Assistant Professor, Computer Science, Golestan University, Gorgan

Abstract

The purpose of this study is to evaluate combinations of several classification and summarization techniques and to examine the evaluation metrics of classification. The proposed framework was implemented in seven main stages. First, 1,000 documents were collected from the yjc.ir website. Documents were selected for appropriate content and a length of between 100 and 350 words. Each document was divided into three parts: the title, the summary, and the original text. The summary texts and the original texts were grouped into sets of 250, 500, and 1,000 documents, that is, in two stages with a 100% growth in the number of documents at each stage. The texts were then pre-processed and stop-words were removed from the sentences. Next, the TF-ISF summarizer techniques were implemented. A variety of classification algorithms, including Decision Tree, Support Vector Machine (SVM), Bayesian, and Rule classifiers, were run in the RapidMiner software, which produced 120 Excel outputs containing the results of the evaluation criteria (accuracy, precision, and recall). Finally, five comparisons were made between the results. The results indicate the superiority of the 1,000-document set over the smaller sets, of the ISF summarizer method over TF, of Bayesian and SVM classification over Rule and Decision Tree classification, and of the original text over the summary text. The highest accuracy, 96.67%, was obtained with SVM classification, 1,000 documents, and the ISF summarizer technique.
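As an illustration of the kind of sentence scoring that a TF-ISF summarizer performs, the following is a minimal Python sketch. It is not the authors' implementation, whose details are not given in the abstract. It assumes a common formulation in which a sentence's score is the sum of term frequency times inverse sentence frequency over its distinct words, normalised by sentence length. The stop-word list, the whitespace tokenizer, and the function names `tokenize`, `tf_isf_scores`, and `summarize` are all hypothetical; Persian text would need a language-appropriate tokenizer and stop-word list.

```python
import math
from collections import Counter

# Hypothetical English stop-word list, for illustration only;
# the stop-word list used in the study is not given in the abstract.
STOP_WORDS = {"the", "a", "of", "and", "is", "in", "to"}


def tokenize(sentence):
    """Lowercase, split on whitespace, strip punctuation, drop stop-words."""
    words = (w.strip(".,;:!?\"'()") for w in sentence.lower().split())
    return [w for w in words if w and w not in STOP_WORDS]


def tf_isf_scores(sentences):
    """Return one score per sentence: sum of tf * isf over its distinct
    words, divided by the number of tokens in the sentence."""
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    # Sentence frequency: in how many sentences each word occurs.
    sf = Counter(w for words in tokenized for w in set(words))
    scores = []
    for words in tokenized:
        if not words:
            scores.append(0.0)
            continue
        tf = Counter(words)
        total = sum(count * math.log(n / sf[w]) for w, count in tf.items())
        scores.append(total / len(words))
    return scores


def summarize(sentences, k=2):
    """Keep the k highest-scoring sentences, in their original order."""
    scores = tf_isf_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    doc = ["Cats chase mice.", "Cats sleep.", "Cats eat."]
    print(tf_isf_scores(doc))
    print(summarize(doc, k=1))
```

In this sketch a word that occurs in every sentence has an inverse sentence frequency of zero, so it contributes nothing to any score, while words confined to a single sentence carry the most weight.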

Keywords

Main Subjects

Classification


Librarianship and Information Organization Studies

Volume 30, Issue 3 - Serial Number 119
November 2019
Pages 8-23
