Article Type: Review Article
Authors
1 PhD Student, Knowledge and Information Science, University of Isfahan, Isfahan, Iran
2 Professor, Knowledge and Information Science, University of Isfahan, Isfahan, Iran
3 Associate Professor, Knowledge and Information Science, University of Isfahan, Isfahan, Iran
4 Assistant Professor, Artificial Intelligence, University of Isfahan, Isfahan, Iran
Abstract
Purpose: Today, people recognize that knowledge is power, and the emphasis has accordingly shifted from information retrieval to knowledge retrieval and knowledge discovery. At the same time, the vast volume of textual documents on the web has made accessing and using that knowledge challenging. One solution to this problem is query-based abstractive summarization, a fast and efficient approach for navigating texts and a highly active research area. This study presents a systematic review that identifies and analyses the relevant research in this field.
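To make the task concrete, a minimal sketch follows, assuming the Hugging Face transformers library and a generic pre-trained sequence-to-sequence model (facebook/bart-large-cnn, used here only as a placeholder); prepending the query to the source text is one simple conditioning scheme, not the method of any particular study reviewed here.

```python
# Minimal sketch of query-based abstractive summarization.
# Assumes the Hugging Face `transformers` library and a generic
# pre-trained seq2seq model; prepending the query to the source
# text is one simple conditioning scheme, not a method drawn from
# any specific study in this review.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "facebook/bart-large-cnn"  # any seq2seq summarizer works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def query_based_summary(query: str, document: str, max_len: int = 80) -> str:
    # Condition the generator on the query by prepending it to the input.
    text = f"{query} </s> {document}"
    inputs = tokenizer(text, truncation=True, max_length=1024,
                       return_tensors="pt")
    ids = model.generate(**inputs, max_length=max_len, num_beams=4)
    return tokenizer.decode(ids[0], skip_special_tokens=True)
```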
Method: This applied study follows the PRISMA guidelines for systematic reviews, which proceed in four steps: identification, screening, eligibility, and inclusion. An appropriate search strategy without time restrictions was applied in Scopus, Web of Science, IEEE Xplore, the ACM Digital Library, Google Scholar, ProQuest, Noormags, Magiran, SID, Civilica, Elmnet, and Ganj. Of the 1,714 documents identified, 31 were found eligible and included in the systematic review.
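The four PRISMA steps can be viewed as simple bookkeeping over record counts, as in the sketch below; only the totals (1,714 identified, 31 included) come from this review, and the intermediate exclusion counts are hypothetical placeholders chosen so the totals add up.

```python
# Sketch of the four-step PRISMA flow as a bookkeeping structure.
# Only the totals (1,714 identified, 31 included) come from the study;
# the intermediate exclusion counts below are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class PrismaFlow:
    identified: int          # records found via the search strategy
    duplicates_removed: int  # dropped before screening
    screened_out: int        # excluded on title/abstract screening
    ineligible: int          # excluded on full-text eligibility review

    @property
    def included(self) -> int:
        # Each step removes records from the pool left by the previous one.
        return (self.identified - self.duplicates_removed
                - self.screened_out - self.ineligible)

# Hypothetical intermediate counts chosen so the known totals match.
flow = PrismaFlow(identified=1714, duplicates_removed=400,
                  screened_out=1200, ineligible=83)
assert flow.included == 31
```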
Findings: The review indicates that studies in this field are relatively recent, with publication counts fluctuating between upward and downward trends. Most studies are journal articles. Researchers have predominantly adopted a one-stage approach for their proposed summarization systems, focusing chiefly on supervised and self-supervised learning, and have employed rule-based, statistical, and machine-learning methods. The models used are based on graphs, neural networks, and pre-trained architectures. System input is mostly single-document, Debatepedia is the most popular dataset, and of the seventeen evaluation metrics identified, ROUGE is the most widely used.
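As a concrete illustration of the dominant metric, the sketch below computes ROUGE with Google's rouge-score package (pip install rouge-score); the reference and candidate texts are toy examples, not data from the reviewed studies.

```python
# Sketch of ROUGE evaluation, the most common metric in the reviewed
# studies, using Google's `rouge-score` package. The texts below are
# toy examples, not data from any reviewed study.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"],
                                  use_stemmer=True)
reference = "Query-based summarization navigates large text collections."
candidate = "Query-based summarization helps navigate large collections of text."
scores = scorer.score(reference, candidate)
for name, s in scores.items():
    print(f"{name}: P={s.precision:.2f} R={s.recall:.2f} F1={s.fmeasure:.2f}")
```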
Conclusion: The review shows how synergies among learning paradigms, models, methods, and evaluation metrics have helped mitigate challenges such as mismatch between the generated summary and the query, incongruity between the generated summary and the source text, the lack of labelled data for training models, redundancy, limited datasets and the absence of datasets specific to this type of summarization, the lack of improved metrics for accurately assessing generated summaries, semantic ambiguity arising from the failure to distinguish sentences with different meanings, and misalignment between input and output sequences. These improvements have enhanced the overall performance of summarization systems and supported their development. However, the semantic understanding of these systems has not yet closed the gap between system-generated and human summaries: the semantics they capture remain superficial and still depend to a degree on the syntactic structures in the models. Genuine semantic understanding would enable systems to recognize the deeper meanings and insights embedded in a text and to apply them, according to the specified task, in their output. Accordingly, innovations addressing these shortcomings are proposed as directions for future research: semantic modelling and semantic understanding should be built into summarization systems, refining and advancing existing methodologies, and systems must keep pace with changing and evolving information sources as well as with developments in user requests and their knowledge domains. Finally, there is a noticeable gap in these systems for non-English languages, which can be addressed by developing and strengthening natural language processing tools for those languages so that such systems become practically implementable.
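One way to quantify the query-summary mismatch noted above is embedding-based similarity; the sketch below assumes the sentence-transformers library and a generic model (all-MiniLM-L6-v2), and is an illustrative diagnostic rather than a method drawn from the reviewed studies.

```python
# Sketch of one way to quantify query-summary alignment via sentence
# embeddings; assumes the `sentence-transformers` library and a generic
# model. Illustrative only, not a method from the reviewed studies.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def query_alignment(query: str, summary: str) -> float:
    # Cosine similarity between embeddings; higher means the summary
    # stays closer to the information need expressed in the query.
    q_emb, s_emb = model.encode([query, summary], convert_to_tensor=True)
    return util.cos_sim(q_emb, s_emb).item()
```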