The Role of Ontology and Knowledge Graph in Text Document Classification: A Review of Studies

Document Type: Review Article

Authors

1 PhD Candidate, Knowledge and Information Science, University of Isfahan, Isfahan, Iran

2 Associate Professor, Knowledge and Information Science, University of Isfahan, Isfahan, Iran

3 Assistant Professor, Computer Engineering, University of Isfahan, Isfahan, Iran

10.30484/nastinfo.2024.3548.2264

Abstract

Purpose: With the increasing use of the internet and the growing volume of electronically accessible documents on the web, automatic text classification has become a key method for enhancing information retrieval and managing digital text collections. Text classification allows individuals to search for and retrieve information with greater accuracy and speed. The value of automatic document classification lies in assigning documents to predefined classes so that documents within a class are as similar to one another as possible and as dissimilar as possible from documents in other classes, while also making use of semantic relationships. This study investigates the application of ontologies and knowledge graphs in automatic text document classification.

Methods: This study reviewed research and documents related to the application of semantic tools such as ontologies and knowledge graphs in text document classification. To collect texts, three domestic databases (the "National Journal Database," the "Scientific Information Database of Jihad University," and "Marefate Danesh"), three internal databases ("Magiran", "SID" and "Civilica"), and three external citation databases ("Web of Science", "Scopus" and "Google Scholar") were searched. Both categories of sources were examined with no restriction on publication period.

Findings: The reviewed texts show that the vector space model does not consider the semantic relationships between words and disregards word order in sentences. Because it neglects the various semantic and syntactic relationships between words in natural language, this model yields a document representation that does not faithfully reflect the documents' meaning. Ontologies and knowledge graphs, by contrast, help strengthen machine learning models by capturing the meaning of entities and classes. These tools act as an external reference during the classification process and provide domain knowledge for classification models. In general, using them allows machines to comprehend the meaning of the data they work with.
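
The limitation described in the findings can be illustrated with a minimal sketch. It is not taken from any of the reviewed studies: the function names, the example sentences, and the three-entry `TOY_ONTOLOGY` mapping are all hypothetical and chosen only for illustration. The sketch shows that a plain term-count representation is identical for two sentences with different word order, and that mapping terms to shared ontology concepts can reveal an overlap that raw terms miss.

```python
from collections import Counter

# Hypothetical toy ontology: maps surface terms to a shared concept.
TOY_ONTOLOGY = {"car": "vehicle", "automobile": "vehicle", "truck": "vehicle"}


def bag_of_words(text):
    """Return term counts for a text; word order is discarded."""
    return Counter(text.lower().split())


def concept_bag(text, ontology=TOY_ONTOLOGY):
    """Count terms after replacing each one by its ontology concept, if known."""
    return Counter(ontology.get(token, token) for token in text.lower().split())


def overlap(left, right):
    """Number of shared items between two bags (multiset intersection size)."""
    return sum((left & right).values())


# Different meanings, identical term-count representation.
same_despite_order = bag_of_words("dog bites man") == bag_of_words("man bites dog")

# Synonyms share no raw term, but share a concept once the ontology is applied.
doc_a, doc_b = "the car stopped", "the automobile stopped"
shared_terms_plain = overlap(bag_of_words(doc_a), bag_of_words(doc_b))
shared_terms_concept = overlap(concept_bag(doc_a), concept_bag(doc_b))

print(same_despite_order, shared_terms_plain, shared_terms_concept)
```

In this toy setting the ontology plays the role of the external reference mentioned above; a real system would draw on a full domain ontology or knowledge graph rather than a hand-written dictionary.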

Conclusion: The application of ontologies and knowledge graphs in the classification of textual documents can strengthen the results of machine learning algorithms through the use of background knowledge. These tools can disambiguate the meanings of words in ambiguous sentences and address problems related to natural language. The use of ontologies and knowledge graphs can effectively help in the classification of textual documents and improve the accuracy and efficiency of classification models. However, the construction and integration of ontologies and knowledge graphs is a tedious, time-consuming, and complex task, which limits the feasibility and practical application of these tools. In the Persian language, in addition to the problems raised in applying ontologies and knowledge graphs to document classification, there are further limitations, such as language-specific features of writing and technical constraints. Therefore, the use of ontologies and knowledge graphs in the classification of textual documents requires attention to linguistic limitations and technical complexity, and further development efforts are needed, especially for Persian.

Keywords

Main Subjects


Articles in Press, Accepted Manuscript
Available Online from 20 April 2024
  • Receive Date: 11 January 2024
  • Revise Date: 11 March 2024
  • Accept Date: 20 April 2024