Analysis of Citation-based Indicators to Determine the Relevance of Articles

Document Type: Research Article

Authors

1 PhD Candidate, Knowledge and Information Science Department, International Division, Shiraz University, Shiraz, Iran

2 PhD in Knowledge & Information Science, Assistant Professor, School of Education & Psychology, Shiraz University, Shiraz, Iran

3 PhD in Knowledge & Information Science, Professor, School of Education & Psychology, Shiraz University, Shiraz, Iran

4 PhD in Computer Engineering, Associate Professor, Department of Computer Science and Engineering & IT, Shiraz University, Shiraz, Iran

5 PhD in Computer Engineering, Assistant Professor, School of Education & Psychology, Shiraz University, Shiraz, Iran

Abstract

Purpose: The present study aimed to investigate the potential of citation-based indicators (Co-Citation, Bibliographic Coupling, Amsler, PageRank, HITS) to determine the relevance of articles.
Method: This is applied research with a correlational approach. The population consisted of 26,262 articles in the PubMed Central open access subset of the CITREC test collection, each of which had a citation relationship with other articles under all three traditional citation-based indicators (Co-Citation, Bibliographic Coupling, Amsler). From the citations in the research population, 30 were selected as base records, and their full texts were retrieved based on MeSH similarity. The similarities among the retrieved documents were then extracted using the citation-based indicators. Each citation-based metric was treated as an independent variable and MeSH similarity as the dependent variable. A MySQL database was created using the WampServer local server environment and phpMyAdmin. Next, an output was prepared from the online demo of the CITREC test collection and imported into the MySQL database holding the research data set, which created the main structure of its tables. The relevant code in the CITREC source code package was then studied and applied with the necessary changes, and the results were stored in the MySQL database. An SQL query extracted the complete citation network of the data set into a comma-separated values (CSV) file. Finally, a Python program was written to open and process this large file and to calculate the PageRank and HITS (Authority and Hub) scores.
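The indicators named in the Method can be illustrated with a minimal Python sketch. It is not the authors' program or the CITREC code: the function names, the toy edge list, the damping factor of 0.85, the iteration counts, and the unnormalized form of the Amsler score (shared neighbours over the union of citing and cited papers) are all assumptions made for illustration.

```python
from collections import defaultdict

def build_graph(edges):
    """Index (citing, cited) pairs by outgoing and incoming citations."""
    out_links, in_links = defaultdict(set), defaultdict(set)
    for citing, cited in edges:
        out_links[citing].add(cited)
        in_links[cited].add(citing)
    return out_links, in_links

def co_citation(a, b, in_links):
    """Number of papers that cite both a and b."""
    return len(in_links.get(a, set()) & in_links.get(b, set()))

def bibliographic_coupling(a, b, out_links):
    """Number of references shared by a and b."""
    return len(out_links.get(a, set()) & out_links.get(b, set()))

def amsler(a, b, out_links, in_links):
    """Assumed unnormalized Amsler score: shared citing-or-cited neighbours."""
    na = in_links.get(a, set()) | out_links.get(a, set())
    nb = in_links.get(b, set()) | out_links.get(b, set())
    return len(na & nb)

def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank; dangling mass is spread uniformly."""
    out_links, in_links = build_graph(edges)
    nodes = sorted(set(out_links) | set(in_links))
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        rank = {
            v: (1 - damping) / n + damping * (
                sum(rank[u] / len(out_links[u]) for u in in_links.get(v, ()))
                + dangling / n
            )
            for v in nodes
        }
    return rank

def _normalize(scores):
    """Scale a score dict to unit Euclidean length."""
    norm = sum(x * x for x in scores.values()) ** 0.5 or 1.0
    return {v: x / norm for v, x in scores.items()}

def hits(edges, iterations=50):
    """Iterative HITS; returns (authority, hub) score dicts."""
    out_links, in_links = build_graph(edges)
    nodes = sorted(set(out_links) | set(in_links))
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        auth = _normalize(
            {v: sum(hub[u] for u in in_links.get(v, ())) for v in nodes})
        hub = _normalize(
            {v: sum(auth[w] for w in out_links.get(v, ())) for v in nodes})
    return auth, hub

# Toy citation network (made up): A and B both cite C and D; E cites A and B.
edges = [("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("E", "A"), ("E", "B")]
out_links, in_links = build_graph(edges)
print(co_citation("C", "D", in_links))
print(bibliographic_coupling("A", "B", out_links))
print(amsler("A", "B", out_links, in_links))
print(pagerank(edges))
print(hits(edges))
```

For a network exported to CSV, the edge list could be read with the standard `csv` module, one (citing, cited) pair per row, and passed to the same functions.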
Findings: The results showed that all six measures studied had a significant and positive correlation with the relevance of articles. In other words, as the value of each measure increased, the degree of relevance of the articles also increased. The highest correlation with relevance belonged to the Amsler measure, followed by Bibliographic Coupling. HITS (Authority) ranked third and PageRank fourth. The lowest correlations with relevance were those of Co-Citation and HITS (Hub). Therefore, among the known citation-based measures studied here, Amsler, Bibliographic Coupling, HITS (Authority) and PageRank, in that order, had more potential than the others to determine the relevance of articles.
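The kind of comparison reported in the Findings can be sketched as a correlation between one measure's scores and MeSH similarity over the same document pairs. The abstract does not name the coefficient used, so the choice of Spearman rank correlation below is an assumption, as are the function names; the numbers are invented purely to show the call and are not the study's data.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Invented example values for five document pairs (not the study's data).
amsler_scores = [0, 2, 3, 5, 8]
mesh_similarity = [0.10, 0.30, 0.25, 0.60, 0.90]
print(spearman(amsler_scores, mesh_similarity))
```

Repeating the call once per measure and sorting the coefficients would give an ordering of the measures like the one described above. A significance test would need a statistics package and is left out of this sketch.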
Conclusion: Based on the findings, it can be concluded that the citation-based metrics studied are able to estimate the degree of relevance of articles. They can therefore be used in various information retrieval platforms, including search engines, citation-based databases, recommender systems, and digital libraries, to access articles, suggest similar articles, and rank retrieved results. In addition, the Amsler measure, which is used less often in information retrieval systems than the two other traditional measures (Co-Citation and Bibliographic Coupling), deserves more attention than it has received. On the other hand, although the Co-Citation measure is used in some international information retrieval databases (such as ScienceDirect and CiteSeer) to retrieve relevant documents and suggest similar documents, it proved less effective than the other metrics studied here.

