A large-scale semi-automated approach for assessing document-type classification errors in bibliometric databases / Maisano, Domenico Augusto Francesco; Mastrogiacomo, Luca; Ferrara, Lucrezia; Franceschini, Fiorenzo. - In: SCIENTOMETRICS. - ISSN 0138-9130. - PRINT. - 130:3(2025), pp. 1901-1938. [10.1007/s11192-025-05244-y]
A large-scale semi-automated approach for assessing document-type classification errors in bibliometric databases
Domenico Maisano; Luca Mastrogiacomo; Lucrezia Ferrara; Fiorenzo Franceschini
2025
Abstract
The accuracy with which bibliometric databases classify document types (DTs), such as research articles, conference proceedings, reviews, short notes, letters, and book chapters, is crucial for the academic community, as bibliometric indicators may significantly influence research funding, decision-making, and academic reputation. This study presents a semi-automated methodology to assess the accuracy of DT classification in bibliometric databases such as Scopus and Web of Science (WoS). The methodology can handle large document volumes and adapt to different DT categories without predefined correspondences. The first phase of the methodology automatically identifies discrepancies in DT classifications between Scopus and WoS in order to find potentially misclassified documents; the second phase involves manually analyzing these documents to confirm classification errors and attribute them. The methodology is applied to a sample of several tens of thousands of papers from the teaching staff of two major universities in Turin (Italy). The results show overall error rates of approximately 2.7% for Scopus and 2.3% for WoS. The paper also analyzes the most common types of errors found in both databases, providing an interpretation of these inaccuracies and some insights for possible improvements in the quality of these databases.
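The abstract does not spell out how the first, automatic phase detects discrepancies. Purely as an illustration, and not the authors' algorithm, one way to flag candidates without a predefined DT correspondence table is to cross-tabulate the (Scopus DT, WoS DT) pairs of documents indexed in both databases and flag pairs that are rare. The sketch below assumes documents are matched by DOI (the `scopus`/`wos` dicts are hypothetical inputs) and uses an arbitrary `min_share` threshold that does not come from the paper.

```python
from collections import Counter

def match_by_doi(scopus, wos):
    """Pair up documents indexed in both databases.

    `scopus` and `wos` are hypothetical dicts mapping a DOI to the DT label
    each database assigns; documents without a common DOI are skipped here.
    """
    common = sorted(scopus.keys() & wos.keys())
    return [(doi, scopus[doi], wos[doi]) for doi in common]

def flag_discrepancies(records, min_share=0.05):
    """Return IDs of documents whose (Scopus DT, WoS DT) pair is unusual.

    No DT correspondence table is given in advance: a pair counts as unusual
    when it accounts for less than `min_share` of the documents carrying the
    same Scopus label OR of those carrying the same WoS label.
    `min_share` is an illustrative threshold, not a value from the paper.
    """
    records = list(records)
    pair_counts = Counter((s, w) for _, s, w in records)
    scopus_totals = Counter(s for _, s, _ in records)
    wos_totals = Counter(w for _, _, w in records)
    flagged = []
    for doc_id, s, w in records:
        n = pair_counts[(s, w)]
        if n / scopus_totals[s] < min_share or n / wos_totals[w] < min_share:
            flagged.append(doc_id)  # candidate for the manual (second) phase
    return flagged
```

Checking the pair against both the Scopus row and the WoS column deliberately over-flags: in a two-phase design, a false alarm only costs one manual check, whereas a missed discrepancy never reaches review. Real cross-database matching would also need to handle documents lacking a DOI.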
https://hdl.handle.net/11583/2999016