The accuracy of bibliometric databases in classifying document types (DTs), such as research articles, conference proceedings, reviews, short notes, letters, and book chapters, is crucial for the academic community, as bibliometric indicators may significantly influence research funding, decision-making, and academic reputation. This study presents a semi-automated methodology to assess the accuracy of DT classification in bibliometric databases, such as Scopus and Web of Science (WoS). The methodology can handle large document volumes and adapt to different DT categories without predefined correspondences. The first phase of the methodology automatically identifies discrepancies in DT classifications between Scopus and WoS, in order to find potentially misclassified documents; the second phase involves manually analyzing these documents to confirm and attribute classification errors. The methodology is applied to a sample of several tens of thousands of papers from the teaching staff of two major universities in Turin (Italy). The results show overall error rates of approximately 2.7% for Scopus and 2.3% for WoS. The paper also analyzes the most common types of errors found in both databases, providing an interpretation of these inaccuracies and some insights for possible improvements in the quality of these databases.

A large-scale semi-automated approach for assessing document-type classification errors in bibliometric databases / Maisano, Domenico Augusto Francesco; Mastrogiacomo, Luca; Ferrara, Lucrezia; Franceschini, Fiorenzo. - In: SCIENTOMETRICS. - ISSN 0138-9130. - Print. - 130:3(2025), pp. 1901-1938. [10.1007/s11192-025-05244-y]

A large-scale semi-automated approach for assessing document-type classification errors in bibliometric databases

Domenico Maisano; Luca Mastrogiacomo; Lucrezia Ferrara; Fiorenzo Franceschini
2025


Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2999016