Impact of Web of Science and Scopus Policies on Multiple Document-Type Classification / Maisano, Domenico A.; Ferrara, Lucrezia; Franceschini, Fiorenzo. - Electronic. - Proceedings of the 20th International Conference on Scientometrics & Informetrics, Volume 1 (2025), pp. 968-984. (Paper presented at the 20th International Conference on Scientometrics & Informetrics (ISSI2025), held in Yerevan, Armenia, 23-27 June 2025) [10.51408/issi2025_053].
Impact of Web of Science and Scopus Policies on Multiple Document-Type Classification
Domenico A. Maisano; Lucrezia Ferrara; Fiorenzo Franceschini
2025
Abstract
Document-type (DT) classification, i.e., the assignment of conventional labels such as article, review or proceedings paper to scientific documents, is crucial for information retrieval in bibliometric databases, but its incomplete objectivity can lead to errors with implications for indicators and research evaluations. This study focuses on the relatively small share of documents (~4%) that receive a dual-DT assignment in Web of Science (WoS), a feature absent from Scopus, which assigns only a single DT to each document, and assesses their characteristics and classification accuracy. A manual analysis of more than a thousand documents revealed three main scenarios of dual-DT assignment in WoS: (i) the combination of one DT describing the content and another describing the container (e.g., book chapters, proceedings papers), (ii) the handling of specialized DTs (e.g., data paper, retracted paper), and (iii) the combination of a DT related to journal publication with a temporary DT for the early-access designation. Documents with dual-DT assignment in WoS exhibit higher error rates, confirming that they are more difficult to classify in both databases, including Scopus despite its single-DT policy. WoS's dual-DT classification policy offers more detail and potentially greater accuracy, but it also shows some inconsistencies. Conversely, Scopus's single-DT policy reduces the level of detail and increases the risk of misclassification, particularly for papers from conference proceedings or journal special issues. This study highlights the need for clearer DT definitions and recommends that bibliometric databases consider adopting more flexible multiple-DT classification policies to enhance both detail and accuracy in document classification. A limitation of this research is the relatively small corpus of documents analysed, which will be expanded in future studies.
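The three dual-DT scenarios can be made concrete with a small rule-based sketch that sorts a pair of DT labels into scenario (i), (ii) or (iii). Everything in it is a hypothetical illustration: the function name `classify_dual_dt`, the label sets (taken loosely from the examples above, not from the official WoS DT vocabulary), and the precedence in which the rules are checked. It is not the manual procedure used in the study.

```python
from enum import Enum
from typing import Optional


class Scenario(Enum):
    CONTENT_PLUS_CONTAINER = "i"  # content DT + container DT (e.g., book chapter)
    SPECIALIZED_DT = "ii"         # a specialized DT (e.g., data paper) in the pair
    EARLY_ACCESS = "iii"          # journal DT + temporary early-access DT


# Hypothetical, illustrative label sets (assumption: not WoS's official list).
CONTAINER_DTS = {"book chapter", "proceedings paper"}
SPECIALIZED_DTS = {"data paper", "retracted paper"}
TEMPORARY_DTS = {"early access"}


def classify_dual_dt(dts: tuple[str, str]) -> Optional[Scenario]:
    """Map a pair of DT labels to one of the three scenarios, or None.

    Assumed precedence: a temporary DT wins over a specialized DT,
    which wins over a container DT.
    """
    labels = {dt.strip().lower() for dt in dts}
    if len(labels) != 2:
        return None  # not a genuine dual-DT assignment
    if labels & TEMPORARY_DTS:
        return Scenario.EARLY_ACCESS
    if labels & SPECIALIZED_DTS:
        return Scenario.SPECIALIZED_DT
    if labels & CONTAINER_DTS:
        return Scenario.CONTENT_PLUS_CONTAINER
    return None  # pair falls outside the three scenarios
```

For example, `classify_dual_dt(("Article", "Book Chapter"))` returns `Scenario.CONTENT_PLUS_CONTAINER`. The fixed precedence is only a tie-breaking choice for pairs that could fit more than one scenario; a real classifier would need the databases' own DT definitions.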
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/3001965