The Healthy States of America: Creating a Health Taxonomy with Social Media / Šćepanović, Sanja; Aiello, Luca Maria; Zhou, Ke; Joglekar, Sagar; Quercia, Daniele. - 15:(2021), pp. 621-632. (Paper presented at the Fifteenth International AAAI Conference on Web and Social Media (ICWSM-21), held virtually, June 7-10, 2021) [10.1609/icwsm.v15i1.18089].
The Healthy States of America: Creating a Health Taxonomy with Social Media
Quercia, Daniele
2021
Abstract
Since the uptake of social media, researchers have mined online discussions to track the outbreak and evolution of specific diseases or chronic conditions such as influenza or depression. To broaden the set of diseases under study, we developed a Deep Learning tool for Natural Language Processing that extracts mentions of virtually any medical condition or disease from unstructured social media text. With that tool at hand, we processed Reddit and Twitter posts, analyzed the clusters of the two resulting co-occurrence networks of conditions, and discovered that they correspond to well-defined categories of medical conditions. This resulted in the creation of the first comprehensive taxonomy of medical conditions automatically derived from online discussions. We validated the structure of our taxonomy against the official International Statistical Classification of Diseases and Related Health Problems (ICD-11), matching our clusters to 20 of its 22 official categories. Based on the mentions of our taxonomy's sub-categories in Reddit posts geo-referenced in the U.S., we were then able to compute disease-specific health scores. Unlike raw counts of disease mentions, or counts that ignore our taxonomy's structure, we found that our disease-specific health scores are causally linked with the officially reported prevalences of 18 conditions.

File | Size | Format | |
---|---|---|---|
healthy_21-2.pdf (restricted access; Type: 1. Preprint / submitted version [pre-review]; License: Not public - Private/restricted access) | 552.98 kB | Adobe PDF | |
18089-Article Text-21584-1-2-20210521.pdf (restricted access; Type: 2a. Post-print, publisher's version / Version of Record; License: Not public - Private/restricted access) | 945.05 kB | Adobe PDF | |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/2996098