Structural variants (SVs) are a class of genetic alterations that play a crucial role in cancer development. Detecting somatic SVs is challenging: it requires distinguishing germline from somatic events while handling subclonal variants and the coexistence of tumor and normal cells in patient-derived samples. SV callers based on single-molecule sequencing technologies have emerged as powerful tools for detecting SVs, because long reads can span large genomic regions and thus reveal more complex rearrangements. However, these tools still suffer from low precision and/or recall, especially when identifying somatic SVs. To overcome these limitations, we propose an ensemble method that combines the results of three long-read variant callers with evidence extracted from accompanying short-read alignments. We evaluate our method on a curated truth set provided by the Espejo Valle-Inclan benchmark and show that it can leverage the strengths of each tool while mitigating their weaknesses to produce a ranked list of somatic deletions, which is useful for prioritizing downstream analysis and experimental validation. We also provide insights into the performance of the individual tools and discuss future directions for extending our method.
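
The general idea of an ensemble over several callers can be illustrated with a minimal sketch. This is not the authors' pipeline: the caller names (`caller_a`, `caller_b`, `caller_c`), the 50% reciprocal-overlap threshold, the `short_read_support` field, and the ranking key are all assumptions chosen for illustration. The sketch merges deletion calls that overlap reciprocally and ranks merged deletions by how many callers support them, breaking ties by a hypothetical short-read support count.

```python
from dataclasses import dataclass, field


@dataclass
class Deletion:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    callers: set = field(default_factory=set)
    short_read_support: int = 0  # hypothetical count of supporting short reads


def reciprocal_overlap(a, b):
    """Fraction of overlap relative to the longer of the two intervals."""
    if a.chrom != b.chrom:
        return 0.0
    inter = min(a.end, b.end) - max(a.start, b.start)
    if inter <= 0:
        return 0.0
    return min(inter / (a.end - a.start), inter / (b.end - b.start))


def merge_calls(calls_by_caller, min_ro=0.5):
    """Greedily merge calls; min_ro=0.5 is an assumed threshold."""
    merged = []
    for caller, calls in calls_by_caller.items():
        for chrom, start, end in calls:
            cand = Deletion(chrom, start, end, {caller})
            for m in merged:
                if reciprocal_overlap(m, cand) >= min_ro:
                    m.callers.add(caller)  # same event seen by another caller
                    break
            else:
                merged.append(cand)  # no match found: new event
    return merged


def rank(merged):
    """Sort by number of supporting callers, then by short-read support."""
    return sorted(
        merged,
        key=lambda d: (len(d.callers), d.short_read_support),
        reverse=True,
    )


# Toy input: three placeholder callers reporting (chrom, start, end) deletions.
calls = {
    "caller_a": [("chr1", 1000, 2000), ("chr2", 500, 900)],
    "caller_b": [("chr1", 1010, 1990)],
    "caller_c": [("chr1", 990, 2050), ("chr3", 100, 400)],
}
merged = merge_calls(calls)
ranked = rank(merged)
for d in ranked:
    print(d.chrom, d.start, d.end, sorted(d.callers))
```

In this toy input the `chr1` deletion is reported by all three placeholder callers and therefore ranks first. A real implementation would read calls from VCF files and derive short-read evidence from alignments, which this sketch leaves out.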

An Ensemble Method for Calling and Ranking Somatic Structural Variants Using Long and Short Reads / Gallego Gomez, Walter; Grassi, Elena; Bertotti, Andrea; Urgese, Gianvito. - Electronic. - (2025), pp. 62-69. (Paper presented at The 11th International Conference on Bioinformatics Research and Applications (ICBRA 2024), held in Milan (ITA), September 13-15, 2024) [10.1145/3700666.3700694].

An Ensemble Method for Calling and Ranking Somatic Structural Variants Using Long and Short Reads

Walter Gallego Gomez; Gianvito Urgese
2025

ISBN: 979-8-4007-1753-6
Files in this item:
File: 3700666.3700694.pdf (open access)
Type: 2a Post-print, publisher's version / Version of Record
License: Creative Commons
Size: 1.34 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2993115