Information theoretic clustering of the human pangenome minigraph / Ferrero, Renato; Gandino, Filippo; Carbone, Anna. - In: PATTERN RECOGNITION LETTERS. - ISSN 0167-8655. - PRINT. - 191:(2025), pp. 117-123. [10.1016/j.patrec.2025.03.004]

Information theoretic clustering of the human pangenome minigraph

Renato Ferrero; Filippo Gandino; Anna Carbone
2025

Abstract

Information theoretic clustering, long-range correlation, power-law scaling and self-similarity concepts have been broadly adopted for characterizing genomic features such as nucleotide composition, flexibility and bending. In this work, the 24 chromosomes of the human pangenome minigraphs, recently assembled by the Human Pangenome Reference Consortium (HPRC), are investigated to check to what extent self-similarity and scaling features are preserved in comparison to the reference linear sequences of the T2T-CHM13 individual. By taking the nucleotide self-similarity of the reference chromosomes as benchmark, it is shown that the pangenome minigraph segments exhibit lower self-similarity of the nucleotide composition compared to the linear sequence. The proposed information measures can be adopted to quantify the nucleotide self-similarity patterns and complement standard alignment techniques towards the coherent definition of the genomic profile of each species.
Files in this item:
File: Information theoretic clustering of the human pangenome minigraph.pdf
Access: open access
Description: Published version
Type: 2a Post-print, publisher's version / Version of Record
License: Creative Commons
Size: 1.77 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2999081