Information theoretic clustering of the human pangenome minigraph / Ferrero, Renato; Gandino, Filippo; Carbone, Anna. - In: PATTERN RECOGNITION LETTERS. - ISSN 0167-8655. - PRINT. - 191:(2025), pp. 117-123. [10.1016/j.patrec.2025.03.004]
Information theoretic clustering of the human pangenome minigraph
Renato Ferrero; Filippo Gandino; Anna Carbone
2025
Abstract
Information-theoretic clustering, long-range correlation, power-law scaling and self-similarity have been widely used to characterize genomic features such as nucleotide composition, flexibility and bending. In this work, the 24 chromosomes of the human pangenome minigraphs, recently assembled by the Human Pangenome Reference Consortium (HPRC), are investigated to assess the extent to which self-similarity and scaling features are preserved relative to the reference linear sequences of the T2T-CHM13 individual. Taking the nucleotide self-similarity of the reference chromosomes as a benchmark, the pangenome minigraph segments are shown to exhibit lower self-similarity in nucleotide composition than the linear sequence. The proposed information measures can be adopted to quantify nucleotide self-similarity patterns and to complement standard alignment techniques towards a coherent definition of the genomic profile of each species.

File | Description | Type | License | Size | Format
---|---|---|---|---|---
Information theoretic clustering of the human pangenome minigraph.pdf (open access) | Published version | 2a Post-print, publisher's version / Version of Record | Creative Commons | 1.77 MB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/2999081