Unsupervised inference of protein fitness landscape from deep mutational scan / Fernandez-de-Cossio-Diaz, Jorge; Uguzzoni, Guido; Pagnani, Andrea. - In: MOLECULAR BIOLOGY AND EVOLUTION. - ISSN 0737-4038. - ELECTRONIC. - 38:1(2021), pp. 318-328. [10.1093/molbev/msaa204]
Unsupervised inference of protein fitness landscape from deep mutational scan
Uguzzoni, Guido; Pagnani, Andrea
2021
Abstract
Recent technological advances underlying the screening of large combinatorial libraries in high-throughput mutational scans deepen our understanding of adaptive protein evolution and boost its applications in protein design. Nevertheless, the large number of possible genotypes requires suitable computational methods for data analysis, the prediction of mutational effects, and the generation of optimized sequences. We describe a computational method that, trained on sequencing samples from multiple rounds of a screening experiment, provides a model of the genotype-fitness relationship. We tested the method on five large-scale mutational scans, yielding accurate predictions of the mutational effects on fitness. The inferred fitness landscape is robust to experimental and sampling noise and exhibits high generalization power in terms of broader sequence space exploration and higher fitness variant predictions. We investigate the role of epistasis and show that the inferred model provides structural information about the 3D contacts in the molecular fold.

File | Access | Description | Type | License | Size | Format
---|---|---|---|---|---|---
2020.03.18.996595v1.full (1).pdf | Open access | pdf | 2. Post-print / Author's Accepted Manuscript | Creative Commons | 1.28 MB | Adobe PDF
msaa204.pdf | Restricted access | Advance Access publication | 2a. Post-print, publisher's version / Version of Record | Not public - private/restricted access | 1.48 MB | Adobe PDF
msaa204.pdf | Open access | | 2a. Post-print, publisher's version / Version of Record | Creative Commons | 3.14 MB | Adobe PDF
https://hdl.handle.net/11583/2842668