Impact of phylogeny on the inference of functional sectors from protein sequence data

Type: Preprint

Publication Date: 2024-05-08

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2405.04920

Abstract

Statistical analysis of multiple sequence alignments of homologous proteins has revealed groups of coevolving amino acids called sectors. These groups of amino-acid sites feature collective correlations in their amino-acid usage, and they are associated with functional properties. Modeling showed that natural selection on an additive functional trait of a protein is generically expected to give rise to a functional sector. These modeling results motivated a principled method, called ICOD, which is designed to identify functional sectors, as well as mutational effects, from sequence data. However, a challenge for all methods aiming to identify sectors from multiple sequence alignments is that correlations in amino-acid usage can also arise from the mere fact that homologous sequences share common ancestry, i.e., from phylogeny. Here, we generate controlled synthetic data from a minimal model comprising both phylogeny and functional sectors. We use these data to dissect the impact of phylogeny on sector identification and on mutational effect inference by different methods. We find that ICOD is the most robust to phylogeny, but that conservation is also quite robust. Next, we consider natural multiple sequence alignments of protein families for which experimental deep mutational scan data are available. We show that in these natural data, conservation and ICOD best identify sites with strong functional roles, in agreement with our results on synthetic data. Importantly, these two methods have different premises, since they respectively focus on conservation and on correlations. Thus, their joint use can reveal complementary information.
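
As an illustration of what a conservation-based score looks like in practice, the following minimal sketch computes a per-site conservation value from a toy alignment. It assumes an entropy-based definition (maximum possible entropy minus observed column entropy), which is one common choice and not necessarily the definition used in the paper. The function names, the toy alignment, and the 21-symbol alphabet (20 amino acids plus a gap) are all assumptions made for this example.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbol usage in one alignment column."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def conservation_scores(msa, alphabet_size=21):
    """Per-site conservation: maximum possible entropy minus observed entropy.

    `msa` is a list of equal-length aligned sequences (hypothetical input format).
    Higher scores mean the site is more conserved.
    """
    max_entropy = math.log2(alphabet_size)
    length = len(msa[0])
    return [
        max_entropy - column_entropy([seq[i] for seq in msa])
        for i in range(length)
    ]


# Toy alignment: sites 0 and 1 are fully conserved, sites 2 and 3 vary.
toy_msa = ["ACDA", "ACEA", "ACDG", "ACEG"]
scores = conservation_scores(toy_msa)
```

A score of this kind looks at each site independently, whereas correlation-based approaches such as ICOD work from pairwise statistics between sites, which is why the abstract describes the two as having different premises.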

Locations

  • arXiv (Cornell University)

Similar Works

  • Impact of phylogeny on the inference of functional sectors from protein sequence data (2024). Nicola Dietler, Alia Abbara, Subham Choudhury, Anne-Florence Bitbol
  • Revealing evolutionary constraints on proteins through sequence analysis (2018). Shou-Wen Wang, Anne-Florence Bitbol, Ned S. Wingreen
  • Identifying the Genetic Basis of Functional Protein Evolution Using Reconstructed Ancestors (2014). Victor Hanson-Smith, Christopher R. Baker, Alexander D. Johnson
  • Modeling sequence-space exploration and emergence of epistatic signals in protein evolution (2021). Matteo Bisardi, Juan Rodriguez-Rivas, Francesco Zamponi, Martin Weigt
  • Impact of phylogeny on structural contact inference from protein sequence data (2022). Nicola Dietler, Umberto Lupo, Anne-Florence Bitbol
  • Impact of phylogeny on structural contact inference from protein sequence data (2023). Nicola Dietler, Umberto Lupo, Anne-Florence Bitbol
  • Revealing evolutionary constraints on proteins through sequence analysis (2019). Shou-Wen Wang, Anne-Florence Bitbol, Ned S. Wingreen
  • Emergent time scales of epistasis in protein evolution (2024). Leonardo Di Bari, Matteo Bisardi, Sabrina Cotogno, Martin Weigt, Francesco Zamponi
  • Protein Sectors: Statistical Coupling Analysis versus Conservation (2015). Tiberiu Teşileanu, Lucy J. Colwell, Stanislas Leibler
  • DCAlign v1.0: aligning biological sequences using co-evolution models and informed priors (2023). Anna Paola Muntoni, Andrea Pagnani
  • Phylogenetic correlations can suffice to infer protein partners from sequences (2019). Guillaume Marmier, Martin Weigt, Anne-Florence Bitbol
  • Uncovering sequence diversity from a known protein structure (2024). Luca Alessandro Silva, Barthelemy Meynard-Piganeau, Carlo Lucibello, Christoph Feinauer
  • Estimating the evidence of selection and the reliability of inference in unigenic evolution (2010). Andrew D. Fernandes, Benjamin P. Kleinstiver, David R. Edgell, Lindi M. Wahl, Gregory B. Gloor
  • AOC: Analysis of Orthologous Collections - an application for the characterization of natural selection in protein-coding sequences (2024). Alexander G. Lucaci, Sergei L. Kosakovsky Pond
  • From genotypes to organisms: State-of-the-art and perspectives of a cornerstone in evolutionary dynamics (2021). Susanna C. Manrubia, José A. Cuesta, Jacobo Aguirre, Sebastian E. Ahnert, Lee Altenberg, Alejandro V. Cano, Pablo Catalán, Ramón Díaz-Uriarte, Santiago F. Elena, Juan Antonio García-Martín
  • Correlations from structure and phylogeny combine constructively in the inference of protein partners from sequences (2021). Andonis Gerardos, Nicola Dietler, Anne-Florence Bitbol
