Research

My main research concerns how data science and machine learning techniques can be applied to the study of protein evolution.

  • How do new proteins arise?
  • Where do old proteins come from?
  • How did they evolve?
These are some of the general questions that I try to answer.

Most of my research involves the development of new software and new methodologies linked to protein evolution.

I worked with Alessandra Carbone at the Laboratory for Computational and Quantitative Biology to develop new methods to predict the outcome of protein mutational landscapes. Specifically, I was interested in the dynamical properties of proteins that constrain their evolution. I studied the case of the CTX-M beta-lactamases, in which combinations of mutations have been shown to increase the activity of the enzyme family against the antibiotic ceftazidime. Most of these mutations are not close to the active site, but contribute to small conformational changes in the protein structure that adapt the active site to the new ligand. During my stay at Alessandra's lab, I also developed new skills and personal ideas for applying deep learning methods to the study of the molecular landscapes of protein families.

Previously, I developed tools in Isabelle Callebaut's group based on the analysis of the hydrophobic patterns of protein sequences. In these projects, we analyzed the physico-chemical properties of protein sequences in the context of protein domain emergence. We wanted to understand whether recent protein domains display specific molecular signatures and whether these signatures are linked to particular genomic mechanisms of emergence. I also worked with various biologists to help them annotate protein sequences for which no annotation data were available. I developed dedicated tools to detect and analyze orphan protein domains in these sequences.

During my stay in Prof. Erich Bornberg-Bauer's group, I developed methods to study the evolution of proteins and protein domains in Arthropoda/Insecta species. I performed data science analyses to compare protein domains and protein domain arrangements, taking into account the phylogenetic tree of the annotated species and the functional annotations of the proteins. I developed a clustering method to group orthologous proteins based on protein domain similarity, which was further extended to multiple sequence alignment. By combining these methodologies with phylogenetic analyses, one can reconstruct and infer protein domain emergence, which provides information on recent molecular innovations (novel domains, novel domain arrangements).