Thesis

Distillation of Language Model Semantics to Folded Three-Dimensional Protein Structures

Allan dos Santos Costa

Sept. 6, 2021

Abstract

Determining the structure of proteins has been a long-standing goal in biology. Language models have recently been deployed to capture the evolutionary semantics of protein sequences and, as an emergent property, were found to be structural learners. Enriched with multiple sequence alignments (MSAs), these transformer models capture significant information about a protein’s tertiary structure. In this work, we show how such structural information can be recovered by processing language model embeddings, and we introduce a two-stage folding pipeline that directly estimates three-dimensional folded structures from protein sequences. We envision that this pipeline will provide a basis for efficient, end-to-end protein structure prediction through protein language modeling.
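To make "structural information" concrete, one representation often used in this area is a residue contact map: a symmetric matrix marking which pairs of residues lie close together in the folded structure. The sketch below computes such a map from three-dimensional coordinates. It is an illustration only and is not taken from the thesis: the function name `contact_map`, the 8 Å threshold, and the toy coordinates are all assumptions chosen for the example.

```python
import math

def contact_map(coords, threshold=8.0, min_separation=1):
    """Return a symmetric boolean matrix of residue contacts.

    coords: list of (x, y, z) tuples, one per residue, in angstroms.
    threshold: distance cutoff in angstroms (8.0 is an assumed value).
    min_separation: minimum sequence distance |i - j| for a pair to count.
    """
    n = len(coords)
    contacts = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            # Euclidean distance between residue i and residue j
            if math.dist(coords[i], coords[j]) <= threshold:
                contacts[i][j] = contacts[j][i] = True
    return contacts

# Hypothetical toy structure: four residues on a line, 3.8 angstroms apart.
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
cmap = contact_map(toy)
```

In this toy case, residues two positions apart (7.6 Å) count as contacts, while the two ends of the chain (11.4 Å) do not. A structure predictor would work in the opposite direction, estimating such pairwise relationships or the coordinates themselves from sequence-derived features.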

Costa-allanc-SM-MAS-2021-thesis.pdf