Ophiuchus: Protein Autoencoding and Latent Diffusion from First-Principles of Symmetry

Allan dos Santos Costa

Media Lab Research Theme: Life with AI

We recently introduced Ophiuchus, a novel deep learning architecture for modeling protein structures. Ophiuchus is the first autoencoder of its kind, learning to represent all-atom proteins as coarse, compact geometric encodings.

In our work, we show how the compact and geometric Ophiuchus embeddings enable efficient protein modeling, inference and design.

Our model uses irreducible representations of SO(3) to represent protein features. This representation allows the protein sequence, backbone positions and side-chain atom positions to be jointly encoded in a unified geometric embedding.
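To give a feel for what an SO(3) representation means here, the sketch below builds two toy features for a single residue: a distance (degree l=0, a scalar that does not change under rotation) and a relative position vector (degree l=1, which rotates together with the protein). The function names, coordinates and angle are hypothetical and chosen only for illustration; this is not the Ophiuchus implementation, which also uses learned features.

```python
import math

def rotate_z(v, angle):
    """Rotate a 3D vector about the z axis (one element of SO(3))."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = v
    return (c * x - s * y, s * x + c * y, z)

def toy_residue_features(ca, side_chain_atom):
    """Hypothetical toy features for one residue (illustration only).

    Returns an l=0 feature (distance, rotation-invariant) and an
    l=1 feature (relative vector, rotates with the input).
    """
    rel = tuple(b - a for a, b in zip(ca, side_chain_atom))
    dist = math.sqrt(sum(c * c for c in rel))
    return dist, rel

# Made-up coordinates for an alpha carbon and one side-chain atom.
ca = (1.0, 2.0, 0.5)
atom = (2.5, 1.0, 1.5)
angle = 0.7

dist, rel = toy_residue_features(ca, atom)
dist_rot, rel_rot = toy_residue_features(rotate_z(ca, angle), rotate_z(atom, angle))

# The scalar is unchanged; the vector equals the rotated original vector.
print(math.isclose(dist, dist_rot))
print(all(math.isclose(a, b) for a, b in zip(rel_rot, rotate_z(rel, angle))))
```

Features of higher degree l transform in the same spirit, each with 2l+1 components, so a network built from them can process a protein consistently regardless of its orientation.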

To model a global representation of the protein, we "reduce" these features and coordinates into increasingly coarse geometric embeddings, until we reach a bottleneck.
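The coarsening idea can be sketched with a simple stand-in: slide a window along the chain and replace each window of positions with its mean, repeating until only a few points remain. The mean pooling, the `kernel` and `stride` values, and the function names are assumptions made for this illustration; the actual model learns how to combine features and coordinates rather than averaging them.

```python
def coarsen(coords, kernel=3, stride=2):
    """Replace each window of `kernel` consecutive points by its centroid."""
    out = []
    for start in range(0, len(coords) - kernel + 1, stride):
        window = coords[start:start + kernel]
        out.append(tuple(sum(p[i] for p in window) / kernel for i in range(3)))
    return out

def encode_to_bottleneck(coords, kernel=3, stride=2):
    """Apply `coarsen` repeatedly; return every level, finest first."""
    levels = [coords]
    while len(levels[-1]) >= kernel:
        levels.append(coarsen(levels[-1], kernel, stride))
    return levels

# A made-up 15-residue chain laid out along the x axis.
chain = [(float(i), 0.0, 0.0) for i in range(15)]
levels = encode_to_bottleneck(chain)
print([len(level) for level in levels])
```

Each level is shorter than the one before it, so the last level plays the role of the bottleneck in this toy setting.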

Finally, we can reverse this process and restore the protein from the bottleneck geometric representation, which makes the model an autoencoder.

In our paper, we show that this geometric latent space is well behaved for inference and sampling, with demonstrations of high-dimensional arithmetic through conformational latent interpolation, and of efficient generative sampling through latent denoising diffusion models. Our generative model is the first to generate large-scale protein structures at the speed of the blink of an eye.
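As a rough illustration of latent interpolation, the sketch below walks in a straight line between two latent codes. It assumes plain linear interpolation and uses made-up three-number latents; the scheme and latent shapes used in the paper may differ. In a full pipeline, each intermediate latent would be passed through the decoder to obtain an intermediate structure.

```python
def lerp(z_a, z_b, t):
    """Linear interpolation between two latent vectors, with t in [0, 1]."""
    return [(1 - t) * a + t * b for a, b in zip(z_a, z_b)]

def interpolation_path(z_a, z_b, steps):
    """Return `steps` evenly spaced latents from z_a to z_b, inclusive."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [lerp(z_a, z_b, k / (steps - 1)) for k in range(steps)]

# Hypothetical latents standing in for two encoded conformations.
z_start = [0.0, 1.0, -2.0]
z_end = [4.0, 1.0, 2.0]

path = interpolation_path(z_start, z_end, steps=5)
for z in path:
    print(z)
```

Because linear interpolation only adds and scales vectors, it commutes with rotations of vector-valued features, which is one reason a geometric latent space is convenient for this kind of arithmetic.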

Ophiuchus was jointly developed with the Atomic Architects group.

Stay tuned for more!