The Meta AI team has released a new protein structure prediction tool, ESMFold, which it claims is 6x to 60x faster than competitors such as AlphaFold and RoseTTAFold. ESMFold is built on ESM-2, a protein language model scaled up to 15 billion parameters, one of the largest language models of proteins evaluated to date, which the team says gives it an advantage over other methods.
With the release of ESM-2, the updated Evolutionary Scale Modelling (ESM) model, the Meta AI team says it has set a new benchmark in protein structure prediction. Unlike competitors that rely on external databases of sequence alignments, ESM-2 learns the interactions between pairs of amino acids directly from protein sequences, which the team says gives faster and more accurate results.
Using the ESM-2 model, researchers were able to predict the structures of one million biological sequences in less than a day. A laudable feat!
How does ESMFold work?
According to the Meta AI team, ESMFold predicts the tertiary (3D) structure of a protein directly from its amino acid sequence. As the language model processes a protein sequence, a picture of the protein’s structure materialises in its internal states. This enables atomic-resolution prediction of the 3D structure, even though the language model was trained only on sequences.
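To make "processes a protein sequence" concrete, here is a minimal Python sketch of how a sequence of one-letter amino acid codes can be turned into integer tokens, the kind of input a language model consumes. The alphabetical vocabulary, the `encode_sequence` helper and the example sequence are all illustrative assumptions; ESM-2's real tokenizer uses its own vocabulary and special tokens.

```python
from typing import List

# The 20 standard amino acids in one-letter code. The alphabetical order and
# the resulting integer ids are assumptions for illustration only; they are
# not ESM-2's actual vocabulary.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TOKEN_ID = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode_sequence(sequence: str) -> List[int]:
    """Map a protein sequence to one integer token per residue."""
    cleaned = sequence.strip().upper()
    unknown = sorted(set(cleaned) - set(AMINO_ACIDS))
    if unknown:
        raise ValueError(f"non-standard residue codes: {unknown}")
    return [TOKEN_ID[aa] for aa in cleaned]


# Hypothetical 10-residue sequence, not a real protein.
tokens = encode_sequence("MKTAYIAKQR")
print(tokens)
```

The point of the sketch is that the model sees nothing but this string of residues: no alignment, template or other structural input is supplied alongside it.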
While ESMFold’s accuracy is similar to that of AlphaFold, the protein structure prediction tool from DeepMind (a subsidiary of Google’s parent company, Alphabet), ESMFold is faster. “We were able to fold a random sample of 1 million metagenomic sequences in a few hours,” claimed Meta AI researchers.
This makes it feasible to map the structural space of billions of protein sequences with unknown structures and functions on practical timescales.
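A rough back-of-the-envelope calculation, sketched in Python, shows what the quoted figures imply. It assumes the conservative "one million sequences in less than a day" figure (taken here as exactly 24 hours) and applies the claimed 6x and 60x speed factors to a hypothetical slower method on the same workload. The function names are illustrative, and the hardware used is not specified in the article.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def throughput_per_second(n_sequences: int, wall_clock_hours: float) -> float:
    """Average number of sequences folded per second."""
    return n_sequences / (wall_clock_hours * SECONDS_PER_HOUR)


def days_to_fold(n_sequences: int, rate_per_second: float) -> float:
    """Days needed to fold n_sequences at a constant rate."""
    return n_sequences / rate_per_second / SECONDS_PER_DAY


# Assumption: one million sequences in exactly 24 hours (an upper bound on time).
esmfold_rate = throughput_per_second(1_000_000, 24)
print(f"Implied rate: {esmfold_rate:.1f} sequences per second")

# The same workload for a hypothetical method 6x or 60x slower.
for slowdown in (6, 60):
    days = days_to_fold(1_000_000, esmfold_rate / slowdown)
    print(f"{slowdown}x slower: about {days:.0f} days")
```

Under these assumptions, a job that takes ESMFold a day would take a 6x to 60x slower method between roughly a week and two months, which is the gap the speed claim is pointing at.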
Importance of protein structure prediction
Proteins are the basic building blocks of all living organisms and control most of what happens inside living cells. Understanding how a protein works helps researchers form hypotheses about how to affect, control, or modify it. Once a protein’s structure is known, site-directed mutations can be used to change its function, and more effective drugs can be designed.
The structure of a protein also helps predict its function, which molecules or drugs it can bind effectively, and how they will bind to it. This is of utmost importance for drug design.
Knowing a protein’s structure is essential because proteins fold in different ways depending on conformational energy, steric factors, temperature, pH, concentration and other conditions. While many protein structure prediction models are available, none is fully accurate. Researchers have solved the structures of more than 180,000 proteins, but billions remain unsolved, and much research is under way in this complex field.
Has the protein structure prediction problem been solved?
ESMFold makes it possible to predict protein structures with high accuracy and at a faster pace than existing methods. This allows the model to bridge the gap between the rapid growth of protein sequence databases, which now contain billions of sequences, and the slower development of protein structure and function databases.
While the Meta AI team believes that ESMFold can help to understand regions of protein space that are distant from existing knowledge, we believe the journey towards a solution has only just begun.