Machine Learning and Protein Optimization: Is This Where Medicine Is Heading?

Sep. 15, 2022

By Ella Hohmann

From traffic predictions to virtual assistants to online customer support, machine learning algorithms have found their way into nearly every aspect of life because of their ability to make accurate predictions and decisions in a fraction of the time it would take a human.

Health care and medical research are no exception. Artificial intelligence platforms can assist with patient care, predict outcomes and diagnoses, interpret medical images and perform data review and query automation in clinical trials, among many other tasks.

Houston Methodist recently formed a group, Antibody Design and Protein Therapeutics (ADAPT), dedicated to harnessing the power of "big data" to drive new biological discoveries and enable breakthroughs in research and clinical care.

ADAPT, part of Houston Methodist's Department of Infectious Disease, boasts new recruit Raghav Shroff, Ph.D., who recently attracted national attention for creating an enzyme variant that, in a matter of hours, can break down plastics that typically take centuries to degrade.

While at The University of Texas at Austin, Shroff developed a machine learning model, MutCompute, that was used to engineer the plastic-eating enzyme. The model can sift through thousands of protein and mutation variations to optimize binding and identify the mutations most likely to improve function.

He is now using the model at Houston Methodist to improve the speed, efficiency and reliability of the medical research performed here.

Leading Medicine sat down with Shroff to discuss how MutCompute works, what he and his ADAPT colleagues are focusing on now and why we're just at the tip of the iceberg when it comes to understanding what machine learning can do.

Q: How does MutCompute work?

Similar to how image recognition algorithms can tell you whether a picture contains a cat or a dog, I trained the program to recognize amino acids, the building blocks of proteins. Given a protein structure, it can look at an amino acid within that protein, and then dissect the chemical interactions from the atoms of surrounding amino acids. The model then looks for sites where the amino acid is the biggest "misfit" from its chemical environment and suggests mutations that might improve the function of the protein.
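
The "misfit" idea can be illustrated with a minimal sketch. It is not MutCompute's actual code or interface. It assumes a model has already produced, for each position in a protein, a probability for each candidate amino acid given the surrounding chemical environment. The function name `rank_misfits`, the scoring rule (probability of the best-fitting amino acid minus probability of the one actually present) and all numbers are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical model output: position -> {amino acid: predicted probability}.
# These values are invented for illustration; they are not real predictions.
Prediction = Dict[str, float]


def rank_misfits(
    wild_type: Dict[int, str],
    predictions: Dict[int, Prediction],
) -> List[Tuple[int, str, str, float]]:
    """Rank positions by how poorly the current amino acid fits its environment.

    Returns (position, current_aa, suggested_aa, score) tuples, worst fit first.
    The score is an assumed rule: P(best-fitting aa) - P(current aa).
    """
    ranked = []
    for pos, probs in predictions.items():
        current = wild_type[pos]
        best = max(probs, key=probs.get)  # amino acid the model favors most
        if best != current:  # only suggest a change where the model disagrees
            score = probs[best] - probs.get(current, 0.0)
            ranked.append((pos, current, best, score))
    ranked.sort(key=lambda item: item[3], reverse=True)
    return ranked


# Toy three-residue example (one-letter amino acid codes).
wild_type = {10: "S", 42: "A", 77: "L"}
predictions = {
    10: {"S": 0.05, "E": 0.80, "A": 0.15},  # serine looks like a strong misfit
    42: {"A": 0.70, "G": 0.20, "S": 0.10},  # alanine already fits, so no suggestion
    77: {"L": 0.30, "I": 0.45, "V": 0.25},  # mild misfit
}

suggestions = rank_misfits(wild_type, predictions)
for pos, current, best, score in suggestions:
    # Mutations are conventionally written as <current><position><new>.
    print(f"{current}{pos}{best}  misfit score = {score:.2f}")
```

In this toy case the position with the largest gap between the favored amino acid and the one actually present is listed first, which mirrors the idea of proposing mutations at the biggest misfits. Whether a suggested mutation actually improves function still has to be tested in the lab.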

Q: How has it been used in the past?

We used it to develop a plastic-eating enzyme variant that breaks down PET polymers into monomers in a matter of hours. First, we used the model to optimize the enzyme so it would be suitable outside of a cellular environment and at higher temperatures. Then we put mutations into it to make it more efficient at breaking down PET. Less than 10% of plastics are actually recycled. This enzyme, which we named FAST-PETase, has huge potential to help us more effectively recycle PET, which makes up 12% of all global waste. This example goes to show how biology is at a point now where it can be used to solve problems in our everyday lives, including problems that aren't even necessarily biological in nature.

Q: How do you plan on using this machine learning model at Houston Methodist?

One thing we are trying to develop here with the Houston Methodist ADAPT group is high-throughput, to-scale biology. Biology is becoming a data-rich field. We can feed this data into machines to design something tangible. Right now, we are focusing on infectious diseases.

Antibodies are proteins and MutCompute has shown pretty broad applicability with all proteins, so now we are trying to explore how it can be used in antibody development as well as vaccines.

Consider vaccines: Very rarely can you use the actual native protein of a pathogen in the vaccine; it usually needs to be stabilized in some way. With this model, we hope to make this process easier, faster and more efficient. The model spits out actionable information, which we use to reprogram and mutate a protein, test it and feed back what we learned into the machine. This creates a learning loop that we hope will broaden our understanding of how to build effective vaccines and antibodies.

Q: How fast is it?

The turnaround time in a lot of biology experiments is frustrating. It takes days before you find out whether or not an experiment was successful. Here, the part that takes the longest is teaching the machine-learning model. But once you have a good working model, you press a button and get a result on the order of minutes. It has huge potential for improving mass production and manufacturing. You no longer have to do any prerequisite labs; the machine tells you how best to mutate an antibody or whatever protein you are working with.

Q: Are there any limitations to it?

The main limitation is that you need a robust starting point. MutCompute needs to know the structure of the protein of interest. There is a large disparity between the number of sequences we know and the number of structures we know. While we have sequenced hundreds of millions of proteins, only around 150,000 proteins have had their structures solved. The ADAPT team is currently working on new models to bridge this gap.

Q: What are the main takeaways you would like people to know about your research?

Biology is getting to a place where it can actually be used to solve complex problems in a way we haven't seen before. We know enough now that we can actually start to engineer biology to make real-world solutions. I hope my research helps people understand the potential that machine learning has in all fields, but particularly in biology. As more and more labs start to use it and harness its power to better make sense of their data, this will become a huge area of research where we will continue to see a lot of growth.