“Access to data is one of the biggest challenges”

Data-driven techniques such as machine learning offer a wide range of opportunities across the classical drug discovery and development cycle, including molecular optimization and synthesis planning. In other words:

“Machine learning could totally change the way we develop drugs today,” says Klavs F. Jensen.

In January, he spoke on the topic at the BioInnovation Institute (BII) during a visit from the Massachusetts Institute of Technology (MIT), where he is Warren K. Lewis Professor of Chemical Engineering and Materials Science and Engineering. From 2007 to 2015, he was Head of the Department of Chemical Engineering.

What can machine learning do for drug development?
It can help companies scale more quickly and develop drugs more rapidly. Scientists in commercial labs spend a lot of time running the same synthesis again and again, which is tedious. Machines are good at routine tasks, so humans can focus on being creative. If we can use machine learning to propose ways to synthesize a molecule, and automated, robot-enabled systems to do the work, it can help companies get through the early stages much faster.

What is the biggest challenge?
Access to data is one of the biggest challenges. For machine learning to be useful, we need lots of accurate data, and we need it in a form that can easily be analyzed. Chemical data has many different components. We need to record which reaction is being run, which molecules are used, which reagents and catalysts are used, what temperature it is run at, and much more.

We already have some data, but it is unorganized and presented in many different ways, for example embedded in text and graphs, which makes extracting information challenging.

How do we solve this problem?
My colleague Regina Barzilay, a professor at the MIT Computer Science and Artificial Intelligence Laboratory, is working on how to use machine learning to read the chemistry literature. Going forward, we could think about creating new ways of reporting the results of our research, but the journals would need to agree on the format and scientists would need to be willing to do the work. Another dilemma is that companies may not be willing to give up research results, because their current drugs are based on those results. I do believe there is hope: we have seen examples of companies in EU projects sharing a subset of their results anonymously through a third party to train a shared model. Our ability to perform thousands of chemistry experiments is also increasing rapidly, so another solution could eventually be simply to generate a lot of new data rather than struggling to use historical data.

What is your vision for machine learning in drug development?
In the near future, we will need a wider range of drugs, as we will treat diseases with medicines based on the patient’s genetic profile. Data science tools have many promising applications, including using data from biological screens and clinical outcomes to improve drug design and to propose new molecules with the desired clinical and physical properties.
