All chemists should be using AI-powered molecular docking: here’s how

DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking. Gabriele Corso, Hannes Stärk, Bowen Jing, Regina Barzilay and Tommi Jaakkola. arXiv:2210.01776. Code available on GitHub.

Major companies have been implementing AI-driven drug discovery tools for years now. A Nature report names Insilico Medicine as the pack leader of this revolution: the company went from no leads in 2020 to 31 therapeutic programs in 2021. To date, over 160 AI-driven drug discovery programs have been announced.

However, what’s even more astounding is that many of these AI drug discovery tools are freely available online! This is thanks to open-source releases such as DiffDock’s code on GitHub, and to NVIDIA’s BioNeMo platform. As a medicinal chemist who uses docking software, I highly recommend giving AI docking a try to help justify the designs in drug discovery papers and theses.

Today I’ll explain how a chemist with no coding experience can run the AI docking program DiffDock. I will also compare my results with those from a conventional *paid* docking package (Discovery Studio).

Before diving into data, here’s some context:

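DiffDock’s top-1 success rate of 38% on PDBBind (the fraction of complexes where the best-ranked pose lies within 2 Å RMSD of the crystal-structure pose) is a significant improvement over traditional docking software (23%) and previous deep-learning docking programs (20%), which relied on regression-based binding predictions.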
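To make the "success rate" figure concrete: a docked pose counts as a success when its RMSD to the crystal-structure ligand is below 2 Å, and the success rate is the fraction of test complexes whose top-ranked pose meets that cutoff. Below is a minimal sketch of both calculations in plain Python. It assumes the two poses list the same atoms in the same order and ignores molecular symmetry, which proper benchmarking tools correct for. The function names are my own.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def pose_rmsd(predicted: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two poses of the same ligand.

    Assumes both lists contain the same atoms in the same order. No superposition
    is done: docking RMSD is measured with both poses in the protein's frame.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("poses must have the same, non-zero number of atoms")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(predicted, reference):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(predicted))


def success_rate(top1_rmsds: Sequence[float], cutoff: float = 2.0) -> float:
    """Fraction of complexes whose top-ranked pose is within `cutoff` angstroms."""
    if not top1_rmsds:
        raise ValueError("need at least one RMSD value")
    hits = sum(1 for r in top1_rmsds if r < cutoff)
    return hits / len(top1_rmsds)


if __name__ == "__main__":
    crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
    shifted = [(x + 1.0, y, z) for x, y, z in crystal]  # whole ligand moved by 1 Å
    print(pose_rmsd(shifted, crystal))          # 1.0
    print(success_rate([0.8, 1.9, 2.5, 4.0]))  # 0.5
```

Moving every atom by 1 Å gives an RMSD of exactly 1 Å. Two of the four example complexes fall under the 2 Å cutoff, so the success rate is 50%.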

This software also boasts a roughly 3x faster docking runtime than other leading software, and it can be run online, so your own computer doesn’t have to do the heavy computation. The developers have also implemented a new confidence model for ranking the predicted ligand poses. For more on performance, check out the seminar recorded by the developers, Corso et al.: DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking.
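For context on what "diffusion over ligand poses" means: according to the DiffDock paper, a pose is described by the ligand’s position, its orientation and its torsion angles, and the diffusion runs over that combined space rather than over every atom’s coordinates. Written out in my own notation, with $m$ the number of rotatable bonds:

```latex
\mathcal{P} \;=\; \underbrace{\mathbb{R}^3}_{\text{translation}} \times \underbrace{SO(3)}_{\text{rotation}} \times \underbrace{\mathbb{T}^m}_{\text{torsions}},
\qquad \dim \mathcal{P} = 3 + 3 + m
```

So for a ligand with 5 rotatable bonds, the model is searching over 3 + 3 + 5 = 11 degrees of freedom. As I understand the paper, bond lengths, bond angles and ring shapes are not part of this space. They are taken from an initial conformer generated with RDKit.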

How you can use it (easily and quickly)

As a chemist with no coding knowledge, I turned to online code-sharing platforms, and this turned out to be surprisingly easy! The simplest option is Simon Duerr’s DiffDock space on HuggingFace, where you can carry out docking just by entering the protein’s PDB code and the SMILES string of your molecule. There is also a Google Colab notebook for running DiffDock, made by Brian Naughton.

(Note: increasing the number of inference steps to 40 increases the runtime but doesn’t seem to change the results much.)

Within 1-2 minutes, the program gives you a diffusion simulation that looks something like this:

You can analyse the results for free as well, using a free visualiser such as PyMOL or the free Discovery Studio Visualizer.

Results and discussion

To test DiffDock’s accuracy, I modelled the binding of known inhibitors of the CDK7 kinase and compared the results with those from GOLD docking in Discovery Studio.
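You may want a number rather than an eyeball comparison between the two programs. One simple check is how far apart they place the centre of the ligand. Below is a minimal sketch that reads the atom block of a V2000 SDF file and returns the distance between the heavy-atom centroids of two poses. I'm assuming both programs' poses have been saved as SDF: DiffDock writes SDF, and as far as I know Discovery Studio can export it. The sketch only reads the first molecule in each file, and the function names are my own.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]


def read_sdf_heavy_atoms(sdf_text: str) -> List[Coord]:
    """Return heavy-atom coordinates from the first molecule of a V2000 SDF/MOL block.

    Uses the fixed-width V2000 layout: line 4 is the counts line (atom count in
    columns 1-3), followed by one line per atom with x, y, z in 10-character
    fields and the element symbol starting at column 32.
    """
    lines = sdf_text.splitlines()
    if len(lines) < 4 or "V2000" not in lines[3]:
        raise ValueError("expected a V2000 molfile header")
    n_atoms = int(lines[3][0:3])
    coords: List[Coord] = []
    for line in lines[4:4 + n_atoms]:
        symbol = line[31:34].strip()
        if symbol == "H":  # skip hydrogens: programs differ in whether they add them
            continue
        coords.append((float(line[0:10]), float(line[10:20]), float(line[20:30])))
    return coords


def centroid(coords: List[Coord]) -> Coord:
    """Mean position of a list of atoms."""
    n = len(coords)
    return (sum(c[0] for c in coords) / n,
            sum(c[1] for c in coords) / n,
            sum(c[2] for c in coords) / n)


def centroid_distance(sdf_a: str, sdf_b: str) -> float:
    """Distance in angstroms between the heavy-atom centroids of two poses."""
    return math.dist(centroid(read_sdf_heavy_atoms(sdf_a)),
                     centroid(read_sdf_heavy_atoms(sdf_b)))
```

For example, `centroid_distance(open("diffdock_pose.sdf").read(), open("gold_pose.sdf").read())` would compare two poses (the file names are hypothetical). A small distance suggests both programs put the ligand in the same pocket. It says nothing about whether the orientation agrees, and checking that needs an RMSD with matched atom order.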

What it does well:

I was able to dock the inhibitor Samuraciclib into PDB entry 7B5O, and DiffDock placed it quite accurately, reporting a ‘confidence value’ of 0.6. I found this confidence feedback very useful: in my experience, a positive value is usually reliable, and a negative one means you shouldn’t trust the prediction.
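When you download DiffDock’s output, each pose comes as a separate SDF file. In the versions I’ve seen, the rank and confidence are written into the file name, for example `rank1_confidence0.60.sdf`. Treat that naming as an assumption and check your own download. Below is a minimal sketch that pulls the scores out of such names, sorts the poses by rank and applies my rule of thumb above: positive means worth a closer look, negative means don’t trust it. The zero threshold is my heuristic, not an official cut-off.

```python
import re
from typing import Iterable, List, NamedTuple

# Assumed file-name pattern, e.g. "rank1_confidence0.60.sdf" or "rank4_confidence-1.27.sdf".
POSE_NAME = re.compile(r"rank(\d+)_confidence(-?\d+(?:\.\d+)?)\.sdf$")


class Pose(NamedTuple):
    rank: int
    confidence: float
    filename: str

    @property
    def trustworthy(self) -> bool:
        # Rule of thumb from this post: positive confidence = worth a closer look.
        return self.confidence > 0


def parse_poses(filenames: Iterable[str]) -> List[Pose]:
    """Extract rank and confidence from output file names, sorted by rank.

    Names that don't match the assumed pattern (such as a plain 'rank1.sdf') are skipped.
    """
    poses = []
    for name in filenames:
        match = POSE_NAME.search(name)
        if match:
            poses.append(Pose(int(match.group(1)), float(match.group(2)), name))
    return sorted(poses, key=lambda p: p.rank)


if __name__ == "__main__":
    files = ["rank2_confidence-0.35.sdf", "rank1.sdf", "rank1_confidence0.60.sdf"]
    for pose in parse_poses(files):
        print(pose.rank, pose.confidence, "ok" if pose.trustworthy else "don't trust")
```

You could pass it the names from `os.listdir()` on your unzipped results folder to get a quick ranked overview before opening anything in a visualiser.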

Visualised in Discovery Studio

Problems I have experienced:

When I replaced the ligand with some of my own compounds, the confidence of the binding results was always negative, and side chains would stick out of the protein. *At least it can tell when its predictions aren’t accurate.*

This problem was mentioned in the seminar. It arises because the program models the main ligand backbone first and only deals with the side chains later. Corso et al. stated they are working on improving this.

Therefore, this software may be a good tool for structure-activity optimisation on a target that already has a ligand-bound crystal structure, but it was not very useful to me for simulating the binding of significantly different compounds.

I tried modifying the protein’s PDB file to remove waters and define the binding site better, but unfortunately the program could not read the modified files. (If anyone knows how to do this, please let me know!)
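For anyone who wants to experiment: removing waters from a PDB file just means dropping the records whose residue name is `HOH`. Below is a minimal sketch that does this and can optionally drop other named residues too, such as a co-crystallised ligand. It relies on the fixed-column PDB format, where the residue name sits in columns 18-20. I haven’t confirmed that the HuggingFace demo accepts files cleaned this way, so treat it as a starting point. Filtering lines like this does have one advantage over hand-editing: the lines you keep are written back unchanged, so their column alignment is preserved.

```python
from typing import Iterable

# Record types that carry a residue name in columns 18-20 of the PDB format.
RESIDUE_RECORDS = ("ATOM  ", "HETATM", "ANISOU", "TER   ")


def strip_residues(pdb_text: str, drop: Iterable[str] = ("HOH",)) -> str:
    """Return the PDB text without records for the listed residue names.

    Kept lines are not modified, so their fixed-column layout is preserved.
    Other records (HEADER, CONECT, END, ...) are passed through untouched.
    """
    drop_set = {name.upper() for name in drop}
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith(RESIDUE_RECORDS) and line[17:20].strip().upper() in drop_set:
            continue  # residue name matches one we want to remove
        kept.append(line)
    return "\n".join(kept) + "\n"
```

For example, `strip_residues(open("7B5O.pdb").read())` removes only the waters, and `strip_residues(text, drop=("HOH", "LIG"))` would also remove a residue named `LIG`. Both the local file name and `LIG` are hypothetical.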

I was also unable to run more than one compound at once, or to increase the number of poses beyond what seems to be a limit of 20. This can certainly be done by downloading DiffDock and running it with your own code, but that is beyond my free-time working capacity.
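For readers comfortable with a little code, the batch route looks roughly like this. As far as I understand the DiffDock GitHub repository, its inference script can take a CSV that lists many protein-ligand pairs, with columns `complex_name`, `protein_path`, `ligand_description` and `protein_sequence`. The number of poses is set with a command-line option such as `--samples_per_complex`. Treat these names as assumptions and check the README of the version you download. The sketch below only writes such a CSV for several compounds. The SMILES strings are placeholders, not real CDK7 inhibitors.

```python
import csv
import io
from typing import Dict, TextIO

# Column names as I understand DiffDock's example input CSV (an assumption; check the README).
FIELDS = ["complex_name", "protein_path", "ligand_description", "protein_sequence"]


def write_batch_csv(out: TextIO, protein_path: str, ligands: Dict[str, str]) -> int:
    """Write one row per ligand, all docked against the same protein file.

    `ligands` maps a short name to a SMILES string. `protein_sequence` is left
    blank because a structure file is supplied. Returns the number of rows written.
    """
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for name, smiles in ligands.items():
        writer.writerow({
            "complex_name": name,
            "protein_path": protein_path,
            "ligand_description": smiles,
            "protein_sequence": "",
        })
    return len(ligands)


if __name__ == "__main__":
    # Placeholder molecules and a hypothetical local file name, for illustration only.
    compounds = {"cmpd_01": "c1ccccc1O", "cmpd_02": "CC(=O)Nc1ccc(O)cc1"}
    buffer = io.StringIO()
    write_batch_csv(buffer, "7B5O.pdb", compounds)
    print(buffer.getvalue())
```

When writing to a real file, open it with `newline=""`, as the `csv` module documentation recommends. You then run DiffDock on the CSV from the command line inside the downloaded repository. Check its README for the exact command, because the options have changed between releases.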

Still, I would definitely suggest giving this a try if you’re a medicinal chemist: you can get binding data much more easily and quickly than with paid docking software. Furthermore, DiffDock’s confidence value indicates whether the data is reliable, which is not a common feature in other programs.

I couldn’t find any non-technical explanation of how to run this program simply, so hopefully I’ve saved you some time!

Let me know if you have any questions or suggestions!