
Efficient Generative Active Learning for Discovering Synthesizable Small-Molecule Protein Binders

Core Concepts
LAMBDAZERO, a generative active learning approach, can efficiently design novel, synthesizable small-molecule protein binders by leveraging a fast surrogate model, a generative policy with synthesizability and drug-likeness constraints, and an expensive computational oracle.
The paper introduces LAMBDAZERO, a generative active learning approach for efficiently designing novel, synthesizable small-molecule protein binders. The key components of LAMBDAZERO are:

A fast surrogate model: An E(n) invariant graph neural network is used to approximate the computationally expensive molecular docking simulation, enabling efficient exploration of the vast chemical space.

A generative policy with constraints: A reinforcement learning-based generative policy is trained to maximize a reward function that incorporates the surrogate model score, drug-likeness, and synthesizability. This guides the policy to generate molecules that are likely to be synthesizable and have high binding affinity.

Iterative active learning: The generative policy is used to propose batches of candidate molecules, which are then evaluated using the expensive docking simulation. The results are used to update the surrogate model, and the process is repeated over multiple iterations to enrich the library of generated molecules.

The authors demonstrate the effectiveness of LAMBDAZERO by applying it to the design of small-molecule inhibitors for the enzyme soluble Epoxide Hydrolase 2 (sEH). With only ~10,000 docking evaluations, LAMBDAZERO is able to generate synthesizable, drug-like molecules with docking scores that would otherwise require screening over 100 billion molecules. Experimental validation of the top-scoring LAMBDAZERO-generated molecules led to the discovery of a novel quinazoline-based scaffold with sub-micromolar sEH inhibition.
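The propose, evaluate, and update cycle described above can be sketched in a few lines. This is a toy illustration only, not the authors' implementation: molecules are replaced by numbers in [0, 1], `expensive_oracle` stands in for docking (with higher scores treated as better for simplicity), a nearest-neighbor lookup stands in for the graph neural network surrogate, and a sample-and-rank step stands in for the reinforcement learning policy. All function names are hypothetical.

```python
import random


def expensive_oracle(x):
    """Toy stand-in for docking: higher is better, with a peak at x = 0.7."""
    return -(x - 0.7) ** 2


def surrogate_predict(x, labeled):
    """Toy surrogate: reuse the oracle score of the nearest labeled point."""
    nearest = min(labeled, key=lambda pair: abs(pair[0] - x))
    return nearest[1]


def propose_batch(labeled, batch_size, rng, pool_size=200):
    """Toy 'policy': sample a pool and keep what the surrogate ranks highest."""
    pool = [rng.random() for _ in range(pool_size)]
    pool.sort(key=lambda x: surrogate_predict(x, labeled), reverse=True)
    return pool[:batch_size]


def active_learning(rounds=5, batch_size=10, seed=0):
    rng = random.Random(seed)
    # Seed the labeled set with a small random batch scored by the oracle.
    initial = [rng.random() for _ in range(batch_size)]
    labeled = [(x, expensive_oracle(x)) for x in initial]
    oracle_calls = len(labeled)
    for _ in range(rounds):
        batch = propose_batch(labeled, batch_size, rng)
        # Only the proposed batch is sent to the expensive oracle.
        new_points = [(x, expensive_oracle(x)) for x in batch]
        oracle_calls += len(new_points)
        # "Retraining" the toy surrogate is just growing its lookup table.
        labeled.extend(new_points)
    best = max(labeled, key=lambda pair: pair[1])
    return best, oracle_calls, labeled


best, oracle_calls, labeled = active_learning()
print(f"best candidate {best[0]:.3f} after {oracle_calls} oracle calls")
```

The point of the structure is that the oracle budget is fixed by `rounds * batch_size` plus the initial batch, while the cheap surrogate is queried far more often to decide where that budget is spent.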
The docking scores of 5.8 million drug-like molecules from the Zinc20 dataset follow a generalized Gaussian distribution; fitting the tail (x > μ + 2.5σ) gives β = 1.88 ± 0.18 and α = 1.38 ± 0.18.
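To get a feel for why a z-score of 6.75 corresponds to a library on the order of 10^11 molecules, here is a back-of-the-envelope calculation. It assumes a plain standard normal distribution of z-scores, which is what a generalized Gaussian reduces to at β = 2; the fitted β = 1.88 is close to that but not identical, so the fitted tail will generally give a different figure. The function name is hypothetical and this is not the authors' calculation.

```python
import math


def gaussian_tail_probability(z):
    """P(Z >= z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


z = 6.75  # z-score quoted in the summary
p = gaussian_tail_probability(z)

# If each molecule independently exceeds z with probability p, roughly 1/p
# molecules must be screened to expect one hit at or above that z-score.
expected_library_size = 1.0 / p
print(f"tail probability ~ {p:.2e}, library size ~ {expected_library_size:.2e}")
```

Under this simplifying assumption the required library size lands within an order of magnitude of 10^11, which is consistent with the scale quoted in the summary.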
"LAMBDAZERO can reach up to a z-score of 6.75 (-16.1) with only ~10,000 docking queries. Approximately, this would have required docking ~10^11 molecules from Zinc20." "The most potent variant, N-(4,6-di(pyrrolidin-1-yl)quinazolin-2-yl)-N-methylbenzamide (UM0152893), had a relatively small amide substituent, and larger substituents at this position were also relatively potent (in the micromolar range)."

Deeper Inquiries

How can the LAMBDAZERO approach be extended to incorporate additional pharmacological considerations beyond just binding affinity, such as ADMET properties?

To extend the LAMBDAZERO approach to incorporate additional pharmacological considerations beyond binding affinity, such as ADMET properties, several modifications and enhancements can be implemented:

Reward Function Modification: The reward function can be adjusted to include terms that account for ADMET properties. For example, incorporating penalties for molecules with poor absorption, distribution, metabolism, excretion, and toxicity profiles can guide the generative policy towards designing molecules with favorable ADMET characteristics.

Surrogate Model Expansion: The surrogate model can be expanded to predict ADMET properties in addition to binding affinity. By training the model on a diverse dataset that includes ADMET data, the surrogate model can provide estimates of various pharmacokinetic and toxicological properties, enabling the generative policy to optimize for a broader range of criteria.

Multi-Objective Optimization: Implementing a multi-objective optimization approach can allow the generative policy to simultaneously optimize for binding affinity, ADMET properties, and other pharmacological considerations. Techniques such as Pareto optimization can help identify molecules that achieve a balance between different objectives.

Integration of External Tools: Leveraging external tools and databases that specialize in predicting ADMET properties can enhance the LAMBDAZERO framework. By integrating these tools into the workflow, the system can provide real-time feedback on the ADMET profiles of generated molecules, guiding the design process.

By incorporating these enhancements, the LAMBDAZERO approach can evolve into a comprehensive platform for designing small molecules with optimized pharmacological profiles beyond just binding affinity.
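The Pareto optimization mentioned above can be illustrated with a small non-dominated filter. This is a minimal sketch under stated assumptions: each candidate carries two made-up objective values (a binding score and an aggregate ADMET score, both scaled so that higher is better), and the molecule names and numbers are hypothetical rather than taken from the paper.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the sorted names of candidates that no other candidate dominates.

    candidates maps a name to a tuple of objective values (higher is better).
    """
    return sorted(
        name
        for name, objectives in candidates.items()
        if not any(dominates(other, objectives) for other in candidates.values())
    )


# Hypothetical (binding score, ADMET score) pairs.
candidates = {
    "mol_a": (0.9, 0.2),  # strong binder, poor ADMET
    "mol_b": (0.6, 0.7),  # balanced
    "mol_c": (0.5, 0.5),  # worse than mol_b on both objectives
    "mol_d": (0.3, 0.9),  # weak binder, good ADMET
}

front = pareto_front(candidates)
print(front)
```

Unlike a single weighted reward, the front keeps every trade-off that is not strictly beaten, so the choice between a stronger binder and a cleaner ADMET profile can be deferred to a later stage.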

What are the limitations of using molecular docking as the computational oracle, and how could the approach be improved by using higher-fidelity binding affinity prediction models?

Using molecular docking as the computational oracle in the LAMBDAZERO framework has certain limitations that can impact the accuracy and efficiency of the design process:

Scoring Accuracy: Molecular docking scores may not always accurately reflect the true binding affinity of a molecule. The scoring function used in docking simulations can be limited in its ability to capture the complex interactions between a ligand and a protein, leading to potential inaccuracies in predicting binding affinity.

Computational Cost: Molecular docking simulations are computationally expensive, requiring significant time and resources to evaluate each candidate molecule. This can limit the scalability and speed of the design process, especially when exploring a large chemical space.

To improve the approach, higher-fidelity binding affinity prediction models can be integrated into the framework. These models, such as physics-based scoring functions or machine learning-based models trained on diverse datasets, can provide more accurate estimates of binding affinity and enhance the reliability of the design process. By combining the strengths of molecular docking with the precision of advanced prediction models, the LAMBDAZERO framework can achieve higher accuracy and efficiency in designing small-molecule protein inhibitors.
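One common way to combine docking with a higher-fidelity model is a two-stage funnel: rank everything with the cheaper score, then spend the expensive model only on a shortlist. The sketch below assumes this pattern for illustration; it is not described in the summary as part of LAMBDAZERO, and the scores, molecule names, and function names are all hypothetical (higher is treated as better for both scores).

```python
def rescore_top_k(molecules, cheap_score, expensive_score, k):
    """Rank by the cheap score, then re-rank only the top k with the expensive score."""
    shortlist = sorted(molecules, key=cheap_score, reverse=True)[:k]
    return sorted(shortlist, key=expensive_score, reverse=True)


# Hypothetical scores from a docking-like model and a higher-fidelity model.
docking = {"m1": 9.0, "m2": 8.5, "m3": 7.0, "m4": 3.0}
high_fidelity = {"m1": 5.0, "m2": 8.0, "m3": 6.0, "m4": 9.5}

expensive_calls = []


def high_fidelity_score(name):
    """Record each call so the cost of the expensive stage is visible."""
    expensive_calls.append(name)
    return high_fidelity[name]


ranked = rescore_top_k(list(docking), docking.get, high_fidelity_score, k=3)
print(ranked, len(expensive_calls))
```

In this toy data the expensive model is called only three times, but `m4`, which it would have ranked first, never reaches it because the cheap score filtered it out. That is the scoring-accuracy limitation in miniature: errors in the first stage cap what the second stage can recover.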

Can the LAMBDAZERO framework be applied to other types of molecular design problems beyond small-molecule protein inhibitors, such as the design of materials or catalysts?

The LAMBDAZERO framework can be adapted to address a wide range of molecular design problems beyond small-molecule protein inhibitors. Some potential applications include:

Materials Design: By modifying the generative policy and reward function, LAMBDAZERO can be used to design novel materials with specific properties, such as conductivity, strength, or thermal stability. The surrogate model can be trained to predict material properties, and the generative policy can optimize for desired characteristics.

Catalyst Design: LAMBDAZERO can be applied to the design of catalysts for chemical reactions. The framework can be tailored to generate molecules with catalytic activity, selectivity, and stability. By incorporating relevant criteria into the reward function, the generative policy can explore the chemical space to discover effective catalysts.

Drug Repurposing: LAMBDAZERO can also be utilized for drug repurposing efforts, where existing molecules are redesigned for new therapeutic indications. The framework can optimize for both binding affinity to the target protein and other pharmacological properties required for the new application.

By adapting the LAMBDAZERO approach to these diverse molecular design challenges, researchers can leverage its active learning capabilities to accelerate the discovery of novel molecules in various domains of chemistry and drug development.