So in the last post I focused on docking scoring functions and their limitations. But if docking scoring functions are not good enough, there must be some other scoring method that one could apply to rank docking poses more accurately. This concept is known as rescoring.
Rescoring methods aim to refine docking predictions by using more accurate, computationally intensive approaches. This post outlines the main classes of rescoring techniques, evaluates their strengths and weaknesses, and provides guidance on when and how to apply them effectively.
Rescoring Methods: An Overview
1. Empirical scoring functions (e.g., ChemScore, X-Score): Fast and interpretable, but limited by oversimplified energy terms and training set bias.
2. Force-field based methods: These include MM-GBSA (Molecular Mechanics Generalized Born Surface Area) and MM-PBSA (Molecular Mechanics Poisson-Boltzmann Surface Area). Both combine molecular mechanics energies with continuum solvation models. They require minimization or short MD simulations to estimate binding energies more accurately.
3. Consensus scoring: Combines multiple scoring functions (empirical, knowledge-based, etc.) to reduce individual model bias. Often improves early enrichment but is not always reliable.
4. Interaction-based filters: Methods that evaluate pose quality based on H-bonding, ionic interactions, desolvation penalties, and hydrophobic contacts (e.g., PoseBusters, IChem).
5. Machine learning models (e.g., RF-Score, CNN-based models like Gnina): Can capture non-linear relationships but require careful validation and are highly dependent on training data.
6. Quantum mechanical rescoring (e.g., DFT, or semiempirical methods such as GFN2-xTB): Very accurate in principle but slow, generally impractical for more than a few poses.
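To make the consensus idea from item 3 concrete, here is a minimal sketch of rank-based consensus scoring. It is only one of several possible ways to combine scoring functions. The function names, the scoring-function labels ("docking", "empirical", "ml_affinity") and all score values are invented for illustration.

```python
from statistics import mean


def rank_scores(scores, lower_is_better=True):
    """Map pose_id -> rank (1 = best); tied scores share their average rank."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1],
                     reverse=not lower_is_better)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of poses tied with position i.
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[ordered[k][0]] = shared_rank
        i = j + 1
    return ranks


def consensus_rank(score_tables, lower_is_better):
    """Average the per-function ranks of each pose; best consensus first."""
    per_function = [
        rank_scores(table, lower_is_better[name])
        for name, table in score_tables.items()
    ]
    # Only poses scored by every function can be compared fairly.
    shared = set.intersection(*(set(r) for r in per_function))
    consensus = {p: mean(r[p] for r in per_function) for p in shared}
    return sorted(consensus.items(), key=lambda kv: (kv[1], kv[0]))


# Invented scores for three poses; the function labels are placeholders.
score_tables = {
    "docking": {"pose_a": -9.1, "pose_b": -8.4, "pose_c": -7.9},
    "empirical": {"pose_a": -30.0, "pose_b": -35.0, "pose_c": -20.0},
    "ml_affinity": {"pose_a": 6.2, "pose_b": 7.5, "pose_c": 5.0},
}
# Energy-like scores: lower is better. Predicted affinity: higher is better.
lower_is_better = {"docking": True, "empirical": True, "ml_affinity": False}

for pose_id, avg_rank in consensus_rank(score_tables, lower_is_better):
    print(pose_id, round(avg_rank, 2))
```

Working with ranks rather than raw scores sidesteps the fact that different scoring functions report values on different scales and with different signs.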
When is Rescoring Useful?
Rescoring is particularly beneficial in these scenarios:
- The initial docking scoring function fails to distinguish between plausible and implausible poses.
- The receptor is rigid and the ligand is moderately flexible.
- Retrospective enrichment is poor (known ligands do not rank above decoys).
- You have a manageable number of poses (~10-100 per target).
Rescoring is less useful:
- For ultra-large libraries (>1M compounds), due to prohibitive computational cost.
- When the docking protocol already uses high-accuracy scoring (e.g., Glide XP, CovDock).
- If the binding mode prediction is uncertain; rescoring can't fix a bad pose.
Notably, Sindt et al. (2025) underscore the difficulty of rescoring large-scale docking outputs in ultra-large libraries. They show that despite theoretical improvements, rescoring methods such as MM-GBSA may not effectively reorder top-ranked molecules when the docking protocol has already introduced significant bias. Their findings suggest that the utility of rescoring is context-dependent and that blindly applying more complex scoring may not yield better results without rigorous pose validation.
Pros and Cons of Rescoring
Pros:
- Can improve enrichment in retrospective and prospective studies.
- Can recover false negatives missed by docking.
- Offers more physically realistic estimates of binding.
Cons:
- Computationally expensive.
- May introduce false positives due to overfitting or inaccuracies in force fields.
- Limited by accuracy of the initial pose.
As shown by Sindt et al., rescoring cannot overcome limitations of initial pose sampling in large-scale virtual screens, and may result in poor enrichment if used naively.
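One way to check whether rescoring actually helps is to compare an enrichment factor before and after rescoring on a retrospective set of known actives and decoys. The sketch below assumes the common definition (hit rate in the top slice divided by hit rate in the whole set). The helper names and the ten-molecule ranking are made up for illustration.

```python
def enrichment_factor(ranked_labels, fraction=0.01):
    """EF at `fraction`: hit rate in the top slice / hit rate in the full list.

    ranked_labels: list of bools, best-scored molecule first, True = active.
    """
    n_total = len(ranked_labels)
    n_actives = sum(ranked_labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one molecule and one active")
    # Always inspect at least one molecule, even for tiny test sets.
    n_top = max(1, int(n_total * fraction))
    hits_top = sum(ranked_labels[:n_top])
    return (hits_top / n_top) / (n_actives / n_total)


def rank_labels(scores, is_active):
    """Order activity labels by score, assuming lower score = better."""
    order = sorted(scores, key=lambda m: (scores[m], m))
    return [is_active[m] for m in order]


# Invented example: 10 molecules, 2 of them known actives.
docking_order = [True, False, False, False, False,
                 True, False, False, False, False]
rescored_order = [True, True] + [False] * 8

print("EF20% docking: ", enrichment_factor(docking_order, 0.2))
print("EF20% rescored:", enrichment_factor(rescored_order, 0.2))
```

If the rescored enrichment factor is not clearly better than the docking one on such a benchmark, there is little reason to expect rescoring to help in the prospective screen.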
MD-based Rescoring: MM-GBSA and MM-PBSA
These methods can be applied after energy minimization or from snapshots of MD trajectories:
- Minimization-only is often sufficient, especially for rigid binding pockets or small ligands.
- Short MD simulations (1-10 ns) can help explore local flexibility and better estimate solvation effects.
- Long MD (>10 ns) offers marginal benefit for these endpoint methods but is essential for more complex free energy approaches like FEP.
MD-based rescoring also enables assessment of pose stability and interaction persistence over time, which is especially useful in ambiguous binding scenarios.
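For the snapshot-based variant, the endpoint estimate is usually an average over frames of G(complex) - G(receptor) - G(ligand). The sketch below shows only that bookkeeping. It assumes the per-frame energies (in kcal/mol) have already been computed by an MM-GBSA or MM-PBSA tool of your choice, and that receptor and ligand energies come from the same frames as the complex. The function name and the numbers are invented.

```python
from math import sqrt
from statistics import mean, stdev


def endpoint_binding_energy(g_complex, g_receptor, g_ligand):
    """Return (mean, standard error) of per-frame G_complex - G_rec - G_lig."""
    if not (len(g_complex) == len(g_receptor) == len(g_ligand)):
        raise ValueError("all three energy lists must cover the same frames")
    if len(g_complex) < 2:
        raise ValueError("need at least two frames for an error estimate")
    per_frame = [c - r - l
                 for c, r, l in zip(g_complex, g_receptor, g_ligand)]
    avg = mean(per_frame)
    # Naive standard error: treats frames as independent, which consecutive
    # MD snapshots usually are not, so this tends to be too optimistic.
    sem = stdev(per_frame) / sqrt(len(per_frame))
    return avg, sem


# Invented per-frame energies for three snapshots (kcal/mol).
g_complex = [-5030.0, -5042.0, -5036.0]
g_receptor = [-4900.0, -4905.0, -4903.0]
g_ligand = [-100.0, -102.0, -101.0]

dg, err = endpoint_binding_energy(g_complex, g_receptor, g_ligand)
print(f"dG_bind ~ {dg:.1f} +/- {err:.1f} kcal/mol")
```

A large spread across frames is itself informative: it often signals an unstable pose rather than a poorly converged energy.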
Rescoring in Virtual Screening vs. Single Ligand Optimization
In virtual screening, rescoring is typically applied to the top 1–5% of hits from docking, due to cost.
In single-ligand studies, more exhaustive rescoring (even QM-based) is feasible and recommended, especially for lead optimization.
Different strategies make sense depending on the goal: filtering out decoys vs. refining a pose.
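In a screening setting, picking the subset to rescore can be as simple as the sketch below. It assumes docking scores where lower is better; the function name, the optional hard cap and the toy scores are illustrative choices, not a prescribed protocol.

```python
import math


def select_for_rescoring(docking_scores, fraction=0.05, max_poses=None):
    """Return ids of the best-scoring `fraction` of molecules, best first.

    docking_scores: dict of molecule id -> docking score (lower = better).
    max_poses: optional hard cap reflecting the available compute budget.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_keep = max(1, math.ceil(len(docking_scores) * fraction))
    if max_poses is not None:
        n_keep = min(n_keep, max_poses)
    # Break score ties by id so the selection is reproducible.
    ranked = sorted(docking_scores, key=lambda m: (docking_scores[m], m))
    return ranked[:n_keep]


# Toy library of 20 molecules with invented scores (mol19 is the best).
toy_scores = {f"mol{i:02d}": -float(i) for i in range(20)}
print(select_for_rescoring(toy_scores, fraction=0.25))
print(select_for_rescoring(toy_scores, fraction=0.25, max_poses=3))
```

The cap is useful when the chosen fraction of a large library would still exceed what the more expensive rescoring method can handle.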
As emphasized by Sindt et al., rescoring should be coupled with pose quality control and chemical plausibility filters, particularly in ultra-large screens where docking noise dominates.
Conclusion
Rescoring is a valuable tool in the docking pipeline but is not a panacea. It must be applied judiciously, with clear understanding of its limitations and appropriate benchmarks. Studies like Sindt et al. (2025) caution against over-reliance on post hoc scoring without validation, especially in large-scale campaigns.