본 논문은 molecular docking을 generative problem으로 간주한 첫 연구이자, 굉장히 흥미로운 결과들을 보여주었고, 성능 향상을 달성했다.
Diffusion과 molecular docking에 관심이 있다면 꼭 한 번 읽어봐야 할 논문이다.
가능하다면 저자들이 본 논문에 대해 설명한 youtube 영상도 보면 이해에 도움이 된다.

Summary

Molecular docking을 regression이 아닌, generative problem으로 formulation
- Target protein 구조 $\mathbf{y}$를 조건으로 하여, ligand의 위치와 구조$\mathbf{x}$의 distribution을 학습 → $p(\mathbf{x} | \mathbf{y})$
Generation에 diffusion process 사용
두 개의 구분된 모델 사용
- Score model: $s(\mathbf{x}, \mathbf{y}, t)$
  
  Ligand, protein, timestep을 input으로 하여 score를 prediction하는 model
- Confidence model: $d(\mathbf{x}, \mathbf{y})$
  
  Ligand와 protein의 구조를 바탕으로 ground truth ligand pose 대비 RMSD 기준 2Å 이하의 pose일 확률 return
Diffusion on Product space $\mathbb{P}$
- Degree of freedom을 $3n$에서 $(m+6)$로 축소하여 diffusion 수행

Preliminaries

Definition:

Target protein에 결합한 ligand의 position, orientation, conformation을 예측하는 것을 의미
Two types of tasks 크게 두 종류의 task로 구분함
- Known-pocket docking
  - Protein의 binding pocket의 위치 정보를 주고 예측
- Blind docking
  - Binding pocket에 대한 정보 없이 예측하는 더 일반적이고 어려운 상황

Standard evaluation metric:
- $\mathcal{L}_{\epsilon} = \sum_{x, y} I_{\text{RMSD}(y, \hat{y}(x))<\epsilon}$: $\text{RMSD} < \epsilon$ 으로 예측한 것들의 비율 → 미분 불가능함.
- 대신, $\text{argmin}_{\hat{y}} \lim_{\epsilon \rightarrow 0} \mathcal{L}_\epsilon$ 을 objective function 으로 사용함.
Regression은 unimodal일 경우에만 이 objective function과 일맥 상통함.
그러나 Docking은 상당한 aleatoric (irreducible) & epistemic (reducible) uncertainty 지님.
- Regression 방법들은 $\sum \|y - \hat{y}\|^2_2$ 를 최소화하는 것이 목적이고, 이는 여러 mode들의 weighted mean을 예측하도록 학습할 것.
- 반면 generative model은 대부분의 mode에 대해 generation 가능.

두 단계 model 사용
- Score model: Translation (평행 이동), rotation (회전), torsion (비틀림)에 대해 reverse diffusion 수행
- Confidence model: 각 ligand pose가 ground truth 와 비교 시 $\text{RMSD} < 2\text{Å}$ 차이로 잘 위치했는지 binary prediction.

Generative model은 여러 pose를 sampling 할 수 있지만, 대부분 연구자들은 1개나 몇 개 정도의 sample만 보기를 원한다.
따라서 confidence prediction은 downstream task에 있어 유용하게 사용될 수 있다.
Confidence model $d(\mathbf{x}, \mathbf{y})$
- $\mathbf{x}$: Ligand의 pose
- $\mathbf{y}$: Target protein의 구조
Sample들은 confidence score로 순위가 매겨지고, 가장 좋은 순위의 confidence score가 사용된다.
Training & Inference
- 학습된 diffusion score model에 대해 후보 pose들을 sampling 하고, binary label (각 pose가 ground truth pose 대비 $\text{RMSD} < 2\text{Å}$인지 아닌지)을 만든다.
- 이후 confidence model은 binary label을 맞추도록 cross entropy loss로 학습된다.
- Inference 시에는 diffusion score model이 $N$개의 pose를 생성하고, confidence model에 그 pose가 전달되어 confidence score 기반으로 순위를 매겨 사용한다.

Standard benchmark: PDBBind 19k experimentally determined structures of small molecules + proteins

Baselines: search-based & deep learning
Prediction correctness Outperform search-based, deep learning, and pocket prediction + search-based methods
Runtime 3 times faster than the most accurate baseline
Generalization to unseen receptors Able to generalize: outperform classical method
Reverse diffusion process GIF
Confidence score quality High selective accuracy: valuable information for practitioners

Molecular docking을 generation problem으로 formulation 했다는 점이 아주 인상 깊다.
Conformer generation의 condition으로 protein structure를 사용했다.
하지만 two step approach라는 점, 그리고 confidence model의 input이 predicted ligand pose $\hat{\mathbf{x}}$와 protein structure $\mathbf{y}$ 인데, 결국 예측하는 것은 “predicted ligand pose $\hat{\mathbf{x}}$와 ground truth ligand pose $\mathbf{x}$ 간의 차이가 2Å 이하인지” 라서 약간의 괴리감이 있는 것 같다.
아직 성능 개선의 room 이 상당히 남아있지만, GPU workload가 꽤 많이 들어서 우리 연구실에서 수행하기에는 어려울 수도 있을 것 같다는 생각이 든다.
Docking 이지만 physics 기반의 inductive bias가 들어가 있지는 않은 것 같아서 얼마나 generalization 하기에 적합한지 의문이다.