Paper review - RFDiffusion

2025.03.17

Hojae Choi <chjko0206@gmail.com>

RFDiffusion

Metrics of this paper: RFDiffusion

Author group: Baker Lab & collaborators (MIT, etc.)

  • David Baker (awarded one half of the 2024 Nobel Prize in Chemistry)

D. Baker founded 'Rosetta Commons'

Background - Why de novo protein design?

  • Nature has explored only a tiny subset of the possible protein landscape
  • Evolution does not necessarily select for protein attributes that are desirable from a pharmaceutical/biotechnological perspective
    • in vitro solubility, stability, ease of production, low immunogenicity, etc.
  • de novo protein design allows us to derive new proteins with new functions and desirable attributes

Protein design Workflow

Backbone generation is the limiting factor

Diffusion models as an attractive framework for protein design

  • can generate endlessly diverse outputs
  • can operate directly on amino acid coordinates
  • can condition on a wide range of inputs, and can be guided with auxiliary potentials

  1. DDPM: the main generative framework of RFdiffusion; it generates the 3D coordinates of the residue frames.
  2. ProteinMPNN (Dauparas et al. 2022): generates residues (amino acids) for a fixed protein backbone. It was used to design sequences for the generated backbones.
  3. RoseTTAFold (Baek et al. 2021): protein structure prediction model. It was used, after fine-tuning, as the denoising module in RFdiffusion.
  4. AlphaFold2 (Jumper et al. 2021): protein structure prediction model. It was used to evaluate the designed protein structures.

Preview

Generating (designing) a protein binder. Various applications of RFdiffusion.

How can we learn on protein structures?

Challenges of proteins vs. images
  • Strong geometric constraints (4 backbone heavy atoms, 3 covalent bonds, continuous chain)
  • Must also have a sequence that can encode it

(Figure: Jumper et al., 2021; AF2)

How to represent a protein backbone

  • Atom level: handling the translation and rotation of every atom is too complicated

Diffusion models as an attractive framework for protein design

RoseTTAFold (the authors' previous work) can be reused with few changes.

Training summary

  • Dataset: PDB (384 amino acids; i.e. no cropping), clustered by sequence similarity
  • 200 time steps (t=0: true structures, t=200: 3D Gaussian coordinates, uniform frame rotations)
  • Self-conditioning of predicted structures between timesteps
  • Model was initialised from RoseTTAFold structure prediction weights
    Main inputs: input sequence (RoseTTAFold) vs. diffused coordinates (RFdiffusion)
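
The forward (noising) process for the translation part can be sketched as follows. This is a minimal illustration, assuming a standard DDPM closed form with a hypothetical linear variance schedule (`linear_beta_schedule` and its values are not from the paper); rotations are noised separately and are left out here.

```python
import numpy as np

T = 200  # number of diffusion steps, as stated in the training summary


def linear_beta_schedule(num_steps, beta_start=1e-2, beta_end=7e-2):
    """Hypothetical linear variance schedule; the paper's schedule may differ."""
    return np.linspace(beta_start, beta_end, num_steps)


def noise_translations(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) for coordinates x0 of shape (L, 3), with 1 <= t <= T."""
    a = alpha_bar[t - 1]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps


betas = linear_beta_schedule(T)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative signal retention per step

rng = np.random.default_rng(0)
x0 = rng.standard_normal((384, 3))  # stand-in for centred C-alpha coordinates
x_early = noise_translations(x0, 1, alpha_bar, rng)  # still close to x0
x_late = noise_translations(x0, T, alpha_bar, rng)   # close to pure Gaussian noise
```

With this schedule, almost no signal from the true structure remains at t = 200, which matches the "3D Gaussian coordinates" end state described above.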

Allowing the model to 'self-condition' improves RFdiffusion

  • In RFdiffusion, the model receives its previous prediction as a template input (‘self-conditioning’).

  • At each timestep $t$ of a trajectory (e.g. 200 steps), RFdiffusion takes $\hat{X}_0^{t+1}$ from the previous step and $X_t$, and then predicts an updated $X_0$ structure ($\hat{X}_0^{t}$).

  • The next coordinate input to the model ($X_{t-1}$) is generated by a noisy interpolation towards $\hat{X}_0^{t}$.
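
The self-conditioned sampling loop can be sketched roughly as below. This is only an illustration for the translation part: `toy_denoiser` is a hypothetical stand-in for the fine-tuned RoseTTAFold network, the variance schedule is assumed, and the "noisy interpolation" is written here as the standard DDPM posterior step, which may differ in detail from the paper's update.

```python
import numpy as np


def toy_denoiser(x_t, x0_prev, t):
    """Hypothetical stand-in for the network: returns a guess of x_0.

    x0_prev is the previous step's prediction (the self-conditioning input),
    or None at the first step of the trajectory.
    """
    guess = 0.5 * x_t
    return guess if x0_prev is None else 0.5 * (guess + x0_prev)


def sample(denoiser, num_res, betas, rng):
    """Reverse diffusion over coordinates with self-conditioning."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    num_steps = len(betas)
    x_t = rng.standard_normal((num_res, 3))  # start from pure Gaussian noise
    x0_prev = None                           # nothing to condition on yet
    for t in range(num_steps, 0, -1):
        x0_hat = denoiser(x_t, x0_prev, t)   # updated prediction of x_0
        if t == 1:
            return x0_hat                    # final step: keep the prediction
        ab_t, ab_prev = alpha_bar[t - 1], alpha_bar[t - 2]
        # Mean and variance of q(x_{t-1} | x_t, x0_hat): interpolate towards x0_hat
        coef_x0 = np.sqrt(ab_prev) * betas[t - 1] / (1.0 - ab_t)
        coef_xt = np.sqrt(alphas[t - 1]) * (1.0 - ab_prev) / (1.0 - ab_t)
        var = betas[t - 1] * (1.0 - ab_prev) / (1.0 - ab_t)
        noise = rng.standard_normal(x_t.shape)
        x_t = coef_x0 * x0_hat + coef_xt * x_t + np.sqrt(var) * noise
        x0_prev = x0_hat                     # fed back in at the next step
    return x_t


backbone = sample(toy_denoiser, 50, np.linspace(1e-2, 7e-2, 200),
                  np.random.default_rng(0))
```

The only difference from a plain DDPM loop is the `x0_prev` argument: the previous prediction is passed back to the denoiser instead of being discarded.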

Allowing the model to 'self-condition' improves RFdiffusion

Generate backbone (B.B.) -> find a sequence with ProteinMPNN -> recapitulate the design with AlphaFold2
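
One way to quantify whether a predicted structure "recapitulates" a designed backbone is the RMSD after optimal superposition. The sketch below uses the Kabsch algorithm on two coordinate arrays; the function names and the 2.0 Å default cutoff are assumptions for illustration, not necessarily the exact criterion used by the authors.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    P = P - P.mean(axis=0)  # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q             # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Guard against reflections so that the result is a proper rotation
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # rotation mapping P onto Q
    P_rot = P @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Q) ** 2, axis=1))))


def recapitulated(designed, predicted, cutoff=2.0):
    """Hypothetical success check: predicted structure within `cutoff` of the design."""
    return kabsch_rmsd(designed, predicted) < cutoff
```

Here `designed` would be the generated backbone coordinates and `predicted` the AlphaFold2 prediction for the ProteinMPNN sequence, both restricted to the same atoms in the same order.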

Training from pre-trained RoseTTAFold weights makes training computationally tractable

Ablation study

Performance: RFdiffusion >> MSE loss self-conditioning > fine-tuning > pre-training

Strengths

Unconditional generation

Discussion

  • How should the designed proteins be evaluated?
    • The authors evaluated the designed proteins directly with experiments
  • Is RFdiffusion a novel method?
    • My thought: the larger impact lies in the experimental evaluation
  • What other factors affect design performance?

Conclusion

  • Protein design models have various impactful use cases
  • Fine-tuning a generative model from a pre-trained structure prediction model (or simulations)
TIL
  • A protein can be represented by residue frames, each carrying a coordinate (translation) and a rotation
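
As an illustration of the frame representation, a per-residue frame can be built from the N, CA and C backbone atoms by Gram-Schmidt orthogonalisation. This is a minimal sketch; the axis convention (first axis along CA -> C, origin at CA) and the function name are assumptions, and other conventions are possible.

```python
import numpy as np


def backbone_frames(n, ca, c):
    """Build one rigid frame per residue from backbone atoms.

    n, ca, c: arrays of shape (L, 3). Returns rotations of shape (L, 3, 3),
    whose columns are the local axes, and translations of shape (L, 3).
    """
    v1 = c - ca
    v2 = n - ca
    e1 = v1 / np.linalg.norm(v1, axis=-1, keepdims=True)
    # Remove the component of v2 along e1, then normalise
    u2 = v2 - e1 * np.sum(e1 * v2, axis=-1, keepdims=True)
    e2 = u2 / np.linalg.norm(u2, axis=-1, keepdims=True)
    e3 = np.cross(e1, e2)  # completes a right-handed basis
    rotations = np.stack([e1, e2, e3], axis=-1)
    return rotations, ca


rng = np.random.default_rng(0)
n_xyz, ca_xyz, c_xyz = (rng.standard_normal((5, 3)) for _ in range(3))
rotations, translations = backbone_frames(n_xyz, ca_xyz, c_xyz)
```

Each residue is then described by six degrees of freedom (three for the translation, three for the rotation) instead of the coordinates of every backbone atom.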

Recommendations

Presentation by the main authors

RFDiffusion: Accurate protein design using structure prediction and diffusion generative models, Joe Watson & David Juergens,
https://www.youtube.com/watch?v=wIHwHDt2NoI, presented on Tuesday, February 14th, 4-5 pm EST, accessed on 2025.03.17

Q&A

References

  1. Watson, Joseph L., David Juergens, Nathaniel R. Bennett, Brian L. Trippe, Jason Yim, Helen E. Eisenach, Woody Ahern, et al. 2023. “De Novo Design of Protein Structure and Function with RFdiffusion.” Nature 620 (7976): 1089–1100. https://doi.org/10.1038/s41586-023-06415-8.
  2. Joe Watson & David Juergens, presented on Tuesday, February 14th, 4-5 pm EST, RFDiffusion: Accurate protein design using structure prediction and diffusion generative models, https://www.youtube.com/watch?v=wIHwHDt2NoI, accessed on 2025.03.17
  3. Baek, Minkyung, Frank DiMaio, Ivan Anishchenko, Justas Dauparas, Sergey Ovchinnikov, Gyu Rie Lee, Jue Wang, et al. 2021. “Accurate Prediction of Protein Structures and Interactions Using a Three-Track Neural Network.” Science 373 (6557): 871–76. https://doi.org/10.1126/science.abj8754.
  4. Dauparas, J., I. Anishchenko, N. Bennett, H. Bai, R. J. Ragotte, L. F. Milles, B. I. M. Wicky, et al. 2022. “Robust Deep Learning–Based Protein Sequence Design Using ProteinMPNN.” Science 378 (6615): 49–56. https://doi.org/10.1126/science.add2187.
  5. Jumper, John, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, et al. 2021. “Highly Accurate Protein Structure Prediction with AlphaFold.” Nature 596 (7873): 583–89. https://doi.org/10.1038/s41586-021-03819-2.

Supplementary

motif scaffolding

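The paper is attracting a lot of attention. There seems to be strong demand in the field of protein design.

Is protein design a real problem? Why is it needed? The variety of naturally occurring proteins is very small. There is no particular evolutionary reason for proteins with the properties we want to arise, so such proteins have to be created anew. Protein design lets us make proteins with new functions and properties.

When this paper came out, there do not seem to have been many models capable of backbone generation. ProteinMPNN, from the same research group, already existed as a model that produces a suitable sequence for a given fixed structure. RFdiffusion was built to take advantage of the strengths of diffusion.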

## Background - Gap in the literature - Key concepts

Shows a protein binder being generated, along with the trajectory of the backward (reverse) diffusion. The authors say that a variety of applications are possible.

Independent noise has to be considered for each of the two modalities (translation and rotation of the alpha-carbon frames): random 3D Gaussian noise for the initial coordinates of the $C_\alpha$ frames, and uniform noise for the initial frame rotations.
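
The starting point of a trajectory (Gaussian translations, uniformly random rotations) could be sampled as in the sketch below. Uniform rotations are drawn here by normalising 4D Gaussian vectors into unit quaternions; the function names are hypothetical and the paper's implementation may use a different sampler.

```python
import numpy as np


def random_rotations(num, rng):
    """Sample `num` rotation matrices uniformly from SO(3) via unit quaternions."""
    q = rng.standard_normal((num, 4))
    q /= np.linalg.norm(q, axis=1, keepdims=True)  # uniform on the 3-sphere
    w, x, y, z = q[:, 0], q[:, 1], q[:, 2], q[:, 3]
    # Standard unit-quaternion to rotation-matrix conversion, built row by row
    row0 = np.stack([1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)], axis=-1)
    row1 = np.stack([2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)], axis=-1)
    row2 = np.stack([2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)], axis=-1)
    return np.stack([row0, row1, row2], axis=-2)


def sample_initial_frames(num_res, rng):
    """Independent noise per modality: Gaussian translations, uniform rotations."""
    translations = rng.standard_normal((num_res, 3))
    rotations = random_rotations(num_res, rng)
    return rotations, translations


init_rot, init_trans = sample_initial_frames(100, np.random.default_rng(0))
```

The two noise sources are drawn independently, which mirrors the note above that translation and rotation are treated as separate modalities.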

The $x_0$ predicted at step $t$ is fed in as an input to the next step. This technique is said to come from earlier diffusion work, but I am not sure of the exact source.

The authors' talk (https://www.youtube.com/watch?v=wIHwHDt2NoI) is well worth watching after this presentation for more detailed information.

Here, we will address the weaknesses of the paper. We will discuss the limited sample size, potential biases, and lack of longitudinal data. We will also consider the insufficient theoretical integration and suggest areas for improvement. Acknowledging these weaknesses is crucial for a balanced review.

  • Open floor for questions
  • Clarifications on the review
  • Discussion on implications
  • Suggestions for future research
  • Feedback on presentation

We conclude the presentation with a Q&A session. This is an opportunity for the audience to ask questions, seek clarifications, and discuss the implications of the review. We welcome suggestions for future research and feedback on the presentation. This interactive session aims to foster a deeper understanding of the paper.