{"id":29023286,"url":"https://github.com/rish-16/rna-backbone-design","last_synced_at":"2025-06-26T03:06:01.719Z","repository":{"id":244820956,"uuid":"816373425","full_name":"rish-16/rna-backbone-design","owner":"rish-16","description":"Source code for RNA-FrameFlow: SE(3) Flow Matching for 3D RNA Backbone Design","archived":false,"fork":false,"pushed_at":"2025-06-01T18:13:32.000Z","size":13699,"stargazers_count":59,"open_issues_count":1,"forks_count":10,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-06-02T04:42:10.990Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2406.13839","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rish-16.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-06-17T15:59:44.000Z","updated_at":"2025-06-01T23:31:04.000Z","dependencies_parsed_at":"2025-06-08T00:02:42.032Z","dependency_job_id":null,"html_url":"https://github.com/rish-16/rna-backbone-design","commit_stats":null,"previous_names":["rish-16/rna-backbone-design"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/rish-16/rna-backbone-design","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rish-16%2Frna-backbone-design","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rish-16%2Frna-backbone-design/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rish-16%2Frna-backbone-design/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repo
sitories/rish-16%2Frna-backbone-design/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rish-16","download_url":"https://codeload.github.com/rish-16/rna-backbone-design/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rish-16%2Frna-backbone-design/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261990346,"owners_count":23241189,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-26T03:06:01.164Z","updated_at":"2025-06-26T03:06:01.710Z","avatar_url":"https://github.com/rish-16.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# RNA-FrameFlow: Flow Matching for de novo 3D RNA Backbone Design\n\n## Description\n\nRNA-FrameFlow is a generative model for 3D RNA backbone design based on SE(3) flow matching. \n\n![RNA-FrameFlow](assets/rna_flow_diag.png)\n\n\u003e Accompanying paper: ['RNA-FrameFlow: Flow Matching for de novo 3D RNA Backbone Design'](https://arxiv.org/abs/2406.13839), by Rishabh Anand*, Chaitanya K. 
Joshi*, Alex Morehead, Arian Rokkum Jamasb, Charles Harris, Simon V Mathis, Kieran Didi, Bryan Hooi, Pietro Liò.\n\u003e - **Oral** at Machine Learning for Computational Biology (MLCB), 2024 (UW, Seattle, Washington)\n\u003e - **Oral** at ICML 2024 Structured Probabilistic Inference \u0026 Generative Modeling Workshop: [`openreview`](https://openreview.net/forum?id=Z74lflCKmF)\n\u003e - **Spotlight** at ICML 2024 AI4Science Workshop: [`openreview`](https://openreview.net/forum?id=YzjHCdZM2h)\n\n## Pipeline\n\n![RNA-FrameFlow pipeline](assets/pipeline.png)\n\n## Contents\n\n- [RNA-FrameFlow: Flow Matching for de novo 3D RNA Backbone Design](#rna-frameflow-flow-matching-for-de-novo-3d-rna-backbone-design)\n  - [Description](#description)\n  - [Pipeline](#pipeline)\n  - [Contents](#contents)\n  - [Installation](#installation)\n  - [Data Preparation](#data-preparation)\n  - [Training and Inference](#training-and-inference)\n    - [Download weights](#download-weights)\n    - [Using your own retrained checkpoints](#using-your-own-retrained-checkpoints)\n    - [Run inference](#run-inference)\n    - [Run evaluation](#run-evaluation)\n      - [Running `EvalSuite`](#running-evalsuite)\n  - [Acknowledgements](#acknowledgements)\n  - [Citation](#citation)\n  \n## Installation\nTo manage environments efficiently, we use [uv](https://docs.astral.sh/uv/getting-started/installation/#standalone-installer). It simplifies managing dependencies and executing scripts.\n\n\n```bash\n# clone repository\ngit clone https://github.com/rish-16/rna-backbone-design.git\ncd rna-backbone-design/\n# create python environment\npip install uv\nuv sync\n```\n\n\u003e [!CAUTION]\n\u003e Do take note of the compatibility between PyTorch and CUDA versions. We used `pytorch-cuda` v12.1 in the installation script but you should change this depending on your system's NVCC version. 
You can find the target version using `nvidia-smi` and looking for `CUDA Version`.\n\n## Data Preparation\n\nDownload RNAsolo (3.48GB), which contains the ~14K structures at a resolution $\\leq 4.0$ Å used to train our model, in the `.pdb` file format. From the root of the `rna-backbone-design` repository, run:\n\n```bash\n# Download structures in PDB format from RNAsolo (31 October 2023 cutoff)\nmkdir -p data/rnasolo; cd data/rnasolo\ngdown \"https://drive.google.com/uc?id=10NidhkkJ-rkbqDwBGA_GaXs9enEBJ7iQ\"\ntar -zxvf RNAsolo_31102023.tar.gz\n```\n\n\u003cdetails\u003e\n\u003csummary\u003eOlder instructions to download RNAsolo (not working – ignore this)\u003c/summary\u003e\n    \n```bash\n# create data directory\nmkdir -p data/rnasolo; cd data/rnasolo\n# download RNAsolo\nwget https://rnasolo.cs.put.poznan.pl/media/files/zipped/bunches/pdb/all_member_pdb_4_0__3_326.zip\nunzip all_member_pdb_4_0__3_326.zip # unzips all PDB files\nmv all_member_pdb_4_0__3_326.zip ../ # moves ZIP archive out of new file directory\n```\n\u003c/details\u003e\n\nWe provide a preprocessing script `process_rna_pdb_files.py` that prepares the RNA samples used during training. Again, from the main project directory, run:\n```bash\n# from ./\nuv run process_rna_pdb_files.py --pdb_dir data/rnasolo/ --write_dir data/rnasolo_proc/\n```\n\nInside `data/rnasolo_proc/`, you should see subdirectories named after the roots of the PDB entry names. Each subdirectory contains pickled versions of the PDB files, which capture important structural descriptors extracted from the atomic records; this makes bookkeeping easier during training. You'll also notice a `rna_metadata.csv` file. 
Keep track of the filepath to this CSV file – it contains metadata about the pickled RNA files and the relative filepaths to access them during training.\n\nYour directory should now look like this:\n```\n.\n├── rna_backbone_design\n│   ├── analysis\n│   ├── data\n│   ├── experiments\n│   ├── models\n│   ├── openfold\n│   └── tools\n├── configs\n│   ├── base.yaml\n│   └── inference.yaml\n├── data\n│   ├── rnasolo\n│   └── rnasolo_proc\n│       └── rna_metadata.csv\n└── camera_ready_ckpts/\n```\n\n## Training and Inference\n\n\u003e [!IMPORTANT]\n\u003e Our training relies on logging with `wandb`. Create a WandB account, log in, and authorize WandB [here](https://wandb.ai/authorize).\n\nWe trained our model on 4 RTX3090 40GB GPUs via DDP for 120K steps, which took ~15 hours. We train on sequences of length between 40 and 150. For more specific experimental details, look at `configs/config.yaml`.\n\n```bash\n# run training\nuv run train_se3_flows.py\n```\n\nAfter training, the final checkpoint is saved locally at `ckpt/se3-fm/rna-frameflow/last.ckpt` (not part of this repo); this `ckpt` directory is created automatically by `wandb`. We also store intermediate checkpoints for your reference. You can rename and move this `last.ckpt` file where necessary to run inference.\n\nAlternatively, you can use our camera-ready baseline checkpoint. The necessary config files can be found inside `camera_ready_ckpts/`.\n\n\u003e We provide a brief description of the different schemes to represent all-atom RNA molecules used throughout the codebase in `rna_backbone_design/DATA_REPR.md`. \n\n### Download weights\n\nOur strongest baseline model's trained weights are hosted on Google Drive: [link](https://drive.google.com/drive/folders/1umg0hgkBl7zsF_2GdCIKkfsRWbJNEOvp?usp=sharing). Place them in the `camera_ready_ckpts/` subdirectory. 
If you are on a remote server, there are libraries like [`gdown`](https://github.com/wkentaro/gdown) that help you retrieve this directly from GDrive. You need to pass in the file ID `1AnDMUa6ZnaRQonQje3Sfo1KBSQAsNKXe`:\n\n```bash\n# download checkpoints\ncd camera_ready_ckpts/\ngdown 1AnDMUa6ZnaRQonQje3Sfo1KBSQAsNKXe\n```\n\nYour subdirectory should look like this now:\n\n```\n.\n└── camera_ready_ckpts/\n    ├── inference.yaml # for inference/sampling\n    ├── rna_frameflow_public_weights.ckpt # for inference / sampling / finetuning\n    └── config.yaml # for training\n```\n\n### Using your own retrained checkpoints\n\nIf you've retrained RNA-FrameFlow from scratch, as mentioned above, your checkpoint can be found at `ckpts/se3-fm/rna-frameflow/last.ckpt`. After renaming and shifting this `last.ckpt` file where necessary, visit `configs/inference.yaml` and change the path in the inference YAML file:\n\n```\ninference:\n  ckpt_path: configs/\u003cinsert_ckpt_name\u003e.ckpt # path to model checkpoint of interest\n```\n\nThis ensures the correct model checkpoints are used to generate new samples.\n\n### Run inference\n\nBy default we sample 50 sequences per length between 40 and 150. Generated all-atom RNA backbones are stored as PDB files in the `inference.output_dir` directory listed in the inference YAML file.\n\n```bash\n# run inference\nuv run inference_se3_flows.py\n```\n\nRunning inference also performs evaluation on the generated samples to compute local and global structural metrics. Inference together with evaluation takes around 2 hours on our hardware. 
See the subsequent section for setting up and running evaluation separately from inference.\n\n### Run evaluation\n\n![Evaluation pipeline](assets/evaluation.png)\n\nWe provide an evaluation pipeline called [`EvalSuite`](https://github.com/rish-16/rna-backbone-design/blob/main/rna_backbone_design/analysis/evalsuite.py) that computes local and global structural metrics when pointed at a directory of our model's RNA backbone samples. We use [gRNAde](https://arxiv.org/abs/2305.14749) (Joshi et al., 2023) as our inverse folder and [RhoFold](https://arxiv.org/abs/2207.01586) (Shen et al., 2022) as the structure predictor. First, download the RhoFold checkpoints (we did not include them because of their size):\n\n```bash\ncd rna_backbone_design/tools/rhofold_api/\nmkdir checkpoints\ncd checkpoints/\nwget \"https://proj.cse.cuhk.edu.hk/aihlab/RhoFold/api/download?filename=RhoFold_pretrained.pt\" -O RhoFold_pretrained.pt\n```\n\nIf the above URL does not work, we have stored the RhoFold checkpoints on Google Drive for easier access:\n\n\u003e [https://drive.google.com/drive/folders/1umg0hgkBl7zsF_2GdCIKkfsRWbJNEOvp?usp=drive_link](https://drive.google.com/drive/folders/1umg0hgkBl7zsF_2GdCIKkfsRWbJNEOvp?usp=drive_link)\n\n#### Running `EvalSuite`\n\nGo back to the project's root directory. Here is a minimal example of `EvalSuite` in action. The API takes care of the computation, storage, and management of local structural measurements as well as global metrics (designability, diversity, and novelty). 
Set-up instructions can be found in `inference_se3_flows.py`.\n\n```python\nfrom rna_backbone_design.analysis.evalsuite import EvalSuite\n\nrna_bb_samples_dir = \"generated_rna_bb_samples/\" # generated samples for each sequence length\nsaving_dir = \"rna_eval_metrics\" # save temp files and metrics\n\n# `cfg` is assumed to be the loaded inference config; see `inference_se3_flows.py` for set-up\nevalsuite = EvalSuite(\n              save_dir=saving_dir,\n              paths=cfg.inference.evalsuite.paths,\n              constants=cfg.inference.evalsuite.constants,\n              gpu_id1=0, # cuda:0 -\u003e for inverse-folding model\n              gpu_id2=1,  # cuda:1 -\u003e for forward-folding model\n            )\n\n# compute local structural measurements and global metrics\nmetric_dict = evalsuite.perform_eval(\n                            rna_bb_samples_dir,\n                            flatten_dir=True\n                        )\n\nevalsuite.print_metrics(metric_dict) # print eval metrics\n\"\"\"\nDiversity (#clusters / #designable): 0.55\nNovelty (pdbTM): 0.63\nDesignability (% scTM \u003e= 0.45) 0.457\nDesignability (% scRMSD \u003c= 4 ang): 0.433\n\"\"\"\n```\n\n## Acknowledgements\n\nThis work is presented by Rishabh Anand to fulfill the Bachelor's Dissertation requirements at the Department of Computer Science, School of Computing, National University of Singapore (NUS). It is done in collaboration with Pietro Liò's group at the University of Cambridge, UK.\n\nOur codebase builds on the open-source contributions from the following projects:\n- [`protein-frame-flow`](https://github.com/microsoft/protein-frame-flow)\n- [`se3_diffusion`](https://github.com/jasonkyuyim/se3_diffusion)\n- [`MMDiff`](https://github.com/Profluent-Internships/MMDiff)\n- [`geometric-rna-design`](https://github.com/chaitjo/geometric-rna-design)\n\n## Citation\n\n```\n@article{anand2024rnaframeflow,\n  title={RNA-FrameFlow: Flow Matching for de novo 3D RNA Backbone Design},\n  author={Anand, Rishabh and Joshi, Chaitanya K. and Morehead, Alex and Jamasb, Arian R. 
and Harris, Charles and Mathis, Simon and Didi, Kieran and Hooi, Bryan and Li{\\`o}, Pietro},\n  journal={arXiv preprint arXiv:2406.13839},\n  year={2024},\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frish-16%2Frna-backbone-design","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frish-16%2Frna-backbone-design","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frish-16%2Frna-backbone-design/lists"}