{"id":31360586,"url":"https://github.com/apple/ml-simplefold","last_synced_at":"2025-09-29T03:01:57.727Z","repository":{"id":316824165,"uuid":"1062261519","full_name":"apple/ml-simplefold","owner":"apple","description":null,"archived":false,"fork":false,"pushed_at":"2025-09-26T23:31:57.000Z","size":1242,"stargazers_count":396,"open_issues_count":10,"forks_count":17,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-09-27T00:17:39.070Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/apple.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-23T03:08:49.000Z","updated_at":"2025-09-27T00:16:11.000Z","dependencies_parsed_at":"2025-09-27T00:17:43.369Z","dependency_job_id":"ae59697a-6ca6-4e40-946e-d88cb653dd84","html_url":"https://github.com/apple/ml-simplefold","commit_stats":null,"previous_names":["apple/ml-simplefold"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/apple/ml-simplefold","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apple%2Fml-simplefold","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apple%2Fml-simplefold/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apple%2Fml-simplefold/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apple%2Fml
-simplefold/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/apple","download_url":"https://codeload.github.com/apple/ml-simplefold/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apple%2Fml-simplefold/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":277458690,"owners_count":25821322,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-29T02:00:09.175Z","response_time":84,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-09-27T01:58:53.223Z","updated_at":"2025-09-29T03:01:57.671Z","avatar_url":"https://github.com/apple.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"\n\u003ch1 align=\"center\"\u003e\u003cstrong\u003eSimpleFold: Folding Proteins is Simpler than You Think\u003c/strong\u003e\u003c/h1\u003e\n\n\n\u003cdiv align=\"center\"\u003e\n\nThis github repository accompanies the research paper, [*SimpleFold: Folding Proteins is Simpler than You Think*](https://arxiv.org/abs/2509.18480) (Arxiv 2025).\n\n*Yuyang Wang, Jiarui Lu, Navdeep Jaitly, Joshua M. 
Susskind, Miguel Angel Bautista*\n\n[[`Paper`](https://arxiv.org/abs/2509.18480)]  [[`BibTex`](#citation)]\n\n\u003cimg src=\"assets/intro.png\" width=\"750\"\u003e\n\n\u003c/div\u003e\n\n\n## Introduction\n\nWe introduce SimpleFold, the first flow-matching based protein folding model that solely uses general purpose transformer layers. SimpleFold does not rely on expensive modules like triangle attention or pair representation biases, and is trained via a generative flow-matching objective. We scale SimpleFold to 3B parameters and train it on more than 8.6M distilled protein structures together with experimental PDB data. To the best of our knowledge, SimpleFold is the largest scale folding model ever developed. On standard folding benchmarks, the SimpleFold-3B model achieves competitive performance compared to state-of-the-art baselines. Due to its generative training objective, SimpleFold also demonstrates strong performance in ensemble prediction. SimpleFold challenges the reliance on complex domain-specific architectural designs in folding, highlighting an alternative yet important avenue of progress in protein structure prediction.\n\n\u003c/div\u003e\n\n\n## Installation\n\nTo install the `simplefold` package from the GitHub repository, run\n```\ngit clone https://github.com/apple/ml-simplefold.git\ncd ml-simplefold\npython -m pip install -U pip build; pip install -e .\npip install git+https://github.com/facebookresearch/esm.git # Optional for MLX backend\n```\n\n## Example \n\nWe provide a Jupyter notebook [`sample.ipynb`](sample.ipynb) to predict protein structures from example protein sequences. \n\n## Inference\n\nOnce you have the `simplefold` package installed, you can predict the protein structure from target FASTA file(s) via the following command line. We provide support for both [PyTorch](https://pytorch.org/) and [MLX](https://mlx-framework.org/) (recommended for Apple hardware) backends in inference. 
\n```\n# --simplefold_model: folding model, one of simplefold_100M/360M/700M/1.1B/1.6B/3B\n# --num_steps, --tau: inference settings\n# --nsample_per_protein: number of generated conformers per target\n# --plddt: output pLDDT\n# --fasta_path: path to the target fasta directory or file\n# --output_dir: path to the output directory\n# --backend: inference backend, either mlx or torch\nsimplefold \\\n    --simplefold_model simplefold_100M \\\n    --num_steps 500 --tau 0.01 \\\n    --nsample_per_protein 1 \\\n    --plddt \\\n    --fasta_path [FASTA_PATH] \\\n    --output_dir [OUTPUT_DIR] \\\n    --backend [BACKEND]\n```\n\n## Evaluation\n\nWe provide predicted structures from SimpleFold of different model sizes:\n```\nhttps://ml-site.cdn-apple.com/models/simplefold/cameo22_predictions.zip # predicted structures of CAMEO22\nhttps://ml-site.cdn-apple.com/models/simplefold/casp14_predictions.zip  # predicted structures of CASP14\nhttps://ml-site.cdn-apple.com/models/simplefold/apo_predictions.zip     # predicted structures of Apo\nhttps://ml-site.cdn-apple.com/models/simplefold/codnas_predictions.zip  # predicted structures of Fold-switch (CoDNaS)\n```\nWe use the docker image of [openstructure](https://git.scicore.unibas.ch/schwede/openstructure/) 2.9.1 to evaluate generated structures for folding tasks (i.e., CASP14/CAMEO22). 
Once the docker image is available, you can run evaluation via:\n```\npython src/simplefold/evaluation/analyze_folding.py \\\n    --data_dir [PATH_TO_TARGET_MMCIF] \\\n    --sample_dir [PATH_TO_PREDICTED_MMCIF] \\\n    --out_dir [PATH_TO_OUTPUT] \\\n    --max-workers [NUMBER_OF_WORKERS]\n```\nTo evaluate results of two-state prediction (i.e., Apo/CoDNaS), one needs to compile [TMscore](https://zhanggroup.org/TM-score/TMscore.cpp) and then run evaluation via:\n```\n# --task: choose from apo and codnas\npython src/simplefold/evaluation/analyze_two_state.py \\\n    --data_dir [PATH_TO_TARGET_DATA_DIRECTORY] \\\n    --sample_dir [PATH_TO_PREDICTED_PDB] \\\n    --tm_bin [PATH_TO_TMscore_BINARY] \\\n    --task apo \\\n    --nsample 5\n```\n\n## Train\n\nYou can also train or tune SimpleFold on your end. Instructions below include details for SimpleFold training. \n\n### Data preparation\n\n#### Training targets\n\nSimpleFold is trained on joint datasets including experimental structures from [PDB](https://www.rcsb.org/), as well as distilled predictions from [AFDB SwissProt](https://alphafold.ebi.ac.uk/download#swissprot-section) and [AFESM](https://afesm.foldseek.com/). Target lists of filtered SwissProt and AFESM targets that are used in our training can be found at:\n```\nhttps://ml-site.cdn-apple.com/models/simplefold/swissprot_list.csv # list of filtered SwissProt (~270K targets)\nhttps://ml-site.cdn-apple.com/models/simplefold/afesm_list.csv # list of filtered AFESM targets (~1.9M targets)\nhttps://ml-site.cdn-apple.com/models/simplefold/afesme_dict.json # list of filtered extended AFESM (AFESM-E) (~8.6M targets)\n```\nIn `afesme_dict.json`, the data is stored in the following structure:\n```\n{\n    cluster 1 ID: {\"members\": [protein 1 ID, protein 2 ID, ...]},\n    cluster 2 ID: {\"members\": [protein 1 ID, protein 2 ID, ...]},\n    ...\n}\n```\n\nOf course, you can use your own customized datasets to train or tune SimpleFold models. 
Instructions below list how to process the dataset for SimpleFold training. \n\n#### Process mmcif structures\n\nTo process downloaded mmcif files, you need to install [Redis](https://redis.io/docs/latest/operate/oss_and_stack/install/archive/install-redis/) and launch the Redis server:\n```\nwget https://boltz1.s3.us-east-2.amazonaws.com/ccd.rdb\nredis-server --dbfilename ccd.rdb --port 7777\n```\nYou can then process mmcif files into the input format for SimpleFold:\n```\n# --data_dir: directory of mmcif files\n# --out_dir: directory of processed targets\npython src/simplefold/process_mmcif.py \\\n    --data_dir [MMCIF_DIR] \\\n    --out_dir [OUTPUT_DIR] \\\n    --use-assembly\n```\n\n### Training\n\nModel configuration is based on [`Hydra`](https://hydra.cc/docs/intro/). An example training configuration can be found in `configs/experiment/train`. To change dataset and model settings, one can refer to config files in `configs/data` and `configs/model`. To initiate SimpleFold training:\n```\npython train experiment=train\n```\nTo train SimpleFold with FSDP strategy:\n```\npython train_fsdp.py experiment=train_fsdp\n```\n\n## Citation\nIf you find this code useful, please cite the following paper:\n```\n@article{simplefold,\n  title={SimpleFold: Folding Proteins is Simpler than You Think},\n  author={Wang, Yuyang and Lu, Jiarui and Jaitly, Navdeep and Susskind, Josh and Bautista, Miguel Angel},\n  journal={arXiv preprint arXiv:2509.18480},\n  year={2025}\n}\n```\n\n## Acknowledgements\nOur codebase is built using multiple open-source contributions; please see [ACKNOWLEDGEMENTS](ACKNOWLEDGEMENTS) for more details. 
\n\n## License\nPlease check out the repository [LICENSE](LICENSE) before using the provided code and\n[LICENSE_MODEL](LICENSE_MODEL) for the released models.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapple%2Fml-simplefold","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fapple%2Fml-simplefold","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapple%2Fml-simplefold/lists"}