{"id":13994020,"url":"https://github.com/michiyasunaga/DrRepair","last_synced_at":"2025-07-22T18:32:59.405Z","repository":{"id":37330621,"uuid":"275260076","full_name":"michiyasunaga/DrRepair","owner":"michiyasunaga","description":"[ICML 2020] DrRepair: Learning to Repair Programs from Error Messages","archived":false,"fork":false,"pushed_at":"2021-05-24T16:03:45.000Z","size":1860,"stargazers_count":194,"open_issues_count":10,"forks_count":32,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-04-08T04:51:18.973Z","etag":null,"topics":["code-generation","deep-learning","graph-neural-networks","pre-training","program-repair"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2005.10636","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/michiyasunaga.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-06-26T22:40:02.000Z","updated_at":"2025-04-05T16:24:12.000Z","dependencies_parsed_at":"2022-09-12T17:02:01.912Z","dependency_job_id":null,"html_url":"https://github.com/michiyasunaga/DrRepair","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/michiyasunaga/DrRepair","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michiyasunaga%2FDrRepair","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michiyasunaga%2FDrRepair/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michiyasunaga%2FDrRepair/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michiyasunaga%2FDrRepair/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/michiyasunaga","download_url":"https://codeload.github.com/michiyasunaga/DrRepair/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michiyasunaga%2FDrRepair/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266552517,"owners_count":23947174,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-22T02:00:09.085Z","response_time":66,"last_error":null,"robots_txt_status":null,"robots_txt_updated_at":null,"robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["code-generation","deep-learning","graph-neural-networks","pre-training","program-repair"],"created_at":"2024-08-09T14:02:40.265Z","updated_at":"2025-07-22T18:32:54.390Z","avatar_url":"https://github.com/michiyasunaga.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# DrRepair: Learning to Repair Programs from Error Messages\n\nThis repo provides the source code \u0026 data of our paper: [Graph-based, Self-Supervised Program Repair from Diagnostic Feedback](https://arxiv.org/abs/2005.10636) (ICML 2020).\n\u003c!-- [[Paper (ICML2020)](https://arxiv.org/abs/2005.10636)] --\u003e\n```\n@InProceedings{Yasunaga20DrRepair,\n  author =  {Michihiro Yasunaga and Percy Liang},\n  title =   {Graph-based, Self-Supervised Program Repair from Diagnostic Feedback},\n  year =    {2020},  \n  booktitle =   {International Conference on Machine Learning (ICML)},  \n}\n```\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"./DrRepair-overview.png\" width=\"600\" title=\"Learning to Repair Programs from Error Messages\" alt=\"\"\u003e\n\u003c/p\u003e\n\n## Dependencies\n* GCC: Follow the SPoC requirement (https://github.com/Sumith1896/spoc)\n* Python 3.6.8 (e.g. `conda create -n DrRepair python=3.6.8`)\n* Python libraries\n  - torch==1.0.1, numpy, tqdm, regex, joblib, pyyaml, bottle, cheroot, tensorboardX\n  - clang==8.0.1 (do the following)\n      ```\n      conda config --add channels conda-forge\n      conda install python-clang==8.0.1\n      ```\n\n## Data\nDownload all the raw data -- DeepFix, SPoC, codeforce (for pretraining) -- by\n```\n./download_raw_data.sh\n```\n\nYou can preprocess the raw data to get the **program repair** data by running the commands in\n```\ndata/1.run-gen-err-dataset--orig-spoc.sh\ndata/2.run-gen-err-dataset--auto-corrupt--spoc.sh\ndata/3.run-gen-err-dataset--auto-corrupt--deepfix.sh\n```\nHowever, this takes a significant time, so for your convenience, you can download all the preprocessed data by\n```\n./download_preprocessed_data.sh\n```\n\n\nThe repo structure looks like the following:\n```plain\n.\n└─ raw_data/\n   ├── codeforce_data/                  (raw programs from codeforce)\n   ├── deepfix_data/                    (raw programs from deepfix)\n   └── spoc_data/\n       ├── spoc                              (SPoC data release)\n       └── translation_preds                 (line-level code predictions from Kulal+19)\n\n└─ data/                             \n   ├── *.sh, *.py                       (preprocessing scripts)\n   ├── err-data-compiler--orig-spoc/    (preprocessed, program repair data for spoc)\n   ├── err-dev-compiler--for-SPoC/      (└─ dev data for spoc)\n   ├── err-vocab-compiler--for-SPoC/    (└─ vocab for spoc)\n   ...\n   ... [similarly for deepfix and pre-training]\n\n└─ utils/                      (utilities for code processing)\n\n└─ model/                      (DrRepair model)\n\n└─ evaluation/                 (to evaluate Repair model on deepfix/spoc test)\n   ├── deepfix\n   └── spoc\n       ├── translation_preds_test/           (line-level code predictions from Kulal+19 for TestP/TestW)\n       ...\n```\n\n\n## Train models\nLet's train program repair models.\nFirst, go to `model` directory.\nThen, run commands listed in `run_deepfix.sh` or `run_spoc.sh`.\nFor example, if we train DrRepair (\"base + graph\" in the paper) on the DeepFix data, run:\n```\nname=\"code-compiler--2l-graph\"\nmkdir -p out_deepfix/${name}\npython3 -u main_deepfix.py -o ${name} train \\\n    configs/base.yml  configs/data-deepfix/err-data-orig.yml \\\n    configs/model-code-compiler/2l-graph--dec-attn-all.yml\n```\n\n\n## Evaluate models\nWe run the trained program repair model as a server.\nWe then call this model on application tasks (DeepFix and SPoC) to evaluate the usefulness of the model.\n\n### DeepFix\n#### 1. Start server\nFirst, go to `model` directory.\nWe run a trained model (e.g. code-compiler--2l-graph) as a server by\n```\nname=\"SERVER--code-compiler--2l-graph\"\nmkdir out_deepfix/${name}\npython3 -u main_deepfix.py -o ${name} server -p \u003cport\u003e \\\n    -l out_deepfix/code-compiler--2l-graph/\u003ccheckpoint\u003e \\\n    configs/base.yml  configs/data-deepfix/err-data-orig.yml \\\n    configs/model-code-compiler/2l-graph--dec-attn-all.yml\n```\nFor `\u003cport\u003e`, pick a port number (e.g. 8080) for the server.\nFor `\u003ccheckpoint\u003e`, pick a checkpoint (e.g. 150000) of the trained model.\nThen run ```ifconfig``` to get the IP address (e.g. 172.24.67.161) of the machine hosting this model.\nConcrete examples are provided in the second half of `model/run_deepfix.sh`.\n\n#### 2. Run model on DeepFix test\nGo to `evaluation/deepfix` directory. First prepare:\n```\nrepo_root=\"../../../..\"\nprogram_data_root=${repo_root}\"/raw_data/deepfix_data\"\ntest_split_root=${repo_root}\"/data/err-data-compiler--auto-corrupt--orig-deepfix/bin4\"\n```\nTo run the trained model on the DeepFix test examples, do\n```\nname=\"code-compiler--2l-graph\"\nmkdir -p out/${name}/log\ncd out/${name}\n\nfor entry in ${test_split_root}/*\ndo\n  probid=`basename $entry`\n  python3 -u ../../test_deepfix.py \\\n  --input-code-dir ${program_data_root}/${probid}/erroneous \\\n  --repairer-server  http://\u003cIP\u003e:\u003cport\u003e/pred\ndone\n```\nwhere you plug the IP address and port number into `\u003cIP\u003e` and `\u003cport\u003e`.\nAfter this completes, you can get the test accuracy by\n```\npython3 -u ../../collate_deepfix.py\n```\nConcrete examples are provided in `evaluation/run_test_deepfix.sh`.\n\n\n\n### SPoC\n#### 1. Start server\nFirst, go to `model` directory.\nWe run a trained model (e.g. code-compiler--2l-graph--finetune) as a server by\n```\nname=\"SERVER--code-compiler--2l-graph--finetune\"\nmkdir out_spoc/${name}\npython3 -u main_spoc.py -o ${name} server -p \u003cport\u003e \\\n    -l out_spoc/code-compiler--2l-graph--finetune/\u003ccheckpoint\u003e \\\n    configs/base.yml  configs/data-spoc/err-data-orig.yml \\\n    configs/model-code-compiler/2l-graph--dec-attn-all.yml\n```\nSimilar to DeepFix, pick a port number and a checkpoint, and get the IP address.\nConcrete examples are provided in the second half of `model/run_spoc.sh`.\n\n#### 2. Run model on SPoC test\nGo to `evaluation/spoc` directory. First prepare:\n```\nrepo_root=\"../../../..\"\n```\nTo run the trained model on all the programs in SPoC TestW, do\n```\nname=\"code-compiler--2l-graph--finetune\"\n\nINPUT=translation_preds_test/testw    #change to testp if you want to evaluate on testp\nN=$(tail -n+2 ${INPUT}.tsv | cut -f 3-6 | uniq | wc -l)  # Count the number of programs\ninterval=10\n\nmkdir -p out_testw/${name}/log        #change to testp if you want to evaluate on testp\ncd out_testw/${name}                  #change to testp if you want to evaluate on testp\n\ni=1\nwhile [[ $i -le $N ]]; do\n  python -u ../../test_spoc.py -p 100 \\\n  --compile-budget 100 --n-parallel ${interval} \\\n  --repairer-server  http://\u003cIP\u003e:\u003cport\u003e/pred \\\n  ../../${INPUT} $i\n  i=$(($i + ${interval}))\ndone\n```\nwhere you plug the IP address and port number into `\u003cIP\u003e` and `\u003cport\u003e`.\nAfter this completes, you can get the test accuracy by\n```\npython3 -u ../../collate_spoc.py\n```\nConcrete examples are provided in `evaluation/run_test_spoc.sh`.\n\n\n\n\n\n## Acknowledgment\nThe original DeepFix and SPoC data used in this work come from the following papers:\n```\nDeepFix: Fixing common C language errors by deep learning. Rahul Gupta, Soham Pal, Aditya Kanade, Shirish Shevade. AAAI 2017.\nSPoC: Search-based Pseudocode to Code. Sumith Kulal, Panupong Pasupat, Kartik Chandra, Mina Lee, Oded Padon, Alex Aiken and Percy Liang. NeurIPS 2019.\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmichiyasunaga%2FDrRepair","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmichiyasunaga%2FDrRepair","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmichiyasunaga%2FDrRepair/lists"}