{"id":20859376,"url":"https://github.com/engineeringsoftware/codeditor","last_synced_at":"2025-05-12T08:32:23.555Z","repository":{"id":210775866,"uuid":"685646611","full_name":"EngineeringSoftware/codeditor","owner":"EngineeringSoftware","description":"Multilingual Code Co-Evolution Using Large Language Models","archived":false,"fork":false,"pushed_at":"2024-05-27T20:07:40.000Z","size":55,"stargazers_count":11,"open_issues_count":2,"forks_count":0,"subscribers_count":2,"default_branch":"public","last_synced_at":"2024-05-28T05:47:55.422Z","etag":null,"topics":["co-evolution","code","evolution","large-language-models","llm"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/EngineeringSoftware.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-08-31T17:32:19.000Z","updated_at":"2024-05-27T20:07:43.000Z","dependencies_parsed_at":"2023-12-04T22:25:39.486Z","dependency_job_id":"c907c93f-c54b-48f9-ac62-2c4a426f37e6","html_url":"https://github.com/EngineeringSoftware/codeditor","commit_stats":null,"previous_names":["engineeringsoftware/codeditor"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EngineeringSoftware%2Fcodeditor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EngineeringSoftware%2Fcodeditor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EngineeringSoftware%2Fcodeditor/releases","manifests_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/repositories/EngineeringSoftware%2Fcodeditor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/EngineeringSoftware","download_url":"https://codeload.github.com/EngineeringSoftware/codeditor/tar.gz/refs/heads/public","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225130723,"owners_count":17425506,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["co-evolution","code","evolution","large-language-models","llm"],"created_at":"2024-11-18T04:49:39.726Z","updated_at":"2024-11-18T04:49:40.372Z","avatar_url":"https://github.com/EngineeringSoftware.png","language":"Python","readme":"# Multilingual Code Co-Evolution Using Large Language Models\n\nThis repo hosts the code and data for the following FSE 2023 paper:\n\nTitle: [Multilingual Code Co-Evolution Using Large Language Models](https://arxiv.org/abs/2307.14991)\n\nAuthors: [Jiyang Zhang](https://jiyangzhang.github.io/), [Pengyu Nie](https://pengyunie.github.io/), [Junyi Jessy Li](https://jessyli.com/), [Milos Gligoric](http://users.ece.utexas.edu/~gligoric/)\n\n```bibtex\n@inproceedings{ZhangETAL23Codeditor,\n  author = {Zhang, Jiyang and Nie, Pengyu and Li, Junyi Jessy and Gligoric, Milos},\n  title = {Multilingual Code Co-Evolution Using Large Language Models},\n  booktitle = {Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering},\n  year = {2023},\n}\n```\n\n## News\nMay 2024\nThe fine-tuned EditsTranslation model is released on 🤗 ! 
🔥[cs2java](https://huggingface.co/EngineeringSoftware/EditsTranslation-cs2java) and [java2cs](https://huggingface.co/EngineeringSoftware/EditsTranlation-java2cs)\n\n## How to Use\n\n[sec-howto]: #how-to-use\n\n```python\nfrom transformers import T5ForConditionalGeneration, AutoTokenizer\n\ncheckpoint = \"EngineeringSoftware/EditsTranlation-java2cs\"\n\ntokenizer = AutoTokenizer.from_pretrained(checkpoint)\nmodel = T5ForConditionalGeneration.from_pretrained(checkpoint)\n\ncode_input = \"\"\"class HelloWorld { public static void main(String[] args) { System.out.println(\"Hello, World!\")\"\"\"\n\ninput_ids = tokenizer(code_input, return_tensors=\"pt\").input_ids\ngenerated_ids = model.generate(input_ids, max_length=200)\nprint(tokenizer.decode(generated_ids[0], skip_special_tokens=True))\n# output: \u003cINSERT\u003e; } } ;\u003cINSERT_END\u003e class HelloWorld { public static void main(String[] args) { System.out.println(\"Hello, World!\") ; } } ;\n```\n\n## Introduction\n\nThis repo contains the code and artifacts for reproducing the experiments in [Multilingual Code Co-Evolution Using Large Language Models](https://arxiv.org/abs/2307.14991).\nIn this work, we introduce Codeditor for co-evolving software implemented in multiple programming languages.\n\nThe code includes:\n\n- scripts for processing the dataset\n- scripts for training and evaluating Codeditor models\n\nThe artifacts include:\n\n- Java to C# raw paired changes\n- Java to C# translation dataset processed for Codeditor models\n\n## Data Downloads\n\n[sec-downloads]: #data-downloads\n\nAll our data is hosted on UTBox via [a shared folder](https://utexas.box.com/s/iwcvwgx23g9xvowu9joa661rz74k9eea).\n\n\n## Code for Processing Fine-tuning Data\n\n[sec-process]: #code-for-processing-fine-tuning-data\n\nWe provide a sample script to process the datasets for edit-translation. 
The script requires the raw data files at `raw_data/`.\n\n```\ncd python/\npython -m deltr.collector.DataProcessor edit_translation_data_process --exp cs2java --src_lang cs --tgt_lang java\n```\n\n## Code for Training and Evaluating Models\n\n[sec-traineval]: #code-for-training-and-evaluating-models\n\n### Train ML models\n\n```\ncd python/\npython -m deltr.coditT5.CodeT5 fit --exp_dir ${MODELS_DIR}/${model_name}/${dataset} --data.dataset ${dataset} --data.model ${model_name} --config configs/coditT5.yaml\n\n# Example: python -m deltr.coditT5.CodeT5 fit --exp_dir models/edit-translation/java2cs --data.dataset java2cs --data.model edit-translation --config configs/coditT5.yaml\n```\n\nResults are written to `models/${model_name}/${dataset}/`, where:\n\n- `model/`: stores the trained model.\n\n- `logs/`: stores logs during training.\n\n### Run ML models for inference\n\nInference requires the dataset at `data/${model_name}/${dataset}/` and the trained model at `models/${model_name}/${dataset}/model/`.\n\n```\ncd python/\npython -m deltr.coditT5.CodeT5 predict --exp_dir ${MODELS_DIR}/${model_name}/${dataset} --data.dataset ${dataset} --data.model ${model_name} --config configs/coditT5.yaml\n```\n\nResults are written to `models/${model_name}/${dataset}/`, where:\n\n- `output.hyp`: the predictions.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fengineeringsoftware%2Fcodeditor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fengineeringsoftware%2Fcodeditor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fengineeringsoftware%2Fcodeditor/lists"}