{"id":18429002,"url":"https://github.com/codefuse-ai/repofuse","last_synced_at":"2025-04-07T17:32:26.630Z","repository":{"id":252043022,"uuid":"837056772","full_name":"codefuse-ai/RepoFuse","owner":"codefuse-ai","description":null,"archived":false,"fork":false,"pushed_at":"2024-09-18T07:41:51.000Z","size":34908,"stargazers_count":29,"open_issues_count":2,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-09-18T12:04:01.523Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/codefuse-ai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-02T05:58:40.000Z","updated_at":"2024-09-14T11:58:36.000Z","dependencies_parsed_at":"2024-10-25T16:40:24.748Z","dependency_job_id":null,"html_url":"https://github.com/codefuse-ai/RepoFuse","commit_stats":null,"previous_names":["codefuse-ai/repofuse"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codefuse-ai%2FRepoFuse","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codefuse-ai%2FRepoFuse/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codefuse-ai%2FRepoFuse/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codefuse-ai%2FRepoFuse/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/codefuse-ai","download_url":"https://codeload.github.com/codefuse-ai/RepoFuse/tar.gz/refs/heads/ma
in","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223286332,"owners_count":17120001,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T05:15:20.780Z","updated_at":"2024-11-06T05:15:21.289Z","avatar_url":"https://github.com/codefuse-ai.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# RepoFuse: Repository-Level Code Completion with Language Models with Fused Dual Context\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/codefuse-ai/MFTCoder/blob/main/assets/github-codefuse-logo-update.jpg\" width=\"50%\" /\u003e\n\u003c/div\u003e\n\n## Overview\n\nRepoFuse is a pioneering solution designed to enhance repository-level code completion without the latency trade-off. RepoFuse uniquely fuses two types of context: the analogy context, rooted in code analogies, and the rationale context, which encompasses in-depth semantic relationships. We propose a novel rank truncated generation (RTG) technique that efficiently condenses these contexts into prompts with restricted size. This enables RepoFuse to deliver precise code completions while maintaining inference efficiency. 
Our evaluations using the CrossCodeEval suite reveal that RepoFuse outperforms common open-source methods, achieving an average improvement of 3.97 in code exact match score for the Python dataset and 3.01 for the Java dataset compared to state-of-the-art baseline methods.\n\n![img.jpg](./assets/workflow.png)\n\n**Figure: The workflow of RepoFuse**\n\n### Repo-specific semantic graph\n\nThe Repo-specific semantic graph is a tool that can construct the dependency relationships between entities in the code and store this information in the form of a multi-directed graph. We use this graph to construct the context for code completion.\n\nSee [repo_specific_semantic_graph/README.md](repo_specific_semantic_graph/README.md) for details.\n\n## Data Generation\n\n### Construct CrossCodeEval line completion data retrieved from Repo-specific semantic graph context\n\n1. Follow the instructions in [repo_specific_semantic_graph/README.md#install](repo_specific_semantic_graph/README.md#install) to install the Repo-specific semantic graph Python package.\n2. Install the rest of the dependencies that the script depends on: `pip install -r retrieval/requirements.txt`\n3. Download the CrossCodeEval dataset and the raw data from \u003chttps://github.com/amazon-science/cceval\u003e\n4. Run `retrieval/construct_cceval_data.py` to construct the Repo-Specific Semantic Graph context data. You can\n   run `python retrieval/construct_cceval_data.py -h` for help on the arguments. 
For example:\n\n   ```shell\n   python retrieval/construct_cceval_data.py -d \u003cpath/to/CrossCodeEval\u003e/crosscodeeval_data/python/line_completion_oracle_bm25.jsonl -o \u003cpath/to/output_dir\u003e/line_completion_dependency_graph.jsonl -r \u003cpath/to/CrossCodeEval\u003e/crosscodeeval_rawdata -j 10 -l python\n   ```\n\n## Evaluation\n\nThe following figure demonstrates the overall performance of RepoFuse:\n![img.jpg](./assets/evaluation.png)\n\nWe conducted experiments on DeepSeek and StarCoder models with varying parameter sizes, comparing the performance of using only Similar context, only Semantic context, and using Optimal Dual Context (ODC) under different Token Window sizes. The results show that ODC achieves the best performance across different models and Token Window sizes. To reproduce the results of this experiment, please follow these steps:\n\n1. Run `cd eval \u0026\u0026 pip install -r requirements.txt` to install the evaluation environment.\n2. Modify the configuration in `eval.sh`, specifically the following:\n\n+ model_name_or_path: Replace {YOUR_MODEL_PATH} with the path to your model.\n\n+ prompt_file: Replace {YOUR_PROMPT_FILE} with the path to your prompt file.\n\n+ cfc_seq_length_list: Adjust the list of lengths for the crossfile content prompt as needed. You can pass in multiple\n  values at once, separated by commas.\n\n+ crossfile_type: The type of crossfile content you use. You can choose from Similar, Related and S_R. You can pass in\n  multiple values at once, separated by commas.\n\n+ ranking_strategy_list: Specify the ranking strategies to use. You can choose from UnixCoder, Random, CodeBert, Jaccard,\n  Edit, BM25, InDegree, and Es_Orcal.\n\n+ lang: Set the test language. Supported languages are python, java, csharp, and typescript.\n\n3. Run `bash eval.sh`\n\n## Contributing\n\nContributions are welcome! 
If you have suggestions, ideas, bug reports, or requests for new model or feature support, please open\nan issue or submit a pull request.\n\n## Citation\n\nIf you find our work useful for your R\u0026D, please cite our paper as below.\n\n```\n@misc{liang2024repofuserepositorylevelcodecompletion,\n      title={RepoFuse: Repository-Level Code Completion with Fused Dual Context}, \n      author={Ming Liang and Xiaoheng Xie and Gehao Zhang and Xunjin Zheng and Peng Di and wei jiang and Hongwei Chen and Chengpeng Wang and Gang Fan},\n      year={2024},\n      eprint={2402.14323},\n      archivePrefix={arXiv},\n      primaryClass={cs.SE},\n      url={https://arxiv.org/abs/2402.14323}, \n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodefuse-ai%2Frepofuse","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcodefuse-ai%2Frepofuse","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodefuse-ai%2Frepofuse/lists"}