{"id":15645345,"url":"https://github.com/shibing624/codeassist","last_synced_at":"2025-12-14T14:02:46.776Z","repository":{"id":62563270,"uuid":"457268953","full_name":"shibing624/CodeAssist","owner":"shibing624","description":"CodeAssist is an advanced code completion tool that provides high-quality code completions for Python, Java, C++ and so on. CodeAssist 是一个高级代码补全工具，高质量为 Python、Java 和 C++ 补全代码。","archived":false,"fork":false,"pushed_at":"2024-02-19T07:41:35.000Z","size":1112,"stargazers_count":58,"open_issues_count":2,"forks_count":8,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-04-21T06:57:57.565Z","etag":null,"topics":["auto-completion","code-autocomplete","code-generation","gpt-4","gpt2","starcoder","wizardcoder"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/shibing624.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-02-09T08:20:08.000Z","updated_at":"2025-04-18T10:29:39.000Z","dependencies_parsed_at":"2024-10-03T12:10:03.704Z","dependency_job_id":"9e2c90d0-a6d3-459e-84be-a7cada4cf188","html_url":"https://github.com/shibing624/CodeAssist","commit_stats":null,"previous_names":["shibing624/codeassist","shibing624/autocoder","shibing624/code-autocomplete"],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shibing624%2FCodeAssist","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shibing624%2F
CodeAssist/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shibing624%2FCodeAssist/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shibing624%2FCodeAssist/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/shibing624","download_url":"https://codeload.github.com/shibing624/CodeAssist/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251687350,"owners_count":21627566,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["auto-completion","code-autocomplete","code-generation","gpt-4","gpt2","starcoder","wizardcoder"],"created_at":"2024-10-03T12:06:40.730Z","updated_at":"2025-12-14T14:02:46.763Z","avatar_url":"https://github.com/shibing624.png","language":"Python","readme":"[**🇨🇳中文**](https://github.com/shibing624/codeassist/blob/main/README.md) | [**🌐English**](https://github.com/shibing624/codeassist/blob/main/README_EN.md) | [**📖文档/Docs**](https://github.com/shibing624/codeassist/wiki) | [**🤖模型/Models**](https://huggingface.co/shibing624) \n\n\u003cdiv align=\"center\"\u003e\n  \u003ca href=\"https://github.com/shibing624/codeassist\"\u003e\n    \u003cimg src=\"https://github.com/shibing624/codeassist/blob/main/docs/codeassist.png\" height=\"130\" alt=\"Logo\"\u003e\n  \u003c/a\u003e\n\u003c/div\u003e\n\n-----------------\n\n# CodeAssist: Advanced Code Completion Tool\n[![PyPI version](https://badge.fury.io/py/CodeAssist.svg)](https://badge.fury.io/py/CodeAssist)\n[![Contributions 
welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg)](CONTRIBUTING.md)\n[![GitHub contributors](https://img.shields.io/github/contributors/shibing624/CodeAssist.svg)](https://github.com/shibing624/CodeAssist/graphs/contributors)\n[![License Apache 2.0](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)\n[![python_version](https://img.shields.io/badge/Python-3.5%2B-green.svg)](requirements.txt)\n[![GitHub issues](https://img.shields.io/github/issues/shibing624/CodeAssist.svg)](https://github.com/shibing624/CodeAssist/issues)\n[![Wechat Group](https://img.shields.io/badge/wechat-group-green.svg?logo=wechat)](#Contact)\n\n## Introduction\n\n**CodeAssist** is an advanced code completion tool that provides high-quality code completions for Python, Java, C++, and other languages.\n\n\n## Features\n\n- GPT-based code completion\n- Code completion for `Python`, `Java`, `C++`, `JavaScript`, and other languages\n- Line-level and block-level code completion\n- Train (fine-tune) a model and run prediction with your own data\n\n### Release Models\n\n| Arch   | BaseModel         | Model                                                                                                                   | Model Size | \n|:-------|:------------------|:------------------------------------------------------------------------------------------------------------------------|:----------:|\n| GPT   | gpt2              | [shibing624/code-autocomplete-gpt2-base](https://huggingface.co/shibing624/code-autocomplete-gpt2-base)                 |   487MB    |\n| GPT   | distilgpt2        | [shibing624/code-autocomplete-distilgpt2-python](https://huggingface.co/shibing624/code-autocomplete-distilgpt2-python) |   319MB    |\n| GPT   | bigcode/starcoder | [WizardLM/WizardCoder-15B-V1.0](https://huggingface.co/WizardLM/WizardCoder-15B-V1.0)                                   |    29GB    |\n\n\n\n## Install\n\n```shell\npip 
install torch # conda install pytorch\npip install -U codeassist\n```\n\nor\n\n```shell\ngit clone https://github.com/shibing624/codeassist.git\ncd codeassist\npython setup.py install\n```\n\n## Usage\n\n### WizardCoder model\n\nWizardCoder-15B is `bigcode/starcoder` fine-tuned on alpaca code data. You can use the following code to generate code:\n\nexample: [examples/wizardcoder_demo.py](https://github.com/shibing624/CodeAssist/blob/main/examples/wizardcoder_demo.py)\n\n```python\nimport sys\n\nsys.path.append('..')\nfrom codeassist import WizardCoder\n\nm = WizardCoder(\"WizardLM/WizardCoder-15B-V1.0\")\nprint(m.generate('def load_csv_file(file_path):')[0])\n```\n\noutput:\n\n\n```python\nimport csv\n\ndef load_csv_file(file_path):\n    \"\"\"\n    Load data from a CSV file and return a list of dictionaries.\n    \"\"\"\n    # Open the file in read mode\n    with open(file_path, 'r') as file:\n        # Create a CSV reader object\n        csv_reader = csv.DictReader(file)\n        # Initialize an empty list to store the data\n        data = []\n        # Iterate over each row of data\n        for row in csv_reader:\n            # Append the row of data to the list\n            data.append(row)\n    # Return the list of data\n    return data\n```\n\nThe model's output is effective. It currently supports English and Chinese input, and you can enter either instructions or code prefixes as required.\n\n### distilgpt2 model\n\n\nThis is a code autocomplete model fine-tuned from distilgpt2. You can use it with the following code:\n\nexample: [examples/distilgpt2_demo.py](https://github.com/shibing624/CodeAssist/blob/main/examples/distilgpt2_demo.py)\n\n```python\nimport sys\n\nsys.path.append('..')\nfrom codeassist import GPT2Coder\n\nm = GPT2Coder(\"shibing624/code-autocomplete-distilgpt2-python\")\nprint(m.generate('import torch.nn as')[0])\n```\n\noutput:\n\n```shell\nimport torch.nn as nn\nimport torch.nn.functional as F\n```\n\n### Use with huggingface/transformers:\n\nexample: 
[examples/use_transformers_gpt2.py](https://github.com/shibing624/CodeAssist/blob/main/examples/use_transformers_gpt2.py)\n\n### Train Model\n#### Train WizardCoder model\nexample: [examples/training_wizardcoder_mydata.py](https://github.com/shibing624/CodeAssist/blob/main/examples/training_wizardcoder_mydata.py)\n\n```shell\ncd examples\nCUDA_VISIBLE_DEVICES=0,1 python training_wizardcoder_mydata.py --do_train --do_predict --num_epochs 1 --output_dir outputs-wizard --model_name WizardLM/WizardCoder-15B-V1.0\n```\n\n- GPU memory: 31GB\n- Fine-tuning needs 2 x V100 (32GB)\n- Inference needs 1 x V100 (32GB)\n\n#### Train distilgpt2 model\nexample: [examples/training_gpt2_mydata.py](https://github.com/shibing624/CodeAssist/blob/main/examples/training_gpt2_mydata.py)\n\n```shell\ncd examples\npython training_gpt2_mydata.py --do_train --do_predict --num_epochs 15 --output_dir outputs-gpt2 --model_name gpt2\n```\n\nPS: The resulting fine-tuned model is GPT2-python: [shibing624/code-autocomplete-gpt2-base](https://huggingface.co/shibing624/code-autocomplete-gpt2-base). \nFine-tuning it took about 24 hours on a V100. \n\n\n### Server\n\nStart the FastAPI server:\n\nexample: [examples/server.py](https://github.com/shibing624/CodeAssist/blob/main/examples/server.py)\n\n```shell\ncd examples\npython server.py\n```\n\nOpen the URL: http://0.0.0.0:8001/docs\n\n![api](https://github.com/shibing624/CodeAssist/blob/main/docs/api.png)\n\n\n\n## Dataset\n\nYou can customize how the dataset is built. Below is an example of the building process.\n\nLet's use Python code from [Awesome-pytorch-list](https://github.com/bharathgs/Awesome-pytorch-list)\n\n1. We want the model to help auto-complete code at a general level. The code of The Algorithms suits this need.\n2. The code from this project is well written (high quality).\n\ndataset tree:\n\n```shell\nexamples/download/python\n├── train.txt\n├── valid.txt\n└── test.txt\n```\n\nThere are three ways to build the dataset:\n1. 
Use the huggingface/datasets library to load the dataset\nhuggingface datasets: [https://huggingface.co/datasets/shibing624/source_code](https://huggingface.co/datasets/shibing624/source_code)\n\n```python\nfrom datasets import load_dataset\ndataset = load_dataset(\"shibing624/source_code\", \"python\") # python or java or cpp\nprint(dataset)\nprint(dataset['test'][0:10])\n```\n\noutput:\n```shell\nDatasetDict({\n    train: Dataset({\n        features: ['text'],\n        num_rows: 5215412\n    })\n    validation: Dataset({\n        features: ['text'],\n        num_rows: 10000\n    })\n    test: Dataset({\n        features: ['text'],\n        num_rows: 10000\n    })\n})\n{'text': [\n\"            {'max_epochs': [1, 2]},\\n\", \n'            refit=False,\\n', '            cv=3,\\n', \n\"            scoring='roc_auc',\\n\", '        )\\n', \n'        search.fit(*data)\\n', \n'', \n'    def test_module_output_not_1d(self, net_cls, data):\\n', \n'        from skorch.toy import make_classifier\\n', \n'        module = make_classifier(\\n'\n]}\n```\n\n2. Download the dataset from the cloud\n\n| Name | Source | Download | Size |\n| :------- | :--------- | :---------: | :---------: |\n| Python+Java+CPP source code | Awesome-pytorch-list (5.22 million lines) | [github_source_code.zip](https://github.com/shibing624/codeassist/releases/download/0.0.4/source_code.zip) | 105M |\n\nDownload the dataset, unzip it, and put it in `examples/`.\n\n3. 
Get source code from scratch and build the dataset\n\n[prepare_code_data.py](https://github.com/shibing624/CodeAssist/blob/main/examples/prepare_code_data.py)\n\n```shell\ncd examples\npython prepare_code_data.py --num_repos 260\n```\n\n\n## Contact\n\n- Issues (suggestions):\n  [![GitHub issues](https://img.shields.io/github/issues/shibing624/CodeAssist.svg)](https://github.com/shibing624/CodeAssist/issues)\n- Email me: xuming: xuming624@qq.com\n- WeChat me: add my *WeChat ID: xuming624, with the note: name-company-NLP* to join the NLP discussion group.\n\n\u003cimg src=\"docs/wechat.jpeg\" width=\"200\" /\u003e\n\n## Citation\n\nIf you use codeassist in your research, please cite it in the following format:\n\nAPA:\n```latex\nXu, M. codeassist: Code AutoComplete with GPT model (Version 1.0.0) [Computer software]. https://github.com/shibing624/codeassist\n```\n\nBibTeX:\n```latex\n@software{Xu_codeassist,\nauthor = {Ming Xu},\ntitle = {CodeAssist: Code AutoComplete with Generation model},\nurl = {https://github.com/shibing624/codeassist},\nversion = {1.0.0}\n}\n```\n\n## License\nThis repository is licensed under [The Apache License 2.0](LICENSE).\n\nPlease follow the [Attribution-NonCommercial 4.0 International](https://github.com/nlpxucan/WizardLM/blob/main/WizardCoder/MODEL_WEIGHTS_LICENSE) license when using the WizardCoder model.\n\n\n## Contribute\n\nThe project code is still rough. If you have improved the code, you are welcome to contribute it back to this project. Before submitting, please note the following two points:\n\n- Add corresponding unit tests in `tests`\n- Run all unit tests with `python setup.py test` and make sure they all pass\n\nAfter that, you can submit a PR.\n\n## Reference\n\n- [gpt-2-simple](https://github.com/minimaxir/gpt-2-simple)\n- [galois-autocompleter](https://github.com/galois-autocompleter/galois-autocompleter)\n- [WizardLM/WizardCoder-15B-V1.0](https://huggingface.co/WizardLM/WizardCoder-15B-V1.0)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshibing624%2Fcodeassist","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshibing624%2Fcodeassist","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshibing624%2Fcodeassist/lists"}