{"id":18876888,"url":"https://github.com/autosoft-dev/code-bert","last_synced_at":"2026-03-01T13:02:54.345Z","repository":{"id":50322512,"uuid":"241995751","full_name":"autosoft-dev/code-bert","owner":"autosoft-dev","description":"Automatically check mismatch between code and comments using AI and ML","archived":false,"fork":false,"pushed_at":"2021-06-28T04:11:26.000Z","size":276,"stargazers_count":53,"open_issues_count":1,"forks_count":4,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-04-14T18:48:22.439Z","etag":null,"topics":["bert-model","deep-learning","function-docstring-pairs","machine-learning","machine-learning-on-source-code","mlmodel","python","python3","roberta","transformer"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/autosoft-dev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-02-20T21:39:24.000Z","updated_at":"2024-12-26T13:23:57.000Z","dependencies_parsed_at":"2022-08-04T10:30:39.282Z","dependency_job_id":null,"html_url":"https://github.com/autosoft-dev/code-bert","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/autosoft-dev/code-bert","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/autosoft-dev%2Fcode-bert","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/autosoft-dev%2Fcode-bert/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/autosoft-dev%2Fcode-bert/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/autosoft-dev%2Fcode-bert/manifests","o
wner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/autosoft-dev","download_url":"https://codeload.github.com/autosoft-dev/code-bert/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/autosoft-dev%2Fcode-bert/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29969700,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-01T12:56:10.327Z","status":"ssl_error","status_checked_at":"2026-03-01T12:55:24.744Z","response_time":124,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert-model","deep-learning","function-docstring-pairs","machine-learning","machine-learning-on-source-code","mlmodel","python","python3","roberta","transformer"],"created_at":"2024-11-08T06:15:56.788Z","updated_at":"2026-03-01T13:02:54.311Z","avatar_url":"https://github.com/autosoft-dev.png","language":"Python","readme":"# This repo is not maintained and the pre-trained model is not available anymore. Sorry for that. 
Codist is not going to maintain these repos from this point onward.\n\n[![Python 3.6](https://img.shields.io/badge/python-3.6-blue.svg)](https://www.python.org/downloads/release/python-360/)\n![](build_badges/macpass.svg)\n![](build_badges/linuxpass.svg)\n![](build_badges/windowsfail.svg)\n\n[![Twitter Follow](https://img.shields.io/twitter/follow/AiCodist.svg?style=social)](https://twitter.com/AiCodist)  \n\n# code-bert\n\ncodeBERT is a package to **automatically check if your code documentation is up-to-date**. codeBERT currently works for Python code. \n\n*If you are using the source distribution, the present version is available for Linux and Mac only. We are working on the Windows release. Please hang on!*\n\n\nThis is [CodistAI](https://codist-ai.com/)'s open source version, letting you easily use the fine-tuned model based on our open source MLM code model [codeBERT-small-v2](https://huggingface.co/codistai/codeBERT-small-v2).\n\n[codeBERT-small-v2](https://huggingface.co/codistai/codeBERT-small-v2) is a RoBERTa model trained using the Hugging Face Transformers library, which we then fine-tuned on the following task:\n\n\n## 🏆 Output\n\nGiven a function `f` and a docstring `d`, code-bert predicts whether `f` and `d` match (that is, whether they represent the same concept).\n\nA report lists all the functions where the docstring does not match, as follows:\n\n```\n ======== Analysing test_files/inner_dir/test_code_get.py =========\n\n\u003e\u003e\u003e Function \"get_file\" with Docstring \"\"\"opens a url\"\"\"\n\u003e\u003e\u003e Do they match?\nNo\n\n```\n\n\n## Local setup \n\n**The entire code base is built and available for Python 3.6+**\n\nWe provide easy-to-use CLI commands to achieve all of this, at scale. Let's go through it step by step.\n\n**We strongly recommend using a virtual environment for the following steps** \n\n1. 
First, clone this repo: `git clone https://github.com/autosoft-dev/code-bert.git \u0026\u0026 cd code-bert`\n\n2. (Assuming you have the virtualenv activated) install the dependencies with `pip install -r requirements.txt`\n\n3. Then install the package with `pip install -e .`\n\n4. Next, download and set up the model. If the above steps were done properly, there is a command for this: `download_model`\n\n5. The model is almost 1.7 GB in total, so it may take a bit of time before it finishes.\n\n6. Once this is done, you are ready to analyze code. For that, too, we have a CLI command; details are in the following section.\n\n-----------\n\nYou can run the following command to analyze one file or a directory containing a bunch of files:\n\n```\nusage: run_pipeline [-h] [-f FILE_NAME] [-r RECURSIVE] [-m]\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -f FILE_NAME, --file_name FILE_NAME\n                        The name of the file you want to run the pipeline on\n  -r RECURSIVE, --recursive RECURSIVE\n                        Put the directory if you want to run recursively\n  -m, --show_match      Shall we show the matches? (Default false)\n```\n\n## Docker setup\n\nIt has been requested by our users, and here it is! You will not need to go through any painful setup process at all. We have Dockerized the entire thing for you. Here are the steps to use it. 
\n\n- Pull the image: `docker pull codistai/codebert`\n\n- Assuming that you have a bunch of files to be analyzed under `test_files` in your present working directory, run this command: `docker run -v \"$(pwd)\"/test_files:/usr/src/app/test_files -it codistai/codebert run_pipeline -r test_files`\n\n- If you wish to analyze any other directory, simply change the mounting option in the `docker run` command (the path after `-v`; the format should be `full/local/path:/usr/src/app/\u003cmount_dir_name\u003e`) and also mention the same `\u003cmount_dir_name\u003e` after the `run_pipeline` command.\n\n\n\n## 🎮 code-bert example\n\nLet's say you have a directory called `test_files` with some Python files in it. Here is how to run the analysis: \n\n`run_pipeline  -r test_files`\n\nThe tool will recursively walk the whole directory, analyzing one file at a time, and print a report of the mismatching function-docstring pairs.\n\n```\n ======== Analysing test_files/test_code_add.py =========\n\n\n ======== Analysing test_files/inner_dir/test_code_get.py =========\n\u003e\u003e\u003e Function \"get_file\" with Docstring \"\"\"opens a url\"\"\"\n\u003e\u003e\u003e Do they match?\nNo\n******************************************************************\n```\n\n\nYou can optionally pass the `--show_match` flag, like so: `run_pipeline -r test_files --show_match`, to print both matching and mismatching function-docstring pairs.\n\n```\n ======== Analysing test_files/test_code_add.py =========\n\n\n\u003e\u003e\u003e Function \"add\" with Docstring \"\"\"sums two numbers and returns the result\"\"\"\n\u003e\u003e\u003e Do they match?\nYes\n******************************************************************\n\u003e\u003e\u003e Function \"return_all_even\" with Docstring \"\"\"numbers that are not really odd\"\"\"\n\u003e\u003e\u003e Do they match?\nYes\n******************************************************************\n\n ======== Analysing test_files/inner_dir/test_code_get.py 
=========\n\n\n\u003e\u003e\u003e Function \"get_file\" with Docstring \"\"\"opens a url\"\"\"\n\u003e\u003e\u003e Do they match?\nNo\n******************************************************************\n```\n\n\n\n## 💡 code-bert logic\n\nLet's consider the following code:\n\n```python\nfrom pathlib import Path\n\ndef get_file(filename):\n    \"\"\"\n    opens a url\n    \"\"\"\n    if not Path(filename).is_file():\n        return None\n    return open(filename, \"rb\")\n\n```\n1. Mine the source code to get function-docstring pairs using [tree-hugger](https://github.com/autosoft-dev/tree-hugger)\n2. Prepare the function and docstring data to fit the input format expected by the [codeBERT-small-v2](https://huggingface.co/codistai/codeBERT-small-v2) model.\n- **Function** - `def get file ( filename ) : indent if not path ( filename ) . is file ( ) : indent return none dedent return open ( filename , \"rb\" ) dedent`\n\n- **Doc String** - `opens a url`\n\n3. Run the model:\n```python\nmatch, confidence = model(function, docstring)\n```\n\n\nStay tuned! \n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fautosoft-dev%2Fcode-bert","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fautosoft-dev%2Fcode-bert","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fautosoft-dev%2Fcode-bert/lists"}