{"id":16286950,"url":"https://github.com/worldbeater/code-vecs","last_synced_at":"2025-10-23T20:33:53.293Z","repository":{"id":248122304,"uuid":"827821504","full_name":"worldbeater/code-vecs","owner":"worldbeater","description":"Code for the methods and algorithms described in the paper \"Analysis of Program Representations Based on Abstract Syntax Trees and Higher-Order Markov Chains for Source Code Classification Task\"","archived":false,"fork":false,"pushed_at":"2024-10-04T09:13:58.000Z","size":1293,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-28T22:43:50.669Z","etag":null,"topics":["ast","code-analysis","code-embedding","code2vec","embeddings","static-analysis","vector-embeddings"],"latest_commit_sha":null,"homepage":"https://www.mdpi.com/1999-5903/15/9/314","language":"Jupyter Notebook","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/worldbeater.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-12T12:56:55.000Z","updated_at":"2024-10-04T09:14:02.000Z","dependencies_parsed_at":"2024-07-25T19:45:44.486Z","dependency_job_id":"e1945bad-75ff-47c6-8892-1381839ab6af","html_url":"https://github.com/worldbeater/code-vecs","commit_stats":null,"previous_names":["worldbeater/code-vecs"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/worldbeater%2Fcode-vecs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/worldbeater%2Fcode-vecs/tags","rel
eases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/worldbeater%2Fcode-vecs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/worldbeater%2Fcode-vecs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/worldbeater","download_url":"https://codeload.github.com/worldbeater/code-vecs/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244047544,"owners_count":20389203,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ast","code-analysis","code-embedding","code2vec","embeddings","static-analysis","vector-embeddings"],"created_at":"2024-10-10T19:44:06.769Z","updated_at":"2025-10-23T20:33:53.275Z","avatar_url":"https://github.com/worldbeater.png","language":"Jupyter Notebook","readme":"### Analysis of Program Representations Based on Abstract Syntax Trees and Higher-Order Markov Chains for Source Code Classification Task\n\nCode for source code embedding algorithms described in the paper [Analysis of Program Representations Based on Abstract Syntax Trees and Higher-Order Markov Chains for Source Code Classification Task](https://doi.org/10.3390/fi15090314). 
This repository also includes code implementing control flow graph-based source code embeddings for reproducing the experiments described in our paper [Source Code Embeddings Based on Control Flow Graphs and Markov Chains for Program Classification](https://ieeexplore.ieee.org/document/10803670).\n\n![image](https://github.com/user-attachments/assets/41dd19a5-bd7f-4377-8036-2368722de202)\n\n### Getting Started\n\n1. Install [Docker CE](https://docs.docker.com/engine/install/) and [GNU make](https://www.gnu.org/software/make/).\n2. Clone the repository, then initialize its submodules with `git submodule update --init --recursive`.\n3. Download the dataset [[2](https://doi.org/10.3390/data8060109)] from Zenodo and extract the `task-*.csv` files into `src/data`.\n4. Classification targets can contain digits, which the `legal_method_names_checker` in code2vec rejects, so apply the following patch to `external/code2vec/common.py`:\n```diff\n     @staticmethod\n     def legal_method_names_checker(special_words, name):\n-        return name != special_words.OOV and re.match(r'^[a-zA-Z|]+$', name)\n+        return name != special_words.OOV\n```\n5. Run `make notebook` from the repository root, then run the notebooks.\n\n### References\n\n1. Gorchakov, A.V.; Demidova, L.A.; Sovietov, P.N. [Analysis of Program Representations Based on Abstract Syntax Trees and Higher-Order Markov Chains for Source Code Classification Task](https://doi.org/10.3390/fi15090314). Future Internet **2023**, 15, 314.\n2. Demidova, L.A.; Andrianova, E.G.; Sovietov, P.N.; Gorchakov, A.V. [Dataset of Program Source Codes Solving Unique Programming Exercises Generated by Digital Teaching Assistant](https://doi.org/10.3390/data8060109). Data **2023**, 8 (6), p. 109.\n3. Gorchakov, A.V.; Demidova, L.A.; Maslennikov, V.V. [Source Code Embeddings Based on Control Flow Graphs and Markov Chains for Program Classification](https://ieeexplore.ieee.org/document/10803670). 
Proceedings of the 2024 6th International Conference on Control Systems, Mathematical Modeling, Automation and Energy Efficiency (SUMMA). IEEE, **2024**, pp. 328-333.\n\n#### Citation\n\nIf you use the code from this repository in your research, please consider citing [1](https://doi.org/10.3390/fi15090314) or [3](https://ieeexplore.ieee.org/document/10803670).\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fworldbeater%2Fcode-vecs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fworldbeater%2Fcode-vecs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fworldbeater%2Fcode-vecs/lists"}
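The README in this record names the core idea, program representations built from abstract syntax trees (ASTs) and higher-order Markov chains, but does not spell out the construction. The Python sketch below is a hypothetical illustration under stated assumptions. It is not the code from this repository and not necessarily the method from the paper. It treats every root-to-node path of a Python AST as a sequence of node type names and counts transitions from the `order` nearest ancestor types to the next type. It then normalizes those counts into conditional probabilities and flattens them into a fixed-length vector over a vocabulary of (context, next type) pairs collected from a corpus. The function names `transitions`, `transition_probs`, `build_vocabulary` and `embed`, the padding symbol `<start>`, and the choice of ancestor paths as the chain's state sequence are all assumptions made for this example.

```python
import ast
from collections import Counter

PAD = "<start>"  # hypothetical padding symbol for contexts near the root


def transitions(source: str, order: int = 2) -> Counter:
    """Count (context, node_type) pairs along every root-to-node path of the AST.

    Assumption of this sketch: the context is the tuple of the `order` nearest
    ancestor node types, left-padded with PAD near the root.
    """
    if order < 1:
        raise ValueError("order must be at least 1")
    counts: Counter = Counter()
    stack = [(ast.parse(source), (PAD,) * order)]
    while stack:
        node, context = stack.pop()
        name = type(node).__name__
        counts[(context, name)] += 1
        # Slide the window: drop the oldest ancestor, append the current node.
        child_context = (context + (name,))[-order:]
        for child in ast.iter_child_nodes(node):
            stack.append((child, child_context))
    return counts


def transition_probs(counts: Counter) -> dict:
    """Turn counts into conditional probabilities P(next_type | context)."""
    totals: Counter = Counter()
    for (context, _), c in counts.items():
        totals[context] += c
    return {key: c / totals[key[0]] for key, c in counts.items()}


def build_vocabulary(sources, order: int = 2) -> list:
    """Collect every (context, next_type) pair seen in a corpus, in a fixed order."""
    keys = set()
    for source in sources:
        keys.update(transitions(source, order))
    return sorted(keys)


def embed(source: str, vocabulary: list, order: int = 2) -> list:
    """Fixed-length vector: one transition probability per vocabulary entry."""
    probs = transition_probs(transitions(source, order))
    return [probs.get(key, 0.0) for key in vocabulary]


if __name__ == "__main__":
    corpus = [
        "def f(x):\n    return x + 1\n",
        "def g(a, b):\n    return a * b\n",
    ]
    vocab = build_vocabulary(corpus, order=2)
    for src in corpus:
        print(len(vocab), embed(src, vocab, order=2)[:5])
```

Programs with similar syntactic structure get similar vectors, which can then be passed to an ordinary classifier. A larger `order` captures longer ancestor context at the cost of a larger and sparser vocabulary; see the repository's notebooks for the representations actually used in the experiments.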