{"id":13623866,"url":"https://github.com/tthtlc/awesome-source-analysis","last_synced_at":"2026-04-11T08:44:44.338Z","repository":{"id":73284696,"uuid":"137897248","full_name":"tthtlc/awesome-source-analysis","owner":"tthtlc","description":"Source code understanding via Machine Learning techniques","archived":false,"fork":false,"pushed_at":"2022-11-29T06:51:01.000Z","size":77,"stargazers_count":136,"open_issues_count":1,"forks_count":25,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-04-07T00:16:29.019Z","etag":null,"topics":["automated-programming","deep-learning","machine-learning","source-code-analysis"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tthtlc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2018-06-19T13:44:19.000Z","updated_at":"2025-03-27T07:05:04.000Z","dependencies_parsed_at":"2023-03-22T15:03:10.544Z","dependency_job_id":null,"html_url":"https://github.com/tthtlc/awesome-source-analysis","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tthtlc%2Fawesome-source-analysis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tthtlc%2Fawesome-source-analysis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tthtlc%2Fawesome-source-analysis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tthtlc%2Fawesome-source-analysis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/owners/tthtlc","download_url":"https://codeload.github.com/tthtlc/awesome-source-analysis/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248724585,"owners_count":21151560,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automated-programming","deep-learning","machine-learning","source-code-analysis"],"created_at":"2024-08-01T21:01:36.491Z","updated_at":"2026-04-11T08:44:44.331Z","avatar_url":"https://github.com/tthtlc.png","language":null,"funding_links":[],"categories":["Others"],"sub_categories":[],"readme":"# Awesome Source Code Analysis Via Machine Learning Techniques\n\nA list of resources for source code analysis applications using Machine Learning techniques (e.g., Deep Learning, PCA, SVM, Bayesian and probabilistic models, reinforcement learning techniques, etc.)\n\nMaintainers - [Peter Teoh](https://github.com/tthtlc)\n\n## Contributing\nPlease feel free to open [pull requests](https://github.com/tthtlc/awesome-source-analysis/pulls), email Peter Teoh (htmldeveloper@gmail.com), or join our chat to add links.\n\n[[Join the chat at https://gitter.im/tthtlc/awesome-source-analysis](https://gitter.im/tthtlc/awesome-source-analysis)]\n\n## Sharing\n## Table of Contents\n\nMachine-Learning-Guided Selectively Unsound Static Analysis\nhttp://www.seas.upenn.edu/~kheo/home/paper/icse17-heohyi.pdf\n\nA Survey of Machine Learning for Big Code and Naturalness \nhttps://arxiv.org/pdf/1709.06182\n\nAriadne: Analysis for Machine Learning Programs \nhttps://arxiv.org/pdf/1805.04058\n\nThe use of machine learning with 
signal- and NLP processing of source code to fingerprint, detect, and classify vulnerabilities and weaknesses with MARFCAT\nhttps://arxiv.org/abs/1010.2511\n\nVulDeePecker: A Deep Learning-Based System for Vulnerability Detection\nhttps://arxiv.org/pdf/1801.01681\n\ncode2vec: Learning Distributed Representations of Code \nhttps://arxiv.org/pdf/1803.09473\n\nAutomated software vulnerability detection with machine learning\nhttps://arxiv.org/abs/1803.04497\n\nAutomatic feature learning for vulnerability prediction\nhttps://arxiv.org/pdf/1708.02368\n\nNeural Turing Machines\nhttps://arxiv.org/pdf/1410.5401.pdf\n\nDeepCoder: Learning to Write Programs\nhttps://arxiv.org/abs/1611.01989\n\nRecent Advances in Neural Program Synthesis\nhttps://arxiv.org/pdf/1802.02353\n\nNeural-Guided Deductive Search for Real-Time Program Synthesis\nhttps://arxiv.org/pdf/1804.01186\n\nRobustFill: Neural Program Learning under Noisy I/O\nhttps://arxiv.org/pdf/1703.07469\n\nNeural Program Search: Solving Programming Tasks from Description\nhttps://arxiv.org/pdf/1802.04335\n\nA Syntactic Neural Model for General-Purpose Code Generation\nhttps://arxiv.org/pdf/1704.01696\n\nBuilding Machines That Learn and Think Like People\nhttps://arxiv.org/pdf/1604.00289\n\nDifferentiable Programs with Neural Libraries\nhttps://arxiv.org/pdf/1611.02109\n\nSummary-TerpreT: A Probabilistic Programming Language for Program Induction\nhttps://arxiv.org/pdf/1612.00817\n\nAuto-Documenation for Software Development\nhttps://arxiv.org/pdf/1701.08485\n\nBOOK: Storing Algorithm-Invariant Episodes for Deep Reinforcement Learning\nhttps://arxiv.org/pdf/1709.01308\n\nBoda-RTC: Productive Generation of Portable, Efficient Code ...\nhttps://arxiv.org/pdf/1606.00094\n\nMaking Neural Programming Architectures Generalize via Recursion\nhttps://arxiv.org/pdf/1704.06611\n\nDifferentiable Functional Program 
Interpreters\nhttps://arxiv.org/pdf/1611.01988\n\nDeep Probabilistic Programming Languages: A Qualitative Study\nhttps://arxiv.org/pdf/1804.06458\n\nBinPro: A Tool for Binary Source Code Provenance\nhttps://arxiv.org/pdf/1711.00830\n\nA Survey on Compiler Autotuning using Machine Learning\nhttps://arxiv.org/pdf/1801.04405\n\nEstimating defectiveness of source code: A predictive model using GitHub content\nhttps://arxiv.org/pdf/1803.07764\n\nEMBER: An Open Dataset for Training Static PE Malware Machine\nhttps://arxiv.org/pdf/1804.04637\n\nOn End-to-End Program Generation from User Intention by Deep Neural Networks\nhttps://arxiv.org/pdf/1510.07211\n\nUtilizing Static Analysis and Code Generation to Accelerate Neural Networks\nhttps://arxiv.org/abs/1206.6466\n\nDLPaper2Code: Auto-generation of Code from Deep Learning Research Paper\nhttps://arxiv.org/pdf/1711.03543\n\nInferring Generative Model Structure with Static Analysis\nhttps://arxiv.org/pdf/1709.02477\n\nSorting and Transforming Program Repair Ingredients via Deep Learning Code Similarities\nhttps://arxiv.org/pdf/1707.04742\n\nDeepAPT: Nation-State APT Attribution Using End-to-End Deep Neural Networks\nhttps://arxiv.org/pdf/1711.09666\n\nAutomatic Structure Discovery for Large Source Code\nhttps://arxiv.org/pdf/1202.3335\n\nComment Generation for Source Code: Survey\nhttps://arxiv.org/pdf/1802.02971\n\nTowards Reverse-Engineering Black-Box Neural Networks\nhttps://arxiv.org/abs/1711.01768\n\nDatabase Reverse Engineering based on Association Rule Mining \nhttps://arxiv.org/pdf/1004.3272.pdf\n\nAutomated detection and classification of cryptographic algorithms in binary programs through machine learning\nhttps://arxiv.org/pdf/1503.01186\n\nAutomatically Generating Commit Messages from Diffs using Neural Machine Translation\nhttps://arxiv.org/pdf/1708.09492\n\nWhen Coding Style Survives Compilation: De-anonymizing 
Programmers from Executable\nhttps://arxiv.org/pdf/1512.08546\n\nCode smells\nhttps://arxiv.org/pdf/1802.06063\n\nData Driven Exploratory Attacks on Black Box Classifiers in Adversarial Domains\nhttps://arxiv.org/pdf/1703.07909\n\npix2code: Generating Code from a Graphical User Interface Screenshot\nhttps://arxiv.org/pdf/1705.07962\n\nDeep Learning in Software Engineering\nhttps://arxiv.org/pdf/1805.04825\n\nPredicting Software Defects Through SVM: An Empirical Approach\nhttps://arxiv.org/pdf/1803.03220\n\nA Survey of Reverse Engineering and Program Comprehension\nhttps://arxiv.org/pdf/cs/0503068\n\nhttps://owasp.org/www-project-top-ten/2017/\n\nhttps://arxiv.org/pdf/1709.07101.pdf\n\nhttps://arxiv.org/pdf/1805.05206.pdf   \n\nhttps://arxiv.org/pdf/1807.09160.pdf \n\nhttps://arxiv.org/pdf/1806.07336.pdf\n\nOr just search arxiv.org (inaccuracies in identifying papers expected): [recent arxiv.org search](/summary_6dec2018.md)\n\n[LLVM based vulnerabilities search](/summary_llvm_source6dec2018.md)\n\nAs an extension\n\nhttps://ml4code.github.io/ \n\n(this site being an offshoot of the paper: https://arxiv.org/abs/1709.06182)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftthtlc%2Fawesome-source-analysis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftthtlc%2Fawesome-source-analysis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftthtlc%2Fawesome-source-analysis/lists"}