https://github.com/SystemSecurityStorm/Awesome-Binary-Similarity

An awesome & curated list of binary code similarity papers

# Awesome Binary Similarity

| Title | Venue | Year | Paper | Slide | Video | Github |
| :----------------------------------------------------------: | :----------: | :--: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
| BinaryAI: Binary Software Composition Analysis via Intelligent Binary Source Code Matching | ICSE | 2024 | [link](https://dl.acm.org/doi/10.1145/3597503.3639100) | | | |
| Code is not Natural Language: Unlock the Power of Semantics-Oriented Graph Representation for Binary Code Similarity Detection | USENIX | 2024 | [link](https://www.usenix.org/system/files/sec24summer-prepub-346-he.pdf) | | | [link](https://github.com/NSSL-SJTU/HermesSim) |
| CLAP: Learning Transferable Binary Code Representations with Natural Language Supervision | ISSTA | 2024 | [link](https://arxiv.org/pdf/2402.16928.pdf) | | | [link](https://github.com/Hustcw/CLAP)|
| CEBin: A Cost-Effective Framework for Large-Scale Binary Code Similarity Detection | ISSTA | 2024 | [link](https://arxiv.org/pdf/2402.18818.pdf) | | | [link](https://github.com/Hustcw/CEBin)|
| FASER: Binary Code Similarity Search through the use of Intermediate Representations | CAMLIS | 2023 | [link](https://arxiv.org/pdf/2310.03605.pdf) | | [link](https://www.youtube.com/watch?v=d5SGeQbvG4o)| [link](https://github.com/br0kej/FASER)|
|VEXIR2Vec: An Architecture-Neutral Embedding Framework for Binary Similarity | | 2023 | [link](https://arxiv.org/abs/2312.00507) | | | |
| kTrans: Knowledge-Aware Transformer for Binary Code Embedding | | 2023 | [link](https://arxiv.org/abs/2308.12659)| | | [link](https://github.com/Learner0x5a/kTrans-release)|
| Improving Binary Code Similarity Transformer Models by Semantics-Driven Instruction Deemphasis | ISSTA | 2023 | [link](https://dl.acm.org/doi/pdf/10.1145/3597926.3598121) | | | [link](https://zenodo.org/record/7978808) |
| Asteria-Pro: Enhancing Deep-Learning Based Binary Code Similarity Detection by Incorporating Domain Knowledge | TOSEM | 2023 | [link](https://dl.acm.org/doi/10.1145/3604611) | | | [link](https://github.com/Asteria-BCSD/Asteria-Pro) |
| sem2vec: Semantics-aware Assembly Tracelet Embedding | TOSEM | 2023 | [link](https://dl.acm.org/doi/10.1145/3569933) | | | [link](https://github.com/sem2vec) |
| 1-to-1 or 1-to-n? Investigating the effect of function inlining on binary similarity analysis | TOSEM | 2023 | [link](https://dl.acm.org/doi/10.1145/3561385) | | | |
| Binary Function Clone Search in the Presence of Code Obfuscation and Optimization over Multi-CPU Architectures | ASIA CCS | 2023 | [link](https://dl.acm.org/doi/10.1145/3579856.3582818) | | | |
|VulHawk: Cross-architecture Vulnerability Detection with Entropy-based Binary Code Search | NDSS | 2023 |[link](https://www.ndss-symposium.org/wp-content/uploads/2023/02/ndss2023_f415_paper.pdf) | | | [link](https://github.com/RazorMegrez/VulHawk)|
| A Game-Based Framework to Compare Program Classifiers and Evaders | CGO | 2023 | [link](https://doi.org/10.1145/3579990.3580012) | [link](https://homepages.dcc.ufmg.br/~fernando/publications/papers/CGO23_ThaisDamasio.pdf) | [link](https://youtu.be/-fgG6agTWtI?feature=shared) | [link](https://github.com/lac-dcc/yali) |
| BBDetector: A Precise and Scalable Third-Party Library Detection in Binary Executables with Fine-Grained Function-Level Features | MDPI | 2023 | [link](https://www.mdpi.com/2076-3417/13/1/413) | | | |
| A Survey of Binary Code Fingerprinting Approaches: Taxonomy, Methodologies, and Features | CSUR | 2022 | [link](https://dl.acm.org/doi/10.1145/3486860) | | | |
|Practical Binary Code Similarity Detection with BERT-based Transferable Similarity Learning | ACSAC | 2022 | [link](https://dl.acm.org/doi/abs/10.1145/3564625.3567975)| [link](https://www.acsac.org/2022/program/papers/76-Ahn-Software_Security_I.pdf)| | [link](https://github.com/asw0316/binshot)|
|Improving cross-platform binary analysis using representation learning via graph alignment | ISSTA | 2022 | [link](https://dl.acm.org/doi/pdf/10.1145/3533767.3534383)| | [link](https://www.youtube.com/watch?v=rK1CDMauaZU&t=89s) | [link](https://github.com/yonsei-cysec/XBA)|
|jTrans: Jump-Aware Transformer for Binary Code Similarity | ISSTA | 2022 | [link](https://arxiv.org/pdf/2205.12713.pdf)| | [link](https://www.youtube.com/watch?v=rAirmnUsC1k) | [link](https://github.com/vul337/jTrans/)|
|COBRA-GCN: Contrastive Learning to Optimize Binary Representation Analysis with Graph Convolutional Networks | DIMVA | 2022 | [link](https://dl.acm.org/doi/abs/10.1007/978-3-031-09484-2_4)| | | |
|A Large-Scale Empirical Analysis of the Vulnerabilities Introduced by Third-Party Components in IoT Firmware | ISSTA | 2022 | [link](https://doi.org/10.1145/3533767.3534366)| | [link](https://www.youtube.com/watch?v=H2o45YRguMM) | [link](https://github.com/BBge/FirmSecDataset)|
| How Machine Learning Is Solving the Binary Function Similarity Problem | USENIX | 2022 | [link](https://www.s3.eurecom.fr/docs/usenixsec22_marcelli.pdf) | | [link](https://www.youtube.com/watch?v=e9bab7GpwnI) | [link](https://github.com/Cisco-Talos/binary_function_similarity) |
|Enhancing DNN-Based Binary Code Function Search With Low-Cost Equivalence Checking | TSE | 2022 | [link](https://ieeexplore.ieee.org/document/9707874)| | | [link](https://github.com/computer-analysis/BinUSE)|
| Program Representations for Predictive Compilation: State of Affairs in the Early 20's | COLA | 2022 | [link](https://doi.org/10.1016/j.cola.2022.101171) | [link](https://homepages.dcc.ufmg.br/~fernando/publications/papers/FaustinoJCL22.pdf) | | [link](https://github.com/otavioon/COLA-2022-Tools) |
| Improving binary diffing speed and accuracy using community detection and locality-sensitive hashing: an empirical study | JCVHT | 2022 | [link](https://link.springer.com/article/10.1007/s11416-022-00452-z) | | | |
| PalmTree: Learning an Assembly Language Model for Instruction Embedding | CCS | 2021 | [link](https://dl.acm.org/doi/abs/10.1145/3460120.3484587) | [link](https://www.inforsec.org/wp/wp-content/uploads/2021/07/qy.pdf) | | [link](https://github.com/palmtreemodel/PalmTree) |
| Binary code similarity detection | ASE | 2021 | [link](https://dl.acm.org/doi/abs/10.1109/ASE51524.2021.9678518)| | | |
| Binary diffing as a network alignment problem via belief propagation | ASE | 2021 | [link](https://basepub.dauphine.psl.eu/bitstream/handle/123456789/22755/menginrossi2021binary-diffing.pdf?sequence=2)| | | |
| Asteria: Deep Learning-based AST-Encoding for Cross-platform Binary Code Similarity Detection | IEEE DSN | 2021 | [link](https://arxiv.org/pdf/2108.06082v1.pdf) | | | [link](https://github.com/Asteria-BCSD/Asteria) |
| BinDeep: A deep learning approach to binary code similarity detection | ESWA | 2021 | [link](https://www.sciencedirect.com/science/article/pii/S0957417420310332)| | | |
|EnBinDiff: Identifying Data-Only Patches for Binaries | TDSC | 2021 | [link](https://ieeexplore.ieee.org/document/9645381)| | | |
|BinDiffNN: Learning Distributed Representation of Assembly for Robust Binary Diffing Against Semantic Differences | TSE | 2021 | [link](https://ieeexplore.ieee.org/document/9470904)| | | [link](https://github.com/sami2316/bindiff_NN)|
| Codee: A Tensor Embedding Scheme for Binary Code Search | TSE | 2021 |[link](https://ieeexplore.ieee.org/document/9345532) | | | [link](https://github.com/ycachy/Codee)|
| Revisiting Binary Code Similarity Analysis using Interpretable Feature Engineering and Lessons Learned | TSE(revision) | 2021 | [link](https://arxiv.org/pdf/2011.10749.pdf) | | | [link](https://github.com/SoftSec-KAIST/TikNib) |
| How could Neural Networks understand Programs? | ICML | 2021 | [link](https://arxiv.org/pdf/2105.04297.pdf) | | | [link](https://github.com/pdlan/OSCAR) |
| Multi-threshold token-based code clone detection | SANER | 2021 | [link](https://arxiv.org/pdf/2002.05204.pdf) | | | |
| FastSpec: Scalable Generation and Detection of Spectre Gadgets Using Neural Embeddings | IEEE EuroS&P | 2021 | [link](https://arxiv.org/pdf/2006.14147.pdf) | | [link](https://www.youtube.com/watch?v=WskRnEY7oCs) | [link](https://github.com/vernamlab/FastSpec) |
| TREX: Learning Execution Semantics from Micro-Traces for Binary Similarity | | 2020 | [link](https://arxiv.org/pdf/2012.08680.pdf) | | | [link](https://github.com/CUMLSec/trex) |
| Similarity of Binaries Across Optimization Levels and Obfuscation | ESORICS | 2020 | [link](https://books.google.com.hk/books?id=sqT8DwAAQBAJ&pg=PA295&lpg=PA295&dq=Similarity+of+Binaries+Across+Optimization+Levels+and+Obfuscation&source=bl&ots=OFw-NpBFEJ&sig=ACfU3U2DFjxq5lFEM2smLXvWRNf8dyX-TQ&hl=en&sa=X&ved=2ahUKEwiZvuKSk93yAhXCB94KHYNeA_YQ6AF6BAgPEAM#v=onepage&q=Similarity%20of%20Binaries%20Across%20Optimization%20Levels%20and%20Obfuscation&f=false) | | [link](https://www.youtube.com/watch?v=Pi7wsCvfBa8) | |
| Open-source tools and benchmarks for code-clone detection: past, present, and future trends | | 2020 | [link](https://dl.acm.org/doi/abs/10.1145/3381307.3381310) | | | |
| Semantically Find Similar Binary Codes with Mixed Key Instruction Sequence | | 2020 | | | | |
| LibDX: A Cross-Platform and Accurate System to Detect Third-Party Libraries in Binary Code | | 2020 | [link](https://ieeexplore.ieee.org/document/9054845) | | | |
| Detecting Code Clones with Graph Neural Network and Flow-Augmented Abstract Syntax Tree | SANER | 2020 | [link](https://arxiv.org/pdf/2002.08653.pdf) | | | |
| What You See is What it Means! Semantic Representation Learning of Code based on Visualization and Transfer Learning | | 2020 | [link](https://arxiv.org/pdf/2002.02650.pdf) | | | |
| Clone Detection on Large Scala Codebases | | 2020 | [link](https://ieeexplore.ieee.org/document/9047640) | | | |
| CloneCompass: Visualizations for Code Clone Analysis | | 2020 | [link](https://dspace.library.uvic.ca/bitstream/handle/1828/11729/Ying_Wang_MSc_2020.pdf?sequence=1&isAllowed=y) | | | |
| DEEPBINDIFF: Learning Program-Wide Code Representations for Binary Diffing | NDSS | 2020 | [link](https://www.ndss-symposium.org/wp-content/uploads/2020/02/24311.pdf) | | [link](https://www.youtube.com/watch?v=TB50csOprMs) | [link](https://github.com/yueduan/DeepBinDiff) |
| VGraph: A Robust Vulnerable Code Clone Detection System Using Code Property Triplets | EuroS&P | 2020 | [link](https://www2.seas.gwu.edu/~howie/publications/VGraph-EuroSP20.pdf) | | | |
| Order Matters: Semantic-Aware Neural Networks for Binary Code Similarity Detection | AAAI | 2020 | [link](https://keenlab.tencent.com/en/whitepapers/Ordermatters.pdf) | | | |
| Similarity Metric Method for Binary Basic Blocks of Cross-Instruction Set Architecture | NDSS | 2020 | [link](https://www.ndss-symposium.org/wp-content/uploads/bar2020-23002.pdf) | | | [link](https://github.com/zhangxiaochuan/MIRROR) |
| Investigating Graph Embedding Neural Networks with Unsupervised Features Extraction for Binary Analysis | NDSS Workshop on Binary Analysis Research (BAR) | 2019 | [link](https://www.ndss-symposium.org/wp-content/uploads/bar2019_20_Massarelli_paper.pdf) | | | [link](https://github.com/lucamassarelli/Unsupervised-Features-Learning-For-Binary-Similarity) |
| Asm2Vec: Boosting Static Representation Robustness for Binary Clone Search against Code Obfuscation and Compiler Optimization | IEEE S&P | 2019 | [link](https://www.computer.org/csdl/proceedings-article/sp/2019/666000a038/19skfc3ZfKo) | [link](https://pdfs.semanticscholar.org/38ae/cd9be307867e375b17597499e3e8be2d4930.pdf) | [link](https://www.youtube.com/watch?v=6ethsho5uJA&feature=emb_title) | |
| Semantic-Based Representation Binary Clone Detection for Cross-Architectures in the Internet of Things | MDPI | 2019 | [link](https://www.mdpi.com/2076-3417/9/16/3283/pdf) | | | |
| A Survey of Binary Code Similarity | CSUR | 2019 | [link](https://arxiv.org/pdf/1909.11424.pdf) | | | |
| Research Progress on Code Clone Detection (代码克隆检测研究进展) | Journal of Software (软件学报) | 2019 | [link](https://xin-xia.github.io/publication/rjxb181.pdf) | | | |
| A Systematic Review on Code Clone Detection | | 2019 | [link](https://ieeexplore.ieee.org/document/8719895) | | | |
| A Cross-Architecture Instruction Embedding Model for Natural Language Processing-Inspired Binary Code Analysis | NDSS | 2019 | [link](https://arxiv.org/pdf/1812.09652.pdf) | | | [link](https://github.com/nlp-code-analysis/cross-arch-instr-model) |
| Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs | NDSS | 2019 | [link](https://www.ndss-symposium.org/wp-content/uploads/2019/02/ndss2019_11-4_Zuo_paper.pdf) | [link](https://www.ndss-symposium.org/wp-content/uploads/ndss2019_11-4_Zuo_slides.pdf) | [link](https://www.youtube.com/watch?v=-BeqwMPQNrw&list=PLfUWWM-POgQvnPOa9Bo1AyKplMkOGfHUT&index=5&t=1s) | [model](https://nmt4binaries.github.io/) |
| SAFE: Self-Attentive Function Embeddings for Binary Similarity | | 2019 | [link](https://arxiv.org/pdf/1811.05296.pdf) | [link](https://www.dimva2019.org/wp-content/uploads/sites/31/2019/06/DIMVA19-Slides-22.pdf) | | [link](https://github.com/gadiluna/SAFE) |
| Learning-Based Recursive Aggregation of Abstract Syntax Trees for Code Clone Detection | SANER | 2019 | [link](https://ieeexplore.ieee.org/document/8668039) | | | |
| Deep Learning-Based Cross-Platform Binary Code Association Analysis (基于深度学习的跨平台二进制代码关联分析) | | 2019 | [link](https://kns.cnki.net/KCMS/detail/detail.aspx?dbname=CMFD202001&filename=1019646524.nh) | | | |
| CVSkSA: cross-architecture vulnerability search in firmware based on kNN-SVM and attributed control flow graph | | 2019 | [link](https://link.springer.com/article/10.1007/s11219-018-9435-5) | | | |
| Function matching between binary executables: efficient algorithms and features | JCVHT | 2019 | [link](https://users.auth.gr/kehagiat/Papers/journal/2019JCVHuku.pdf) | | | |
| BinMatch: A Semantics-based Hybrid Approach on Binary Code Clone Analysis | ICSME | 2018 | [link](https://loccs.sjtu.edu.cn/~romangol/publications/icsme18.pdf) | | | |
| αDiff: Cross-Version Binary Code Similarity Detection with DNN | ASE | 2018 | [link](https://dl.acm.org/doi/pdf/10.1145/3238147.3238199?download=true) | | | [dataset](https://github.com/twelveand0/alphadiff-dataset) |
| Binary Similarity Detection Using Machine Learning | PLDI | 2018 | [link](https://dl.acm.org/doi/10.1145/3264820.3264821) | | | |
| CCAligner: A Token Based Large-Gap Clone Detector | ICSE | 2018 | [link](http://home.ustc.edu.cn/~wpc520/papers/CCAligner.pdf) | | | |
| Oreo: Detection of Clones in the Twilight Zone | FSE | 2018 | [link](https://arxiv.org/pdf/1806.05837.pdf) | | | |
| VulSeeker: A Semantic Learning Based Vulnerability Seeker for Cross-platform Binary | ASE | 2018 | [link](https://dl.acm.org/doi/10.1145/3238147.3240480) | | | [link](https://github.com/buptsseGJ/VulSeeker) |
| VulSeeker-pro: enhanced semantic learning based binary vulnerability seeker with emulation | | 2018 | [link](https://dl.acm.org/doi/10.1145/3236024.3275524) | | | |
| FirmUp: Precise Static Detection of Common Vulnerabilities in Firmware | | 2018 | [link](https://dl.acm.org/doi/10.1145/3296957.3177157) | | | |
| BINARM: Scalable and Efficient Detection of Vulnerabilities in Firmware Images of Intelligent Electronic Devices | | 2018 | [link](https://users.encs.concordia.ca/~wang/papers/dimva18paria.pdf) | | | |
| A Resilient and Efficient System for Identifying FOSS Functions in Malware Binaries | | 2018 | [link](https://dl.acm.org/doi/10.1145/3175492) | | | |
| Beyond Precision and Recall: Understanding Uses (and Misuses) of Similarity Hashes in Binary Analysis | | 2018 | [link](https://dl.acm.org/doi/10.1145/3176258.3176306) | [link](https://pagabuc.me/slides/codaspy18_pagani.slides.pdf) | | |
| BCD: Decomposing Binary Code Into Components Using Graph-Based Clustering | ASIA CCS | 2018 | [link](https://dl.acm.org/doi/10.1145/3196494.3196504) | | | |
| A Deep Learning Approach to Program Similarity | MASES | 2018 | [link](https://dl.acm.org/doi/10.1145/3243127.3243131) | | | |
| Recurrent Neural Network for Code Clone Detection | SEIM | 2018 | [link](https://seim-conf.org/media/materials/2018/proceedings/SEIM-2018_Short_Papers.pdf#page=48) | | | |
| The Adverse Effects of Code Duplication in Machine Learning Models of Code | | 2018 | [link](https://dl.acm.org/doi/10.1145/3359591.3359735) | | [link](https://www.youtube.com/watch?v=uvWfpE2LhOo) | |
| Benchmarks for software clone detection: A ten-year retrospective | SANER | 2018 | [link](https://ieeexplore.ieee.org/document/8330194) | | | |
| Binary Code Clone Detection across Architectures and Compiling Configurations | ICPC | 2017 | [link](https://dl.acm.org/doi/10.1109/ICPC.2017.22) | | | |
| Neural Network-based Graph Embedding for Cross-Platform Binary Code Similarity Detection | ACM CCS | 2017 | [link](https://arxiv.org/pdf/1708.06525.pdf) | | | [link](https://github.com/Yunlongs/Genimi) |
| BinSequence: Fast, Accurate and Scalable Binary Code Reuse Detection | ASIA CCS | 2017 | [link](https://dl.acm.org/doi/10.1145/3052973.3052974) | | | |
| BinShape: Scalable and Robust Binary Library Function Identification Using Function Shape | DIMVA | 2017 | [link](https://link.springer.com/chapter/10.1007/978-3-319-60876-1_14) | | | |
| Compiler-agnostic function detection in binaries | IEEE EuroS&P | 2017 | [link](https://ieeexplore.ieee.org/document/7961979) | | | [link](https://github.com/uxmal/nucleus) |
| BinSign: Fingerprinting binary functions to support automated analysis of code executables | | 2017 | [link](https://spectrum.library.concordia.ca/982206/1/Nouh_MASc_S2017.pdf) | | | |
| Similarity of binaries through re-optimization | PLDI | 2017 | [link](https://dl.acm.org/doi/10.1145/3062341.3062387) | [link](https://nimrodpar.github.io/assets/presentations/gitz-pldi17.pdf) | | |
| Transferring code-clone detection and analysis to practice | ICSE-SEIP | 2017 | [link](https://dl.acm.org/doi/10.1109/ICSE-SEIP.2017.6) | | | |
| Cryptographic Function Detection in Obfuscated Binaries via Bit-Precise Symbolic Loop Mapping | IEEE S&P | 2017 | [link](https://ieeexplore.ieee.org/document/7958617) | | | |
| Supervised Deep Features for Software Functional Clone Detection by Exploiting Lexical and Syntactical Information in Source Code | IJCAI | 2017 | [link](https://www.ijcai.org/Proceedings/2017/0423.pdf) | | | |
| Extracting Conditional Formulas for Cross-Platform Bug Search | ASIA CCS | 2017 | [link](https://dl.acm.org/doi/10.1145/3052973.3052995) | | | |
| SPAIN: Security Patch Analysis for Binaries Towards Understanding the Pain and Pills | ICSE | 2017 | [link](https://ieeexplore.ieee.org/document/7985685) | | | |
| CCLearner: A Deep Learning-Based Clone Detection Approach | | 2017 | [link](http://people.cs.vt.edu/nm8247/publications/icsme-research-118-camera-ready.pdf) | | | [link](https://github.com/liuqingli/CCLearner) |
| BinSim: Trace-based Semantic Binary Diffing via System Call Sliced Segment Equivalence Checking | USENIX | 2017 | [link](https://www.usenix.org/system/files/conference/usenixsecurity17/sec17-ming.pdf) | [link](https://www.usenix.org/sites/default/files/conference/protected-files/usenixsecurity17_slides_jiang_ming.pdf) | [link](https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/ming) | |
| In-memory Fuzzing for Binary Code Similarity Analysis | ASE | 2017 | [link](https://dl.acm.org/doi/10.5555/3155562.3155606) | | | |
| DéjàVu: a map of code duplicates on GitHub | OOPSLA | 2017 | [link](https://dl.acm.org/doi/10.1145/3133908) | | | |
| Some from Here, Some from There: Cross-project Code Reuse in GitHub | MSR | 2017 | [link](https://dl.acm.org/doi/10.1109/MSR.2017.15) | | | |
| CVSSA: Cross-Architecture Vulnerability Search in Firmware Based on Support Vector Machine and Attributed Control Flow Graph | | 2017 | [link](https://link.springer.com/article/10.1007/s11219-018-9435-5) | | | |
| Identifying Functionally Similar Code in Complex Codebases | ICPC | 2016 | [link](http://www.cs.columbia.edu/~simha/preprint_icpc16.pdf) | | | [link](https://github.com/Programming-Systems-Lab/ioclones) |
| Scalable graph-based bug search for firmware images (Genius) | ACM CCS | 2016 | [link](https://www.cs.ucr.edu/~heng/pubs/genius-ccs16.pdf) | | [link](https://www.youtube.com/watch?v=R9TPqflLGNs) | [link](https://github.com/qian-feng/Gencoding) |
| Cross-Architecture Binary Semantics Understanding via Similar Code Comparison | IEEE SANER | 2016 | [link](https://loccs.sjtu.edu.cn/~romangol/publications/saner16.pdf) | | | |
| discovRE: Efficient cross-architecture identification of bugs in binary code | NDSS | 2016 | [link](https://net.cs.uni-bonn.de/fileadmin/ag/martini/Staff/yakdan/discovre_ndss2016.pdf) | | | |
| BinGo: Cross-architecture cross-OS Binary Search | FSE | 2016 | [link](https://dl.acm.org/doi/10.1145/2950290.2950350) | | | |
| Kam1n0: Mapreduce-based assembly clone search for reverse engineering | KDD | 2016 | [link](https://dl.acm.org/doi/pdf/10.1145/2939672.2939719) | | | [link](https://github.com/McGill-DMaS/Kam1n0-Community) |
| Statistical similarity of binaries | PLDI | 2016 | [link](https://dl.acm.org/doi/10.1145/2980983.2908126) | [link](https://nimrodpar.github.io/assets/presentations/esh-pldi16.pdf) | | [link](https://github.com/tech-srl/esh) |
| Deep learning code fragments for code clone detection | ASE | 2016 | [link](https://ieeexplore.ieee.org/document/7582748) | | | |
| A Survey of Software Clone Detection Techniques | | 2016 | [link](https://pdfs.semanticscholar.org/8df3/d10963233aca0e7686b2818b0c47add5466d.pdf) | | | |
| SourcererCC: Scaling Code Clone Detection to Big Code | ICSE | 2016 | [link](https://arxiv.org/pdf/1512.06448.pdf) | | | |
| Binary executable file similarity calculation using function matching | | 2016 | [link](https://link.springer.com/article/10.1007/s11227-016-1941-2) | | | |
| Matching Similar Functions in Different Versions of a Malware | | 2016 | [link](https://ieeexplore.ieee.org/document/7846954) | | | |
| BinDNN: Resilient Function Matching Using Deep Learning | | 2016 | [link](http://patrickmcdaniel.org/pubs/securecomm16.pdf) | | | |
| VulPecker: An Automated Vulnerability Detection System Based on Code Similarity Analysis | ACSAC | 2016 | [link](https://dl.acm.org/doi/10.1145/2991079.2991102) | | | [link](https://github.com/vulpecker/Vulpecker) |
| BigCloneEval: A Clone Detection Tool Evaluation Framework with BigCloneBench | | 2016 | [link](https://ieeexplore.ieee.org/document/7816515) | | | [link](https://github.com/jeffsvajlenko/BigCloneEval) |
| Cross-architecture bug search in binary executables | IEEE S&P | 2015 | [link](https://ieeexplore.ieee.org/document/7163056) | | | |
| Library functions identification in binary code by using graph isomorphism testings | | 2015 | [link](https://ieeexplore.ieee.org/document/7081836) | | | |
| Evaluating clone detection tools with BigCloneBench | | 2015 | [link](https://ieeexplore.ieee.org/document/7332459) | | | [link](https://github.com/clonebench/BigCloneBench) |
| Memoized semantics-based binary diffing with application to malware lineage inference | | 2015 | [link](https://faculty.ist.psu.edu/wu/papers/memoized-IFIP_SEC_2015.pdf) | | | |
| Sigma: A semantic integrated graph matching approach for identifying reused functions in binary code | | 2015 | [link](https://www.dfrws.org/sites/default/files/session-files/paper-sigma_a_semantic_integrated_graph_matching_approach_for_identifying_reused_functions_in_binary_code.pdf) | [link](https://pdfs.semanticscholar.org/a036/ff11b1a675550ac57949bc540f400e8fa695.pdf) | | |
| BYTEWEIGHT: Learning to Recognize Functions in Binary Code | USENIX | 2014 | [link](https://www.usenix.org/system/files/conference/usenixsecurity14/sec14-paper-bao.pdf) | [link](https://www.usenix.org/sites/default/files/conference/protected-files/sec14_slides_bao.pdf) | [link](https://www.usenix.org/node/184522) | |
| Semantics-based obfuscation-resilient binary code similarity comparison with applications to software plagiarism detection | FSE | 2014 | [link](https://dl.acm.org/doi/10.1145/2635868.2635900) | | | |
| Binclone: Detecting code clones in malware | SERE | 2014 | [link](https://cradpdf.drdc-rddc.gc.ca/PDFS/unc194/p800686_A1b.pdf) | | | [link](https://github.com/BinSigma/BinClone) |
| Detecting fine-grained similarity in binaries | | 2014 | [link](https://web.cs.ucdavis.edu/~su/theses/AS-dissertation.pdf) | | | |
| Leveraging semantic signatures for bug search in binary programs | ACSAC | 2014 | [link](https://dl.acm.org/doi/10.1145/2664243.2664269) | | | |
| How Accurate Is Coarse-grained Clone Detection?: Comparision with Fine-grained Detectors | | 2014 | [link](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.685.7674&rep=rep1&type=pdf) | | | |
| Tracelet-based code search in executables | PLDI | 2014 | [link](https://dl.acm.org/doi/10.1145/2594291.2594343) | | | |
| Control Flow-Based Malware Variant Detection | | 2014 | [link](https://ieeexplore.ieee.org/document/6601601) | | | |
| Hashing for Similarity Search: A Survey | | 2014 | [link](https://arxiv.org/pdf/1408.2927.pdf) | | | |
| Achieving accuracy and scalability simultaneously in detecting application clones on android markets | ICSE | 2014 | [link](https://dl.acm.org/doi/10.1145/2568225.2568286) | | | |
| Identifying Shared Software Components to Support Malware Forensics | | 2014 | [link](https://link.springer.com/chapter/10.1007/978-3-319-08509-8_2) | | | |
| Evaluating Modern Clone Detection Tools | | 2014 | [link](https://ieeexplore.ieee.org/document/6976098) | | | |
| Rendezvous: a search engine for binary code | MSR | 2013 | [link](https://dl.acm.org/doi/10.5555/2487085.2487147) | | | |
| Binslayer: accurate comparison of binary executables | PPREW | 2013 | [link](https://dl.acm.org/doi/10.1145/2430553.2430557) | | | [link](https://github.com/MartialB/BinSlayer) |
| Software clone detection: A systematic review | | 2013 | [link](https://romisatriawahono.net/lecture/rm/survey/software%20engineering/Software%20Construction/Rattan%20-%20Software%20Clone%20Detection%20-%202013.pdf) | | | |
| How to extract differences from similar programs? A cohesion metric approach | | 2013 | [link](https://ieeexplore.ieee.org/document/6613038) | | | |
| Software clone detection and refactoring | | 2013 | [link](https://www.researchgate.net/publication/258389603_Software_Clone_Detection_and_Refactoring) | | | |
| An Emerging Approach towards Code Clone Detection: Metric Based Approach on Byte Code | | 2013 | [link](http://ijarcsse.com/Before_August_2017/docs/papers/Volume_3/5_May2013/V3I5-0355.pdf) | | | |
| A hybrid-token and textual based approach to find similar code segments | | 2013 | [link](https://ieeexplore.ieee.org/document/6726700) | | | |
| Gapped code clone detection with lightweight source code analysis | | 2013 | [link](https://ieeexplore.ieee.org/abstract/document/6613837) | | | |
| MutantX-S: Scalable Malware Clustering Based on Static Features | USENIX | 2013 | [link](https://www.usenix.org/system/files/conference/atc13/atc13-hu.pdf) | | [link](https://www.usenix.org/node/174525) | |
| Binjuice: Fast Location of Similar Code Fragments Using Semantic Juice | PPREW | 2013 | [link](https://dl.acm.org/doi/10.1145/2430553.2430558) | | | |
| Towards Automatic Software Lineage Inference | USENIX | 2013 | [link](https://www.usenix.org/system/files/conference/usenixsecurity13/sec13-paper_jang.pdf) | | [link](https://www.usenix.org/conference/usenixsecurity13/technical-sessions/papers/jang) | |
| AnDarwin: Scalable Detection of Semantically Similar Android Applications | | 2013 | [link](https://ieeexplore.ieee.org/document/6985631) | | | |
| Expose: Discovering potential binary code re-use | | 2013 | [link](https://ieeexplore.ieee.org/document/6649873) | | | |
| Function Matching-based Binary level Software Similarity Calculation | RACS | 2013 | [link](https://dl.acm.org/doi/10.1145/2513228.2513300) | | | |
| FIRMA: Malware Clustering and Network Signature Generation with Mixed Network Behaviors | RAID | 2013 | [link](https://software.imdea.org/~juanca/papers/firma_raid13.pdf) | | | |
| A study of repetitiveness of code changes in software evolution | ASE | 2013 | [link](https://dl.acm.org/doi/10.1109/ASE.2013.6693078) | | | |
| ibinhunt: Binary hunting with interprocedural control flow | | 2012 | [link](https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=2699&context=sis_research) | [link](https://slideplayer.com/slide/4168742/) | | |
| ReDeBug: Finding Unpatched Code Clones in Entire OS Distributions | IEEE S&P | 2012 | [link](https://users.ece.cmu.edu/~jiyongj/papers/oakland12.pdf) | | | |
| Boreas: an accurate and scalable token-based approach to code clone detection | ASE | 2012 | [link](https://dl.acm.org/doi/10.1145/2351676.2351725) | | | |
| Folding Repeated Instructions for Improving Token-Based Code Clone Detection | | 2012 | [link](https://ieeexplore.ieee.org/document/6392103) | | | |
| A metrics-based data mining approach for software clone detection | | 2012 | [link](https://ieeexplore.ieee.org/document/6340252) | | | |
| Comparison of Clone Detection Techniques | | 2012 | | | | |
| Malware Classification Method via Binary Content Comparison | RACS | 2012 | [link](https://dl.acm.org/doi/10.1145/2401603.2401672) | | | |
| Binary function clustering using semantic hashes | ICMLA | 2012 | [link](https://ieeexplore.ieee.org/document/6406693) | | | |
| Value-based program characterization and its application to software plagiarism detection | | 2011 | [link](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.370.9508&rep=rep1&type=pdf) | | | |
| CMCD: Count Matrix Based Code Clone Detection | | 2011 | [link](https://ieeexplore.ieee.org/iel5/6129717/6130641/06130694.pdf) | | | |
| Incremental code clone detection: A pdg-based approach | | 2011 | [link](https://ieeexplore.ieee.org/document/6079769) | | | |
| Anywhere, Any-Time Binary Instrumentation | | 2011 | [link](https://dl.acm.org/doi/10.1145/2024569.2024572) | | | |
| Code reuse in open source software development: Quantitative evidence, drivers, and impediments | | 2010 | | | | |
| Index-based code clone detection: incremental, distributed, scalable | | 2010 | | | | |
| Detection of Type-1 and Type-2 Code Clones Using Textual Analysis and Metrics | | 2010 | | | | |
| A hybrid approach (syntactic and textual) to clone detection | | 2010 | | | | |
| Evaluating code clone genealogies at release level: An empirical study | | 2010 | | | | |
| A survey of Binary similarity and distance measures | | 2010 | | | | |
| Idea: Opcode-Sequence-Based Malware Detection | | 2010 | | | | |
| Behavioral Clustering of HTTP-Based Malware and Signature Generation Using Malicious Network Traces | USENIX | 2010 | | | | |
| Data fingerprinting with similarity digests | | 2010 | | | | |
| Automatic mining of functionally equivalent code fragments via random testing | | 2009 | | | | |
| A mutation/injection-based automatic framework for evaluating code clone detection tools | | 2009 | | | | |
| Problematic code clones identification using multiple detection results | | 2009 | | | | |
| Incremental clone detection | | 2009 | | | | |
| Scalable and incremental clone detection for evolving software | | 2009 | | | | |
| Large-scale Malware Indexing Using Function-call Graphs | | 2009 | | | | |
| Scalable, Behavior-Based Malware Clustering | | 2009 | | | | |
| peHash: A Novel Approach to Fast Malware Clustering | USENIX | 2009 | | | | |
| Detecting Code Clones in Binary Executables | | 2009 | | | | |
| Binhunt: Automatically finding semantic differences in binary programs | | 2008 | | | | |
| Scalable detection of semantic clones | | 2008 | | | | |
| Deckard: Scalable and accurate tree-based detection of code clones | | 2007 | | | | |
| Large-scale code reuse in open source software | | 2007 | | | | |
| A survey on software clone detection research | | 2007 | [link](http://research.cs.queensu.ca/TechReports/Reports/2007-541.pdf) | | | |
| A study of consistent and inconsistent changes to code clones | | 2007 | | | | |
| Comparison and evaluation of clone detection tools | | 2007 | | | | |
| Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions | | 2007 | | | | |
| A Static Birthmark of Binary Executables Based on API Call Structure | | 2007 | | | | |
| CP-Miner: Finding copy-paste and related bugs in large-scale software code | | 2006 | | | | |
| Survey of research on software clones | | 2006 | [link](https://www.researchgate.net/publication/30815553_Survey_of_Research_on_Software_Clones) | | | |
| "Cloning considered harmful" considered harmful: patterns of cloning in software | | 2006 | [link](https://ieeexplore.ieee.org/document/4023973) | | | |
| GPLAG: detection of software plagiarism by program dependence graph analysis | | 2006 | | | | |
| Detecting Self-mutating Malware Using Control-flow Graph Matching | | 2006 | | | | |
| Identifying Almost Identical Files Using Context Triggered Piecewise Hashing | | 2006 | | | | |
| Hamsa: Fast signature generation for zero-day polymorphic worms with provable attack resilience | IEEE S&P | 2006 | | | | |
| Graph-based comparison of executable objects | | 2005 | | | | |
| SDD: high performance code clone detection system for large scale source code | | 2005 | [link](http://www.cs.cmu.edu/~seunghak/sdd_slee_2005.pdf) | | | |
| Polygraph: Automatically generating signatures for polymorphic worms | | 2005 | | | | |
| K-gram Based Software Birthmarks | | 2005 | | | | |
| Insights into System-Wide Code Duplication | WCRE | 2004 | [link](https://rmod.inria.fr/archives/papers/Rieg04bWCRE2004ClonesVisualization.pdf) | | | |
| Clone detection in source code by frequent itemset techniques | | 2004 | | | | |
| Evaluating clone detection techniques from a refactoring perspective | | 2004 | | | | |
| Structural comparison of executable objects | | 2004 | | | | |
| Code compaction of matching single-entry multiple-exit regions | | 2003 | [link](http://web.cs.ucla.edu/~palsberg/course/cs239/S04/papers/ChenLiGupta03.pdf) | | | |
| CloSpan: Mining Closed sequential patterns in large datasets | | 2003 | | | | |
| Ccfinder: a multilinguistic token-based code clone detection system for large scale source code | | 2002 | | | | |
| Identifying similar code with program dependence graphs | | 2001 | | | | |
| Using slicing to identify duplication in source code | | 2001 | | | | |
| BMAT – A Binary Matching Tool for Stale Profile Propagation | | 2000 | | | | |
| A language independent approach for detecting duplicated code | | 1999 | | | | |
| Compressing Differences of Executable Code | | 1999 | | | | |
| Similarity search in high dimensions via hashing | | 1999 | | | | |
| Clone detection using abstract syntax trees | | 1998 | | | | |
| Experiment on the Automatic Detection of Function Clones in a Software System Using Metrics | | 1996 | | | | |
| Pattern matching for clone and concept detection | | 1996 | | | | |
| On finding duplication and near-duplication in large software systems | | 1995 | [link](https://ieeexplore.ieee.org/document/514697) | | | |
| Detecting code similarity using patterns | | 1995 | | | | |
| A Cross-platform Binary Diff | | 1995 | | | | |