{"id":13729993,"url":"https://github.com/SystemSecurityStorm/Awesome-Binary-Similarity","last_synced_at":"2025-05-08T02:30:55.393Z","repository":{"id":37933708,"uuid":"265437908","full_name":"SystemSecurityStorm/Awesome-Binary-Similarity","owner":"SystemSecurityStorm","description":"An awesome \u0026 curated list of binary code similarity papers","archived":false,"fork":false,"pushed_at":"2024-11-26T15:54:51.000Z","size":99,"stargazers_count":561,"open_issues_count":1,"forks_count":77,"subscribers_count":40,"default_branch":"master","last_synced_at":"2025-04-30T18:01:51.500Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SystemSecurityStorm.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-05-20T03:20:08.000Z","updated_at":"2025-04-27T09:53:18.000Z","dependencies_parsed_at":"2024-05-14T10:39:00.589Z","dependency_job_id":"768649ef-72da-455d-ae50-8f55be73078d","html_url":"https://github.com/SystemSecurityStorm/Awesome-Binary-Similarity","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SystemSecurityStorm%2FAwesome-Binary-Similarity","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SystemSecurityStorm%2FAwesome-Binary-Similarity/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SystemSecurityStorm%2FAwesome-Binary-Similarity/releases","manifests_url":"https://repos.ecosyste.ms/api/v
1/hosts/GitHub/repositories/SystemSecurityStorm%2FAwesome-Binary-Similarity/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SystemSecurityStorm","download_url":"https://codeload.github.com/SystemSecurityStorm/Awesome-Binary-Similarity/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252798832,"owners_count":21805882,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T02:01:08.582Z","updated_at":"2025-05-08T02:30:55.373Z","avatar_url":"https://github.com/SystemSecurityStorm.png","language":null,"readme":"# Awesome Binary Similarity\n\n|                            Title                             |    Venue     | Year |                            Paper                             |                            Slide                             |                            Video                             |                            Github                            |\n| :----------------------------------------------------------: | :----------: | :--: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: |\n|Cross-Inlining Binary Function Similarity Detection| ICSE | 2024 | [Link](https://dl.acm.org/doi/abs/10.1145/3597503.3639080) | | | [link](https://github.com/island255/cross-inlining_binary_function_similarity)|\n| Improving ML-based Binary Function Similarity 
Detection by Assessing and Deprioritizing Control Flow Graph Features | Usenix | 2024 | [link](https://www.usenix.org/system/files/usenixsecurity24-wang-jialai.pdf) | | | |\n| BinaryAI: Binary Software Composition Analysis via Intelligent Binary Source Code Matching | ICSE | 2024 | [link](https://dl.acm.org/doi/10.1145/3597503.3639100) | | | |\n| Code is not Natural Language: Unlock the Power of Semantics-Oriented Graph Representation for Binary Code Similarity Detection | Usenix | 2024 | [link](https://www.usenix.org/system/files/sec24summer-prepub-346-he.pdf) | | | [link](https://github.com/NSSL-SJTU/HermesSim)\n| CLAP: Learning Transferable Binary Code Representations with Natural Language Supervision | ISSTA | 2024 | [link](https://arxiv.org/pdf/2402.16928.pdf) | | | [link](https://github.com/Hustcw/CLAP)|\n| CEBin: A Cost-Effective Framework for Large-Scale Binary Code Similarity Detection | ISSTA | 2024 | [link](https://arxiv.org/pdf/2402.18818.pdf) | | | [link](https://github.com/Hustcw/CEBin)|\n| FASER: Binary Code Similarity Search through the use of Intermediate Representations | CAMLIS | 2023 | [link](https://arxiv.org/pdf/2310.03605.pdf) | | [link](https://www.youtube.com/watch?v=d5SGeQbvG4o)| [link](https://github.com/br0kej/FASER)|\n|VEXIR2Vec: An Architecture-Neutral Embedding Framework for Binary Similarity | | 2023 | [link](https://arxiv.org/abs/2312.00507) | | | |\n| kTrans: Knowledge-Aware Transformer for Binary Code Embedding | | 2023 | [link](https://arxiv.org/abs/2308.12659)| | | [link](https://github.com/Learner0x5a/kTrans-release)| \n| Improving Binary Code Similarity Transformer Models by Semantics-Driven Instruction Deemphasis | ISSTA | 2023 |[link](https://dl.acm.org/doi/pdf/10.1145/3597926.3598121) | | | [link](https://zenodo.org/record/7978808)\n| Asteria-Pro: Enhancing Deep-Learning Based Binary Code Similarity Detection by Incorporating Domain Knowledge |TOSEM | 2023 | [link](https://dl.acm.org/doi/10.1145/3604611) | | | 
[link](https://github.com/Asteria-BCSD/Asteria-Pro)\n| sem2vec: Semantics-aware Assembly Tracelet Embedding | TOSEM | 2023 | [link](https://dl.acm.org/doi/10.1145/3569933) | | | [link](https://github.com/sem2vec) |\n| 1-to-1 or 1-to-n? Investigating the effect of function inlining on binary similarity analysis | TOSEM | 2023 | [link](https://dl.acm.org/doi/10.1145/3561385) | | | |\n| Binary Function Clone Search in the Presence of Code Obfuscation and Optimization over Multi-CPU Architectures | AsiaCCS | 2023 | [Link](https://dl.acm.org/doi/10.1145/3579856.3582818) | | | |\n|VulHawk: Cross-architecture Vulnerability Detection with Entropy-based Binary Code Search | NDSS | 2023 |[link](https://www.ndss-symposium.org/wp-content/uploads/2023/02/ndss2023_f415_paper.pdf) | | | [link](https://github.com/RazorMegrez/VulHawk)|\n| A Game-Based Framework to Compare Program Classifiers and Evaders | CGO | 2023 | [link](https://doi.org/10.1145/3579990.3580012) | [link](https://homepages.dcc.ufmg.br/~fernando/publications/papers/CGO23_ThaisDamasio.pdf) | [link](https://youtu.be/-fgG6agTWtI?feature=shared) | [link](https://github.com/lac-dcc/yali) | \n| BBDetector: A Precise and Scalable Third-Party Library Detection in Binary Executables with Fine-Grained Function-Level Features | MDPI | 2023 | [link](https://www.mdpi.com/2076-3417/13/1/413) | | | |\n| A Survey of Binary Code Fingerprinting Approaches: Taxonomy, Methodologies, and Features | CSUR | 2022 | [link](https://dl.acm.org/doi/10.1145/3486860) | | | |\n|Practical Binary Code Similarity Detection with BERT-based Transferable Similarity Learning | ACSAC | 2022 | [link](https://dl.acm.org/doi/abs/10.1145/3564625.3567975)| [link](https://www.acsac.org/2022/program/papers/76-Ahn-Software_Security_I.pdf)| | [link](https://github.com/asw0316/binshot)| \n|Improving cross-platform binary analysis using representation learning via graph alignment | ISSTA | 2022 | [link](https://dl.acm.org/doi/pdf/10.1145/3533767.3534383)| | 
[link](https://www.youtube.com/watch?v=rK1CDMauaZU\u0026t=89s) | [link](https://github.com/yonsei-cysec/XBA)| \n|jTrans: Jump-Aware Transformer for Binary Code Similarity | ISSTA | 2022 | [link](https://arxiv.org/pdf/2205.12713.pdf)| | [link](https://www.youtube.com/watch?v=rAirmnUsC1k) | [link](https://github.com/vul337/jTrans/)| \n|COBRA-GCN: Contrastive Learning to Optimize Binary Representation Analysis with Graph Convolutional Networks | DIMVA | 2022 | [link](https://dl.acm.org/doi/abs/10.1007/978-3-031-09484-2_4)| |  | | \n|A Large-Scale Empirical Analysis of the Vulnerabilities Introduced by Third-Party Components in IoT Firmware | ISSTA | 2022 | [link](https://doi.org/10.1145/3533767.3534366)| | [link](https://www.youtube.com/watch?v=H2o45YRguMM) | [link](https://github.com/BBge/FirmSecDataset)| \n|How Machine Learning Is Solving the Binary Function Similarity Problem | Usenix | 2022 | [link](https://www.s3.eurecom.fr/docs/usenixsec22_marcelli.pdf)| |[link](https://www.youtube.com/watch?v=e9bab7GpwnI) | [link](https://github.com/Cisco-Talos/binary_function_similarity)| \n|Enhancing DNN-Based Binary Code Function Search With Low-Cost Equivalence Checking | TSE | 2022 | [link](https://ieeexplore.ieee.org/document/9707874)| | | [link](https://github.com/computer-analysis/BinUSE)| \n| Program Representations for Predictive Compilation: State of Affairs in the Early 20's | COLA | 2022 | [link](https://doi.org/10.1016/j.cola.2022.101171) | [link](https://homepages.dcc.ufmg.br/~fernando/publications/papers/FaustinoJCL22.pdf) | | [link](https://github.com/otavioon/COLA-2022-Tools) | \n| Improving binary diffing speed and accuracy using community detection and locality-sensitive hashing: an empirical study | JCVHT | 2022 | [link](https://link.springer.com/article/10.1007/s11416-022-00452-z) | | | |\n| PalmTree: Learning an Assembly Language Model for Instruction Embedding | CCS | 2021 | [link](https://dl.acm.org/doi/abs/10.1145/3460120.3484587) | 
[link](https://www.inforsec.org/wp/wp-content/uploads/2021/07/qy.pdf) | | [link](https://github.com/palmtreemodel/PalmTree) | \n| Binary code similarity detection | ASE | 2021 | [link](https://dl.acm.org/doi/abs/10.1109/ASE51524.2021.9678518)| | | |\n| Binary diffing as a network alignment problem via belief propagation | ASE | 2021 | [link](https://basepub.dauphine.psl.eu/bitstream/handle/123456789/22755/menginrossi2021binary-diffing.pdf?sequence=2)| | | |\n| Asteria: Deep Learning-based AST-Encoding for Cross-platform Binary Code Similarity Detection | IEEE DSN 2021 | 2021 | [link](https://arxiv.org/pdf/2108.06082v1.pdf)| | | [link](https://github.com/Asteria-BCSD/Asteria)|\n| BinDeep: A deep learning approach to binary code similarity detection | ESWA | 2021 | [link](https://www.sciencedirect.com/science/article/pii/S0957417420310332)| | | |\n|EnBinDiff: Identifying Data-Only Patches for Binaries | TDSC | 2021 | [link](https://ieeexplore.ieee.org/document/9645381)| | | | \n|BinDiff\u003csub\u003eNN\u003c/sub\u003e: Learning Distributed Representation of Assembly for Robust Binary Diffing Against Semantic Differences | TSE | 2021 | [link](https://ieeexplore.ieee.org/document/9470904)| | | [link](https://github.com/sami2316/bindiff_NN)| \n| Codee: A Tensor Embedding Scheme for Binary Code Search | TSE | 2021 |[link](https://ieeexplore.ieee.org/document/9345532) | | | [link](https://github.com/ycachy/Codee)|\n| Revisiting Binary Code Similarity Analysis using Interpretable Feature Engineering and Lessons Learned | TSE(revision) | 2021 | [link](https://arxiv.org/pdf/2011.10749.pdf) |  |  | [link](https://github.com/SoftSec-KAIST/TikNib) |\n| How could Neural Networks understand Programs? 
| ICML 2021             | 2021 |         [link](https://arxiv.org/pdf/2105.04297.pdf)         |                                                              |                                                              | [link](https://github.com/pdlan/OSCAR) |\n| Multi-threshold token-based code clone detection | SANER 2021             | 2021 |         [link](https://arxiv.org/pdf/2002.05204.pdf)         |                                                              |                                                              ||\n| FastSpec: Scalable Generation and Detection of Spectre Gadgets Using Neural Embeddings | IEEE Euro S\u0026P 2021             | 2021 |         [link](https://arxiv.org/pdf/2006.14147.pdf)         |                                                              | [link](https://www.youtube.com/watch?v=WskRnEY7oCs) |                                                  [link](https://github.com/vernamlab/FastSpec)           |\n| TREX: Learning Execution Semantics from Micro-Traces for Binary Similarity |              | 2020 |         [link](https://arxiv.org/pdf/2012.08680.pdf)         |                                                              |                                                              | [link](https://github.com/CUMLSec/trex) |\n| Similarity of Binaries Across Optimization Levels and Obfuscation | ESORICS 2020 | 2020 | [link](https://books.google.com.hk/books?id=sqT8DwAAQBAJ\u0026pg=PA295\u0026lpg=PA295\u0026dq=Similarity+of+Binaries+Across+Optimization+Levels+and+Obfuscation\u0026source=bl\u0026ots=OFw-NpBFEJ\u0026sig=ACfU3U2DFjxq5lFEM2smLXvWRNf8dyX-TQ\u0026hl=en\u0026sa=X\u0026ved=2ahUKEwiZvuKSk93yAhXCB94KHYNeA_YQ6AF6BAgPEAM#v=onepage\u0026q=Similarity%20of%20Binaries%20Across%20Optimization%20Levels%20and%20Obfuscation\u0026f=false) |                                                              | [link](https://www.youtube.com/watch?v=Pi7wsCvfBa8) |                                                              |\n| 
Open-source tools and benchmarks for code-clone detection: past, present, and future trends |              | 2020 |  [link](https://dl.acm.org/doi/abs/10.1145/3381307.3381310)  |                                                              |                                                              |                                                              |\n| Semantically Find Similar Binary Codes with Mixed Key Instruction Sequence |              | 2020 |                                                              |                                                              |                                                              |                                                              |\n| LibDX: A Cross-Platform and Accurate System to Detect Third-Party Libraries in Binary Code |              | 2020 |     [link](https://ieeexplore.ieee.org/document/9054845)     |                                                              |                                                              |                                                              |\n| Detecting Code Clones with Graph Neural Network and Flow-Augmented Abstract Syntax Tree |    SANER     | 2020 |         [link](https://arxiv.org/pdf/2002.08653.pdf)         |                                                              |                                                              |                                                              |\n| What You See is What it Means! 
Semantic Representation Learning of Code based on Visualization and Transfer Learning |              | 2020 |         [link](https://arxiv.org/pdf/2002.02650.pdf)         |                                                              |                                                              |                                                              |\n|           Clone Detection on Large Scala Codebases           |              | 2020 |     [link](https://ieeexplore.ieee.org/document/9047640)     |                                                              |                                                              |                                                              |\n|     CloneCompass: Visualizations for Code Clone Analysis     |              | 2020 | [link](https://dspace.library.uvic.ca/bitstream/handle/1828/11729/Ying_Wang_MSc_2020.pdf?sequence=1\u0026isAllowed=y) |                                                              |                                                              |                                                              |\n| DEEPBINDIFF: Learning Program-Wide Code Representations for Binary Diffing |     NDSS     | 2020 | [link](https://www.ndss-symposium.org/wp-content/uploads/2020/02/24311.pdf) |                                                              |     [link](https://www.youtube.com/watch?v=TB50csOprMs)      |        [link](https://github.com/yueduan/DeepBinDiff)        |\n| VGraph: A Robust Vulnerable Code Clone Detection System Using Code Property Triplets |   EuroS\u0026P    | 2020 | [link](https://www2.seas.gwu.edu/~howie/publications/VGraph-EuroSP20.pdf) |                                                              |                                                              |                                                              |\n| Order Matters: Semantic-Aware Neural Networks for Binary Code Similarity Detection |     AAAI     | 2020 | 
[link](https://keenlab.tencent.com/en/whitepapers/Ordermatters.pdf) |                                                              |                                                              |                                                              |\n| Similarity Metric Method for Binary Basic Blocks of Cross-Instruction Set Architecture |     NDSS     | 2020 | [link](https://www.ndss-symposium.org/wp-content/uploads/bar2020-23002.pdf) |                                                              |                                                              |       [link](https://github.com/zhangxiaochuan/MIRROR)       |\n| Investigating Graph Embedding Neural Networks with Unsupervised Features Extraction for Binary Analysis | NDSS Workshop on Binary Analysis Research (BAR) | 2019 | [link](https://www.ndss-symposium.org/wp-content/uploads/bar2019_20_Massarelli_paper.pdf) | | | [link](https://github.com/lucamassarelli/Unsupervised-Features-Learning-For-Binary-Similarity) |\n| Asm2Vec: Boosting Static Representation Robustness for Binary Clone Search against Code Obfuscation and Compiler Optimization |   IEEE S\u0026P   | 2019 | [link](https://www.computer.org/csdl/proceedings-article/sp/2019/666000a038/19skfc3ZfKo) | [link](https://pdfs.semanticscholar.org/38ae/cd9be307867e375b17597499e3e8be2d4930.pdf) | [link](https://www.youtube.com/watch?v=6ethsho5uJA\u0026feature=emb_title) |                                                              |\n| Semantic-Based Representation Binary Clone Detection for Cross-Architectures in the Internet of Things |     MDPI     | 2019 |     [link](https://www.mdpi.com/2076-3417/9/16/3283/pdf)     |                                                              |                                                              |                                                              |\n|              A Survey of Binary Code Similarity              |     CSUR     | 2019 |         [link](https://arxiv.org/pdf/1909.11424.pdf) 
        |                                                              |                                                              |                                                              |\n|                     代码克隆检测研究进展                     |   软件学报   | 2019 |  [link](https://xin-xia.github.io/publication/rjxb181.pdf)   |                                                              |                                                              |                                                              |\n|         A Systematic Review on Code Clone Detection          |              | 2019 |     [link](https://ieeexplore.ieee.org/document/8719895)     |                                                              |                                                              |                                                              |\n| A Cross-Architecture Instruction Embedding Model for Natural Language Processing-Inspired Binary Code Analysis |     NDSS     | 2019 |         [link](https://arxiv.org/pdf/1812.09652.pdf)         |                                                              |                                                              | [link](https://github.com/nlp-code-analysis/cross-arch-instr-model) |\n| Neural Machine Translation Inspired Binary Code Similarity Comparison beyond Function Pairs |     NDSS     | 2019 | [link](https://www.ndss-symposium.org/wp-content/uploads/2019/02/ndss2019_11-4_Zuo_paper.pdf) | [link](https://www.ndss-symposium.org/wp-content/uploads/ndss2019_11-4_Zuo_slides.pdf) | [link](https://www.youtube.com/watch?v=-BeqwMPQNrw\u0026list=PLfUWWM-POgQvnPOa9Bo1AyKplMkOGfHUT\u0026index=5\u0026t=1s) |           [model](https://nmt4binaries.github.io/)           |\n| SAFE: Self-Attentive Function Embeddings for Binary Similarity |              | 2019 |         [link](https://arxiv.org/pdf/1811.05296.pdf)         | [link](https://www.dimva2019.org/wp-content/uploads/sites/31/2019/06/DIMVA19-Slides-22.pdf) |   
                                                           |           [link](https://github.com/gadiluna/SAFE)           |\n| Learning-Based Recursive Aggregation of Abstract Syntax Trees for Code Clone Detection |    SANER     | 2019 |     [link](https://ieeexplore.ieee.org/document/8668039)     |                                                              |                                                              |                                                              |\n|            基于深度学习的跨平台二进制代码关联分析            |              | 2019 | [link](https://kns.cnki.net/KCMS/detail/detail.aspx?dbname=CMFD202001\u0026filename=1019646524.nh) |                                                              |                                                              |                                                              |\n| CVSkSA: cross-architecture vulnerability search in firmware based on kNN-SVM and attributed control flow graph |              | 2019 | [link](https://link.springer.com/article/10.1007/s11219-018-9435-5) |                                                              |                                                              |                                                              |\n| Function matching between binary executables: efficient algorithms and features | JCVHT | 2019 | [link](https://users.auth.gr/kehagiat/Papers/journal/2019JCVHuku.pdf) | | | |\n| BinMatch: A Semantics-based Hybrid Approach on Binary Code Clone Analysis |    ICSME     | 2018 | [link](https://loccs.sjtu.edu.cn/~romangol/publications/icsme18.pdf) |                                                              |                                                              |                                                              |\n| αDiff: Cross-Version Binary Code Similarity Detection with DNN |     ASE      | 2018 | [link](https://dl.acm.org/doi/pdf/10.1145/3238147.3238199?download=true) |                                                    
          |                                                              |  [dataset](https://github.com/twelveand0/alphadiff-dataset)  |\n|      Binary Similarity Detection Using Machine Learning      |     PLDI     | 2018 |    [link](https://dl.acm.org/doi/10.1145/3264820.3264821)    |                                                              |                                                              |                                                              |\n|      CCAligner: A Token Based Large-Gap Clone Detector       | ICSE | 2018 | [link](http://home.ustc.edu.cn/~wpc520/papers/CCAligner.pdf) |                                                              |                                                              |                                                              |\n|        Oreo: Detection of Clones in the Twilight Zone        |     FSE      | 2018 |         [link](https://arxiv.org/pdf/1806.05837.pdf)         |                                                              |                                                              |                                                              |\n| VulSeeker: A Semantic Learning Based Vulnerability Seeker for Cross-platform Binary |     ASE      | 2018 |    [link](https://dl.acm.org/doi/10.1145/3238147.3240480)    |                                                              |                                                              |        [link](https://github.com/buptsseGJ/VulSeeker)        |\n| VulSeeker-pro: enhanced semantic learning based binary vulnerability seeker with emulation |              | 2018 |    [link](https://dl.acm.org/doi/10.1145/3236024.3275524)    |                                                              |                                                              |                                                              |\n| FirmUp: Precise Static Detection of Common Vulnerabilities in Firmware |              | 2018 |    
[link](https://dl.acm.org/doi/10.1145/3296957.3177157)    |                                                              |                                                              |                                                              |\n| BINARM: Scalable and Efficient Detection of Vulnerabilities in Firmware Images of Intelligent Electronic Devices |              | 2018 | [link](https://users.encs.concordia.ca/~wang/papers/dimva18paria.pdf) |                                                              |                                                              |                                                              |\n| A Resilient and Efficient System for Identifying FOSS Functions in Malware Binaries |              | 2018 |        [link](https://dl.acm.org/doi/10.1145/3175492)        |                                                              |                                                              |                                                              |\n| Beyond Precision and Recall: Understanding Uses (and Misuses) of Similarity Hashes in Binary Analysis |              | 2018 |    [link](https://dl.acm.org/doi/10.1145/3176258.3176306)    | [link](https://pagabuc.me/slides/codaspy18_pagani.slides.pdf) |                                                              |                                                              |\n| BCD: Decomposing Binary Code Into Components Using Graph-Based Clustering |   ASIA CCS   | 2018 |    [link](https://dl.acm.org/doi/10.1145/3196494.3196504)    |                                                              |                                                              |                                                              |\n|        A Deep Learning Approach to Program Similarity        |    MASES     | 2018 |    [link](https://dl.acm.org/doi/10.1145/3243127.3243131)    |                                                              |                                       
                       |                                                              |\n|      Recurrent Neural Network for Code Clone Detection       |     SEIM     | 2018 | [link](https://seim-conf.org/media/materials/2018/proceedings/SEIM-2018_Short_Papers.pdf#page=48) |                                                              |                                                              |                                                              |\n| The Adverse Effects of Code Duplication in Machine Learning Models of Code |              | 2018 |    [link](https://dl.acm.org/doi/10.1145/3359591.3359735)    |                                                              |     [link](https://www.youtube.com/watch?v=uvWfpE2LhOo)      |                                                              |\n| Benchmarks for software clone detection: A ten-year retrospective |    SANER     | 2018 |     [link](https://ieeexplore.ieee.org/document/8330194)     |                                                              |                                                              |                                                              |\n| Binary Code Clone Detection across Architectures and Compiling Configurations |     ICPC     | 2017 |     [link](https://dl.acm.org/doi/10.1109/ICPC.2017.22)      |                                                              |                                                              |                                                              |\n| Neural Network-based Graph Embedding for Cross-Platform Binary Code Similarity Detection |   ACM CCS    | 2017 |         [link](https://arxiv.org/pdf/1708.06525.pdf)         |                                                              |                                                              |                            [link](https://github.com/Yunlongs/Genimi)                                  |\n| BinSequence: Fast, Accurate and Scalable Binary Code Reuse 
Detection |   ASIA CCS   | 2017 |    [link](https://dl.acm.org/doi/10.1145/3052973.3052974)    |                                                              |                                                              |                                                              |\n| BinShape: Scalable and Robust Binary Library Function Identification Using Function Shape |    DIMVA     | 2017 | [link](https://link.springer.com/chapter/10.1007/978-3-319-60876-1_14) |                                                              |                                                              |                                                              |\n|       Compiler-agnostic function detection in binaries       | IEEE EuroS\u0026P | 2017 |     [link](https://ieeexplore.ieee.org/document/7961979)     |                                                              |                                                              |           [link](https://github.com/uxmal/nucleus)           |\n| BinSign: Fingerprinting binary functions to support automated analysis of code executables |              | 2017 | [link](https://spectrum.library.concordia.ca/982206/1/Nouh_MASc_S2017.pdf) |                                                              |                                                              |                                                              |\n|        Similarity of binaries through re-optimization        |     PLDI     | 2017 |    [link](https://dl.acm.org/doi/10.1145/3062341.3062387)    | [link](https://nimrodpar.github.io/assets/presentations/gitz-pldi17.pdf) |                                                              |                                                              |\n|  Transferring code-clone detection and analysis to practice  |  ICSE-SEIP   | 2017 |   [link](https://dl.acm.org/doi/10.1109/ICSE-SEIP.2017.6)    |                                                              |                                         
                     |                                                              |\n| Cryptographic Function Detection in Obfuscated Binaries via Bit-Precise Symbolic Loop Mapping |   IEEE S\u0026P   | 2017 |     [link](https://ieeexplore.ieee.org/document/7958617)     |                                                              |                                                              |                                                              |\n| Supervised Deep Features for Software Functional Clone Detection by Exploiting Lexical and Syntactical Information in Source Code |    IJCAI     | 2017 |   [link](https://www.ijcai.org/Proceedings/2017/0423.pdf)    |                                                              |                                                              |                                                              |\n| Extracting Conditional Formulas for Cross-Platform Bug Search |   ASIA CCS   | 2017 |    [link](https://dl.acm.org/doi/10.1145/3052973.3052995)    |                                                              |                                                              |                                                              |\n| SPAIN: Security Patch Analysis for Binaries Towards Understanding the Pain and Pills |     ICSE     | 2017 |     [link](https://ieeexplore.ieee.org/document/7985685)     |                                                              |                                                              |                                                              |\n|  CCLearner: A Deep Learning-Based Clone Detection Approach   |              | 2017 | [link](http://people.cs.vt.edu/nm8247/publications/icsme-research-118-camera-ready.pdf) |                                                              |                                                              |        [link](https://github.com/liuqingli/CCLearner)        |\n| BinSim: Trace-based Semantic Binary Diffing via 
System Call Sliced Segment Equivalence Checking |    USENIX    | 2017 | [link](https://www.usenix.org/system/files/conference/usenixsecurity17/sec17-ming.pdf) | [link](https://www.usenix.org/sites/default/files/conference/protected-files/usenixsecurity17_slides_jiang_ming.pdf) | [link](https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/ming) |                                                              |\n|    In-memory Fuzzing for Binary Code Similarity Analysis     |     ASE      | 2017 |    [link](https://dl.acm.org/doi/10.5555/3155562.3155606)    |                                                              |                                                              |                                                              |\n|          DéjàVu: a map of code duplicates on GitHub          |    OOPSLA    | 2017 |        [link](https://dl.acm.org/doi/10.1145/3133908)        |                                                              |                                                              |                                                              |\n| Some from Here, Some from There: Cross-project Code Reuse in GitHub |     MSR      | 2017 |      [link](https://dl.acm.org/doi/10.1109/MSR.2017.15)      |                                                              |                                                              |                                                              |\n| CVSSA: Cross-Architecture Vulnerability Search in Firmware Based on Support Vector Machine and Attributed Control Flow Graph |              | 2017 | [link](https://link.springer.com/article/10.1007/s11219-018-9435-5) |                                                              |                                                              |                                                              |\n| Identifying Functionally Similar Code in Complex Codebases | ICPC | 2016 | 
[link](http://www.cs.columbia.edu/~simha/preprint_icpc16.pdf) | | | [link](https://github.com/Programming-Systems-Lab/ioclones) |\n|     Scalable graph-based bug search for firmware images (Genius)     |   ACM CCS    | 2016 |  [link](https://www.cs.ucr.edu/~heng/pubs/genius-ccs16.pdf)  |                                                              |     [link](https://www.youtube.com/watch?v=R9TPqflLGNs)      |       [link](https://github.com/qian-feng/Gencoding)        |\n| Cross-Architecture Binary Semantics Understanding via Similar Code Comparison |  IEEE SANER  | 2016 | [link](https://loccs.sjtu.edu.cn/~romangol/publications/saner16.pdf) |                                                              |                                                              |                                                              |\n| discovRE: Efficient cross-architecture identification of bugs in binary code |     NDSS     | 2016 | [link](https://net.cs.uni-bonn.de/fileadmin/ag/martini/Staff/yakdan/discovre_ndss2016.pdf) |                                                              |                                                              |                                                              |\n|       BinGo: Cross-architecture cross-OS Binary Search       |     FSE      | 2016 |    [link](https://dl.acm.org/doi/10.1145/2950290.2950350)    |                                                              |                                                              |                                                              |\n| Kam1n0: Mapreduce-based assembly clone search for reverse engineering |     KDD      | 2016 |  [link](https://dl.acm.org/doi/pdf/10.1145/2939672.2939719)  |                                                              |                                                              |   [link](https://github.com/McGill-DMaS/Kam1n0-Community)    |\n|              Statistical 
similarity of binaries              |     PLDI     | 2016 |    [link](https://dl.acm.org/doi/10.1145/2980983.2908126)    | [link](https://nimrodpar.github.io/assets/presentations/esh-pldi16.pdf) |                                                              |           [link](https://github.com/tech-srl/esh)            |\n|    Deep learning code fragments for code clone detection     |     ASE      | 2016 |     [link](https://ieeexplore.ieee.org/document/7582748)     |                                                              |                                                              |                                                              |\n|       A Survey of Software Clone Detection Techniques        |              | 2016 | [link](https://pdfs.semanticscholar.org/8df3/d10963233aca0e7686b2818b0c47add5466d.pdf) |                                                              |                                                              |                                                              |\n|    SourcererCC: Scaling Code Clone Detection to Big Code     |     ICSE     | 2016 |         [link](https://arxiv.org/pdf/1512.06448.pdf)         |                                                              |                                                              |                                                              |\n| Binary executable file similarity calculation using function matching |              | 2016 | [link](https://link.springer.com/article/10.1007/s11227-016-1941-2) |                                                              |                                                              |                                                              |\n| Matching Similar Functions in Different Versions of a Malware |              | 2016 |     [link](https://ieeexplore.ieee.org/document/7846954)     |                                                              |                                                             
 |                                                              |\n|   BinDNN: Resilient Function Matching Using Deep Learning    |              | 2016 |   [link](http://patrickmcdaniel.org/pubs/securecomm16.pdf)   |                                                              |                                                              |                                                              |\n| VulPecker: An Automated Vulnerability Detection System Based on Code Similarity Analysis |    ACSAC     | 2016 |    [link](https://dl.acm.org/doi/10.1145/2991079.2991102)    |                                                              |                                                              |        [link](https://github.com/vulpecker/Vulpecker)        |\n| BigCloneEval: A Clone Detection Tool Evaluation Framework with BigCloneBench |              | 2016 |     [link](https://ieeexplore.ieee.org/document/7816515)     |                                                              |                                                              |    [link](https://github.com/jeffsvajlenko/BigCloneEval)     |\n|     Cross-architecture bug search in binary executables      |   IEEE S\u0026P   | 2015 |     [link](https://ieeexplore.ieee.org/document/7163056)     |                                                              |                                                              |                                                              |\n| Library functions identification in binary code by using graph isomorphism testings |              | 2015 |     [link](https://ieeexplore.ieee.org/document/7081836)     |                                                              |                                                              |                                                              |\n|     Evaluating clone detection tools with BigCloneBench      |              | 2015 |     [link](https://ieeexplore.ieee.org/document/7332459)     |            
                                                  |                                                              |     [link](https://github.com/clonebench/BigCloneBench)      |\n| Memoized semantics-based binary diffing with application to malware lineage inference |              | 2015 | [link](https://faculty.ist.psu.edu/wu/papers/memoized-IFIP_SEC_2015.pdf) |                                                              |                                                              |                                                              |\n| Sigma: A semantic integrated graph matching approach for identifying reused functions in binary code |              | 2015 | [link](https://www.dfrws.org/sites/default/files/session-files/paper-sigma_a_semantic_integrated_graph_matching_approach_for_identifying_reused_functions_in_binary_code.pdf) | [link](https://pdfs.semanticscholar.org/a036/ff11b1a675550ac57949bc540f400e8fa695.pdf) |                                                              |                                                              |\n|  BYTEWEIGHT: Learning to Recognize Functions in Binary Code  |    USENIX    | 2014 | [link](https://www.usenix.org/system/files/conference/usenixsecurity14/sec14-paper-bao.pdf) | [link](https://www.usenix.org/sites/default/files/conference/protected-files/sec14_slides_bao.pdf) |          [link](https://www.usenix.org/node/184522)          |                                                              |\n| Semantics-based obfuscation-resilient binary code similarity comparison with applications to software plagiarism detection |     FSE      | 2014 |    [link](https://dl.acm.org/doi/10.1145/2635868.2635900)    |                                                              |                                                              |                                                              |\n|          Binclone: Detecting code clones in malware          |     SERE     | 2014 | 
[link](https://cradpdf.drdc-rddc.gc.ca/PDFS/unc194/p800686_A1b.pdf) |                                                              |                                                              |         [link](https://github.com/BinSigma/BinClone)         |\n|        Detecting fine-grained similarity in binaries         |              | 2014 | [link](https://web.cs.ucdavis.edu/~su/theses/AS-dissertation.pdf) |                                                              |                                                              |                                                              |\n| Leveraging semantic signatures for bug search in binary programs |    ACSAC     | 2014 |    [link](https://dl.acm.org/doi/10.1145/2664243.2664269)    |                                                              |                                                              |                                                              |\n| How Accurate Is Coarse-grained Clone Detection?: Comparision with Fine-grained Detectors |              | 2014 | [link](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.685.7674\u0026rep=rep1\u0026type=pdf) |                                                              |                                                              |                                                              |\n|          Tracelet-based code search in executables           |     PLDI     | 2014 |    [link](https://dl.acm.org/doi/10.1145/2594291.2594343)    |                                                              |                                                              |                                                              |\n|         Control Flow-Based Malware Variant Detection         |              | 2014 |     [link](https://ieeexplore.ieee.org/document/6601601)     |                                                              |                                                              |                           
                                   |\n|           Hashing for Similarity Search: A Survey            |              | 2014 |         [link](https://arxiv.org/pdf/1408.2927.pdf)          |                                                              |                                                              |                                                              |\n| Achieving accuracy and scalability simultaneously in detecting application clones on android markets |     ICSE     | 2014 |    [link](https://dl.acm.org/doi/10.1145/2568225.2568286)    |                                                              |                                                              |                                                              |\n| Identifying Shared Software Components to Support Malware Forensics |              | 2014 | [link](https://link.springer.com/chapter/10.1007/978-3-319-08509-8_2) |                                                              |                                                              |                                                              |\n|           Evaluating Modern Clone Detection Tools            |              | 2014 |     [link](https://ieeexplore.ieee.org/document/6976098)     |                                                              |                                                              |                                                              |\n|         Rendezvous: a search engine for binary code          |     MSR      | 2013 |    [link](https://dl.acm.org/doi/10.5555/2487085.2487147)    |                                                              |                                                              |                                                              |\n|     Binslayer: accurate comparison of binary executables     |    PPREW     | 2013 |    [link](https://dl.acm.org/doi/10.1145/2430553.2430557)    |                                                         
     |                                                              |       [link](https://github.com/MartialB/BinSlayer)                                                       |\n|        Software clone detection: A systematic review         |              | 2013 | [link](https://romisatriawahono.net/lecture/rm/survey/software%20engineering/Software%20Construction/Rattan%20-%20Software%20Clone%20Detection%20-%202013.pdf) |                                                              |                                                              |                                                              |\n| How to extract differences from similar programs? A cohesion metric approach |              | 2013 |     [link](https://ieeexplore.ieee.org/document/6613038)     |                                                              |                                                              |                                                              |\n|           Software clone detection and refactoring           |              | 2013 | [link](https://www.researchgate.net/publication/258389603_Software_Clone_Detection_and_Refactoring) |                                                              |                                                              |                                                              |\n| An Emerging Approach towards Code Clone Detection: Metric Based Approach on Byte Code |              | 2013 | [link](http://ijarcsse.com/Before_August_2017/docs/papers/Volume_3/5_May2013/V3I5-0355.pdf) |                                                              |                                                              |                                                              |\n| A hybrid-token and textual based approach to find similar code segments |              | 2013 |     [link](https://ieeexplore.ieee.org/document/6726700)     |                                                              |                             
                                 |                                                              |\n| Gapped code clone detection with lightweight source code analysis |              | 2013 | [link](https://ieeexplore.ieee.org/abstract/document/6613837) |                                                              |                                                              |                                                              |\n| MutantX-S: Scalable Malware Clustering Based on Static Features |    USENIX    | 2013 | [link](https://www.usenix.org/system/files/conference/atc13/atc13-hu.pdf) |                                                              |          [link](https://www.usenix.org/node/174525)          |                                                              |\n| Binjuice: Fast Location of Similar Code Fragments Using Semantic Juice |    PPREW     | 2013 |    [link](https://dl.acm.org/doi/10.1145/2430553.2430558)    |                                                              |                                                              |                                                              |\n|         Towards Automatic Software Lineage Inference         |    USENIX    | 2013 | [link](https://www.usenix.org/system/files/conference/usenixsecurity13/sec13-paper_jang.pdf) |                                                              | [link](https://www.usenix.org/conference/usenixsecurity13/technical-sessions/papers/jang) |                                                              |\n| AnDarwin: Scalable Detection of Semantically Similar Android Applications |              | 2013 |     [link](https://ieeexplore.ieee.org/document/6985631)     |                                                              |                                                              |                                                              |\n|       Expose: Discovering potential binary code re-use       |              | 2013 |     
[link](https://ieeexplore.ieee.org/document/6649873)     |                                                              |                                                              |                                                              |\n| Function Matching-based Binary level Software Similarity Calculation |     RACS     | 2013 |    [link](https://dl.acm.org/doi/10.1145/2513228.2513300)    |                                                              |                                                              |                                                              |\n| FIRMA: Malware Clustering and Network Signature Generation with Mixed Network Behaviors |     RAID     | 2013 | [link](https://software.imdea.org/~juanca/papers/firma_raid13.pdf) |                                                              |                                                              |                                                              |\n| A study of repetitiveness of code changes in software evolution |     ASE      | 2013 |   [link](https://dl.acm.org/doi/10.1109/ASE.2013.6693078)    |                                                              |                                                              |                                                              |\n|  ibinhunt: Binary hunting with interprocedural control flow  |              | 2012 | [link](https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=2699\u0026context=sis_research) |        [link](https://slideplayer.com/slide/4168742/)        |                                                              |                                                              |\n| ReDeBug: Finding Unpatched Code Clones in Entire OS Distributions |   IEEE S\u0026P   | 2012 | [link](https://users.ece.cmu.edu/~jiyongj/papers/oakland12.pdf) |                                                              |                                                              |                           
                                   |\n| Boreas: an accurate and scalable token-based approach to code clone detection |     ASE      | 2012 |    [link](https://dl.acm.org/doi/10.1145/2351676.2351725)    |                                                              |                                                              |                                                              |\n| Folding Repeated Instructions for Improving Token-Based Code Clone Detection |              | 2012 |     [link](https://ieeexplore.ieee.org/document/6392103)     |                                                              |                                                              |                                                              |\n| A metrics-based data mining approach for software clone detection |              | 2012 |     [link](https://ieeexplore.ieee.org/document/6340252)     |                                                              |                                                              |                                                              |\n|           Comparison of Clone Detection Techniques           |              | 2012 |                                                              |                                                              |                                                              |                                                              |\n| Malware Classification Method via Binary Content Comparison  |     RACS     | 2012 |    [link](https://dl.acm.org/doi/10.1145/2401603.2401672)    |                                                              |                                                              |                                                              |\n|       Binary function clustering using semantic hashes       |    ICMLA     | 2012 |     [link](https://ieeexplore.ieee.org/document/6406693)     |                                                              |            
                                                  |                                                              |\n| Value-based program characterization and its application to software plagiarism detection |              | 2011 | [link](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.370.9508\u0026rep=rep1\u0026type=pdf) |                                                              |                                                              |                                                              |\n|        CMCD: Count Matrix Based Code Clone Detection         |              | 2011 | [link](https://ieeexplore.ieee.org/iel5/6129717/6130641/06130694.pdf) |                                                              |                                                              |                                                              |\n|    Incremental code clone detection: A pdg-based approach    |              | 2011 |     [link](https://ieeexplore.ieee.org/document/6079769)     |                                                              |                                                              |                                                              |\n|          Anywhere, Any-Time Binary Instrumentation           |              | 2011 |    [link](https://dl.acm.org/doi/10.1145/2024569.2024572)    |                                                              |                                                              |                                                              |\n| Code reuse in open source software development: Quantitative evidence, drivers, and impediments |              | 2010 |                                                              |                                                              |                                                              |                                                              |\n| Index-based code clone detection: incremental, distributed, 
scalable |              | 2010 |                                                              |                                                              |                                                              |                                                              |\n| Detection of Type-1 and Type-2 Code Clones Using Textual Analysis and Metrics |              | 2010 |                                                              |                                                              |                                                              |                                                              |\n| A hybrid approach (syntactic and textual) to clone detection |              | 2010 |                                                              |                                                              |                                                              |                                                              |\n| Evaluating code clone genealogies at release level: An empirical study |              | 2010 |                                                              |                                                              |                                                              |                                                              |\n|     A survey of Binary similarity and distance measures      |              | 2010 |                                                              |                                                              |                                                              |                                                              |\n|        Idea: Opcode-Sequence-Based Malware Detection         |              | 2010 | [link](https://tarjomefa.com/wp-content/uploads/2015/12/4215-English.pdf) |                                                              |                                  
                            |                                                              |\n| Behavioral Clustering of HTTP-Based Malware and Signature Generation Using Malicious Network Traces |    USENIX    | 2010 |                                                              |                                                              |                                                              |                                                              |\n|         Data fingerprinting with similarity digests          |              | 2010 |                                                              |                                                              |                                                              |                                                              |\n| Automatic mining of functionally equivalent code fragments via random testing |              | 2009 |                                                              |                                                              |                                                              |                                                              |\n| A mutation/injection-based automatic framework for evaluating code clone detection tools |              | 2009 |                                                              |                                                              |                                                              |                                                              |\n| Problematic code clones identification using multiple detection results |              | 2009 |                                                              |                                                              |                                                              |                                                              |\n|                 Incremental clone detection                  |              | 2009 |                         
                                     |                                                              |                                                              |                                                              |\n| Scalable and incremental clone detection for evolving software |              | 2009 |                                                              |                                                              |                                                              |                                                              |\n|   Large-scale Malware Indexing Using Function-call Graphs    |              | 2009 |                                                              |                                                              |                                                              |                                                              |\n|         Scalable, Behavior-Based Malware Clustering          |              | 2009 |                                                              |                                                              |                                                              |                                                              |\n|     peHash: A Novel Approach to Fast Malware Clustering      |    USENIX    | 2009 |                                                              |                                                              |                                                              |                                                              |\n|         Detecting Code Clones in Binary Executables          |              | 2009 |                                                              |                                                              |                                                              |                                                              |\n| Binhunt: Automatically finding semantic differences in binary 
programs |              | 2008 | [link](https://people.eecs.berkeley.edu/~dawnsong/papers/2008%20binhunt_icics08.pdf)                             |                                                              |                                                              |                                                              |\n|            Scalable detection of semantic clones             |              | 2008 | [link](https://hiper.cis.udel.edu/lp/lib/exe/fetch.php/courses/icse08-gabel-detectclones.pdf)                                                             |                                                              |                                                              |                                                              |\n| Deckard: Scalable and accurate tree-based detection of code clones |              | 2007 |                                                              |                                                              |                                                              |                                                              |\n|        Large-scale code reuse in open source software        |              | 2007 |                                                              |                                                              |                                                              |                                                              |\n|        A survey on software clone detection research         |              | 2007 | [link](http://research.cs.queensu.ca/TechReports/Reports/2007-541.pdf) |                                                              |                                                              |                                                              |\n| A study of consistent and inconsistent changes to code clones |              | 2007 |                                                              |                                           
                   |                                                              |                                                              |\n|      Comparison and evaluation of clone detection tools      |              | 2007 |                                                              |                                                              |                                                              |                                                              |\n| Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions |              | 2007 |                                                              |                                                              |                                                              |                                                              |\n| A Static Birthmark of Binary Executables Based on API Call Structure |              | 2007 |                                                              |                                                              |                                                              |                                                              |\n| CP-Miner: Finding copy-paste and related bugs in large-scale software code |              | 2006 |                                                              |                                                              |                                                              |                                                              |\n|            Survey of research on software clones             |              | 2006 | [link](https://www.researchgate.net/publication/30815553_Survey_of_Research_on_Software_Clones) |                                                              |                                                              |                                                              |\n| \"Cloning considered harmful\" considered harmful: patterns of 
cloning in software |              | 2006 |     [link](https://ieeexplore.ieee.org/document/4023973)     |                                                              |                                                              |                                                              |\n| GPLAG: detection of software plagiarism by program dependence graph analysis |              | 2006 |                                                              |                                                              |                                                              |                                                              |\n| Detecting Self-mutating Malware Using Control-flow Graph Matching |              | 2006 |                                                              |                                                              |                                                              |                                                              |\n| Identifying Almost Identical Files Using Context Triggered Piecewise Hashing |              | 2006 |                                                              |                                                              |                                                              |                                                              |\n| Hamsa: Fast signature generation for zero-day polymorphic worms with provable attack resilience |   IEEE S\u0026P   | 2006 |                                                              |                                                              |                                                              |                                                              |\n|         Graph-based comparison of executable objects         |              | 2005 |                                                              |                                                              |                                                      
        |                                                              |\n| SDD: high performance code clone detection system for large scale source code |              | 2005 |  [link](http://www.cs.cmu.edu/~seunghak/sdd_slee_2005.pdf)   |                                                              |                                                              |                                                              |\n| Polygraph: Automatically generating signatures for polymorphic worms |              | 2005 |                                                              |                                                              |                                                              |                                                              |\n|               K-gram Based Software Birthmarks               |              | 2005 |                                                              |                                                              |                                                              |                                                              |\n|          Insights into System-Wide Code Duplication          |     IEEE     | 2004 | [link](https://rmod.inria.fr/archives/papers/Rieg04bWCRE2004ClonesVisualization.pdf) |                                                              |                                                              |                                                              |\n| Clone detection in source code by frequent itemset techniques |              | 2004 |                                                              |                                                              |                                                              |                                                              |\n| Evaluating clone detection techniques from a refactoring perspective |              | 2004 |                                                              |                   
                                           |                                                              |                                                              |\n|         Structural comparison of executable objects          |              | 2004 |                                                              |                                                              |                                                              |                                                              |\n| Code compaction of matching single-entry multiple-exit regions |              | 2003 | [link](http://web.cs.ucla.edu/~palsberg/course/cs239/S04/papers/ChenLiGupta03.pdf) |                                                              |                                                              |                                                              |\n| CloSpan: Mining closed sequential patterns in large datasets |              | 2003 |                                                              |                                                              |                                                              |                                                              |\n| CCFinder: a multilinguistic token-based code clone detection system for large scale source code |              | 2002 |                                                              |                                                              |                                                              |                                                              |\n|   Identifying similar code with program dependence graphs    |              | 2001 |                                                              |                                                              |                                                              |                                                              |\n|     Using slicing to identify duplication in source code     |    
          | 2001 |                                                              |                                                              |                                                              |                                                              |\n| BMAT – A Binary Matching Tool for Stale Profile Propagation  |              | 2000 |                                                              |                                                              |                                                              |                                                              |\n| A language independent approach for detecting duplicated code |              | 1999 |                                                              |                                                              |                                                              |                                                              |\n|          Compressing Differences of Executable Code          |              | 1999 |                                                              |                                                              |                                                              |                                                              |\n|       Similarity search in high dimensions via hashing       |              | 1999 |                                                              |                                                              |                                                              |                                                              |\n|         Clone detection using abstract syntax trees          |              | 1998 |                                                              |                                                              |                                                              |                                                              |\n| Experiment on the 
Automatic Detection of Function Clones in a Software System Using Metrics |              | 1996 |                                                              |                                                              |                                                              |                                                              |\n|       Pattern matching for clone and concept detection       |              | 1996 |                                                              |                                                              |                                                              |                                                              |\n| On finding duplication and near-duplication in large software systems |              | 1995 |     [link](https://ieeexplore.ieee.org/document/514697)      |                                                              |                                                              |                                                              |\n|           Detecting code similarity using patterns           |              | 1995 |                                                              |                                                              |                                                              |                                                              |\n|                 A Cross-platform Binary Diff                 |              | 1995 |                                                              |                                                              |                                                              |                                                              
|\n\n","funding_links":[],"categories":["Papers"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSystemSecurityStorm%2FAwesome-Binary-Similarity","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FSystemSecurityStorm%2FAwesome-Binary-Similarity","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSystemSecurityStorm%2FAwesome-Binary-Similarity/lists"}