{"id":13654797,"url":"https://github.com/clonebench/BigCloneBench","last_synced_at":"2025-04-23T10:31:33.073Z","repository":{"id":26634172,"uuid":"30089942","full_name":"clonebench/BigCloneBench","owner":"clonebench","description":null,"archived":false,"fork":false,"pushed_at":"2022-10-28T01:42:18.000Z","size":22,"stargazers_count":110,"open_issues_count":5,"forks_count":19,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-11-10T06:34:10.893Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/clonebench.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-01-30T20:42:53.000Z","updated_at":"2024-11-07T12:03:44.000Z","dependencies_parsed_at":"2023-01-14T05:04:03.280Z","dependency_job_id":null,"html_url":"https://github.com/clonebench/BigCloneBench","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clonebench%2FBigCloneBench","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clonebench%2FBigCloneBench/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clonebench%2FBigCloneBench/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clonebench%2FBigCloneBench/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/clonebench","download_url":"https://codeload.github.com/clonebench/BigCloneBench/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250416381,"owners_count":21426982,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-02T03:00:47.349Z","updated_at":"2025-04-23T10:31:32.792Z","avatar_url":"https://github.com/clonebench.png","language":null,"funding_links":[],"categories":["Dataset and Benchmark"],"sub_categories":["Papers (This list is a bit outdated, need to update)"],"readme":"BigCloneBench\n=============\n\nBigCloneBench is a clone detection benchmark of known clones in the IJaDataset source repository.  The current version of the benchmark, IJaDataset (with modifications), and tools for measuring clone detection recall are available below.\n\nDerivative works using the benchmark should cite [1], while works using the recall measurement process should also cite [2] and [3].\n\nBigCloneEval\n============\nWe have now released BigCloneEval, a framework for evaluating clone detection tools with BigCloneBench.  It is very easy to use and comes with optimized versions of BigCloneBench and IJaDataset for evaluating clone detection tools.  Use this version if you simply want to measure the recall of your clone detection tool.  Use the full BigCloneBench database if you want to implement a custom experiment over the data.\n\nBigCloneEval can be found [here](http://jeffsvajlenko.weebly.com/bigcloneeval.html).\n\nBigCloneBench Version 2 (**Use This Version**)\n==========================================\n\nThe latest BigCloneBench is distributed with BigCloneEval [here](https://github.com/jeffsvajlenko/BigCloneEval).  
It is significantly larger than the ERA version and is distributed as an H2 database file with a much simpler schema.  If you need the full BigCloneBench with all the validation artifacts, please contact us and we can arrange to send you a copy (it is quite large).\n\nIJaDataset with the expanded files is available here: [IJaDataset 2.0 + BigCloneBench Samples](https://1drv.ms/u/s!AhXbM6MKt_yLj_tk29GJnc9BKoIvCg?e=oVTVJm).\n\nThe full BigCloneBench database (PostgreSQL) is available here: [BigCloneBench_Postgresql](https://1drv.ms/u/s!AhXbM6MKt_yLj_tmo-UxK2QTwMk2ew?e=YF9w0g).\n\nBecause of OneDrive limits on large files, you may need to log in before these files can be downloaded.\n\nERA (Outdated)\n==============\n\n**This version is out of date.  I recommend using Version 2, distributed with BigCloneEval, for a much larger dataset.**\n\nThis is an updated version of the ERA release.  There are some minor changes to the database and IJaDataset to improve the quality of measured recall.  We have also released evaluation tools for measuring recall based on our work in [2].\n\n[BigCloneBench Database](https://1drv.ms/u/s!AhXbM6MKt_yLj_Nv7H-8OoPD45lWeg?e=5yORQL):\nThis is the benchmark database.\n\n[IJaDataset 2.0 + BigCloneBench Samples](https://1drv.ms/u/s!AhXbM6MKt_yLj_N3FAIGw3CJb1JGOg?e=NsP59Z):\nThis is the full IJaDataset (inter-project Java source code dataset), including modifications for BigCloneBench.\n\n[IJaDataset 2.0 - BigCloneBench Reduced Version](https://1drv.ms/u/s!AhXbM6MKt_yLj_Nsc798DA-UJ4DFew?e=fp157L):\nMost tools do not scale well to IJaDataset.  This version reduces the source files to only those that contain known true or false clones of the functionalities tagged in BigCloneBench.  It contains one folder per functionality, holding the source files that contain the tagged functions.  This reduces or removes the scalability challenge when measuring recall.  
See the instructions provided in the \"Clone Detection Recall Tools\" distributable for more details and suggestions.\n\nERA Version 1 (Outdated)\n==========================\n\n**This version is out of date.  I recommend using Version 2, distributed with BigCloneEval, for a much larger dataset.**\n\nThis is a peer-reviewed version of the benchmark, as described in our ICSME'14 ERA paper.  You should prefer the updated version above, which is compatible with the clone detector evaluation tools.\n\n[BigCloneBench Database](https://1drv.ms/u/s!AhXbM6MKt_yLj_N9oMHZox6lUM7xqw?e=XVWQ4S)\n\n[IJaDataset 2.0 + BigCloneBench Samples](https://1drv.ms/u/s!AhXbM6MKt_yLj_N80wvpc_ag9NJobg?e=kzf4eN)\n\nLicense\n=======\nBenchmark: The benchmark is distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives license.  This license covers the benchmark database and its derivatives.  For attribution, please cite this page and our publications below.  This data is provided free of charge for non-commercial and academic benchmarking and experimentation use.  If you would like to contribute to the benchmark, please contact us.  If you believe your intended usage may be restricted by the license, please contact us and we can discuss the possibilities.\n\nIJaDataset: We distribute IJaDataset 2.0 here with additions and modifications for the benchmark.  The files it contains were crawled from open-source projects, and their in-file licenses are maintained as-is.  The benchmark database also lists the source of each file and its detected license.  IJaDataset 2.0 is from the SECold Project: http://www.secold.org/projects/seclone.\n\nPublications\n============\n\n[1] Jeffrey Svajlenko, Judith F. Islam, Iman Keivanloo, Chanchal K. 
Roy and Mohammad Mamun Mia, \"Towards a Big Data Curated Benchmark of Inter-Project Code Clones\", In Proceedings of the Early Research Achievements track of the 30th International Conference on Software Maintenance and Evolution (ICSME 2014), 5 pp., Victoria, Canada, September 2014.\n\n[2] Jeffrey Svajlenko and Chanchal K. Roy, \"Evaluating Clone Detection Tools with BigCloneBench\", In Proceedings of the 31st International Conference on Software Maintenance and Evolution (ICSME 2015), 10 pp., Bremen, Germany, September 2015.\n\n[3] Jeffrey Svajlenko and Chanchal K. Roy, \"BigCloneEval: A Clone Detection Tool Evaluation Framework with BigCloneBench\", In Proceedings of the 32nd International Conference on Software Maintenance and Evolution (ICSME 2016), to appear.\n\nContact\n=======\nBenchmark Maintainer: Jeffrey Svajlenko: jeff.svajlenko@gmail.com\n\nJudith F. Islam: judithfran@gmail.com\n\nIman Keivanloo: iman.keivanloo@queensu.ca\n\nChanchal K. Roy: chanchal.roy@usask.ca\n\nAcknowledgements\n================\nThe following people have contributed clone oracling efforts (in no particular order):\n- Judith F. Islam\n- Mohammad Mamun Mia\n- Graeme Daly\n- Jeffrey Svajlenko\n- Chanchal Roy\n- Muhammad Asaduzzaman\n- Shamima Yeasmin\n- Manishankar Mondal\n- Mike Hoffert\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fclonebench%2FBigCloneBench","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fclonebench%2FBigCloneBench","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fclonebench%2FBigCloneBench/lists"}