{"id":13475580,"url":"https://github.com/simsong/bulk_extractor","last_synced_at":"2025-05-15T09:03:53.688Z","repository":{"id":2906123,"uuid":"3914697","full_name":"simsong/bulk_extractor","owner":"simsong","description":"This is the development tree. Production downloads are at:","archived":false,"fork":false,"pushed_at":"2025-03-26T01:55:26.000Z","size":75087,"stargazers_count":1208,"open_issues_count":134,"forks_count":204,"subscribers_count":76,"default_branch":"main","last_synced_at":"2025-05-15T09:02:52.824Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://github.com/simsong/bulk_extractor/releases","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/simsong.png","metadata":{"files":{"readme":"README.md","changelog":"ChangeLog","contributing":null,"funding":null,"license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS","dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2012-04-03T04:36:41.000Z","updated_at":"2025-05-14T20:15:38.000Z","dependencies_parsed_at":"2025-04-14T16:57:12.344Z","dependency_job_id":null,"html_url":"https://github.com/simsong/bulk_extractor","commit_stats":{"total_commits":1868,"total_committers":45,"mean_commits":41.51111111111111,"dds":0.332441113490364,"last_synced_commit":"452f8c728107b3b9e67b46cbe1faeba6259977a0"},"previous_names":[],"tags_count":23,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simsong%2Fbulk_extractor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simsong%2Fbulk_extractor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simsong%2Fbulk_extractor/releases",
"manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/simsong%2Fbulk_extractor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/simsong","download_url":"https://codeload.github.com/simsong/bulk_extractor/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254310513,"owners_count":22049468,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T16:01:21.657Z","updated_at":"2025-05-15T09:03:48.679Z","avatar_url":"https://github.com/simsong.png","language":"C++","readme":"[![codecov](https://codecov.io/gh/simsong/bulk_extractor/branch/main/graph/badge.svg?token=3w691sdgLu)](https://codecov.io/gh/simsong/bulk_extractor)\n\u003ca href=\"https://scan.coverity.com/projects/simsong-bulk_extractor\"\u003e\n  \u003cimg alt=\"Coverity Scan Build Status\" src=\"https://scan.coverity.com/projects/29726/badge.svg\"/\u003e\u003c/a\u003e\n\n`bulk_extractor` is a high-performance digital forensics exploitation\ntool.  It is a \"get evidence\" button that rapidly scans any kind of\ninput (disk images, files, directories of files, etc) and extracts\nstructured information such as email addresses, credit card numbers,\nJPEGs and JSON snippets without parsing the file system or file system\nstructures. The results are stored in text files that are easily\ninspected, searched, or used as inputs for other forensic\nprocessing. 
bulk_extractor also creates histograms of certain kinds of\nfeatures that it finds, such as Google search terms and email\naddresses, as previous research has shown that such histograms are\nespecially useful in investigative and law enforcement applications.\n\nUnlike other digital forensics tools, `bulk_extractor` probes every byte of data to see if it is the start of a\nsequence that can be decompressed or otherwise decoded. If so, the\ndecoded data are recursively re-examined. As a result, `bulk_extractor` can find things like BASE64-encoded JPEGs and\ncompressed JSON objects that traditional carving tools miss.\n\nThis is the `bulk_extractor` 2.1 development branch! It is reliable, but if you want a well-tested, production-quality release, download one from https://github.com/simsong/bulk_extractor/releases.\n\nBuilding `bulk_extractor`\n=========================\nWe recommend building from source. We provide a number of `bash` scripts in the `etc/` directory that will configure a clean virtual machine. Once the system is prepared, build with:\n\n```\ngit clone --recurse-submodules https://github.com/simsong/bulk_extractor.git\ncd bulk_extractor\n./bootstrap.sh\n./configure\nmake\nmake install\n```\n\nFor detailed instructions on installing packages and building bulk_extractor, read the wiki page here:\nhttps://github.com/simsong/bulk_extractor/wiki/Installing-bulk_extractor\n\nFor more information on bulk_extractor, visit: https://forensics.wiki/bulk_extractor\n\nTested Configurations\n=====================\nThis release of bulk_extractor requires C++17 and has been tested to compile on the following platforms:\n\n* Amazon Linux as of 2023-05-25\n* Fedora 36 (most recently)\n* Ubuntu 20.04 LTS\n* macOS 13.2.1\n\nYou should *always* start with a fresh VM and prepare the system using the appropriate prep script in the `etc/` directory.\n\nTested Configurations On Which bulk_extractor Does Not Work\n===========================================================\n* Debian 10 (is not supported for native 
builds)\n\nRECOMMENDED CITATION\n====================\nIf you are writing a scientific paper and using bulk_extractor, please cite it with:\n\nGarfinkel, Simson, Digital media triage with bulk data analysis and bulk_extractor. Computers and Security 32: 56-72 (2013)\n* [Science Direct](https://www.sciencedirect.com/science/article/pii/S0167404812001472)\n* [Bibliometrics](https://plu.mx/plum/a/?doi=10.1016/j.cose.2012.09.011\u0026theme=plum-sciencedirect-theme\u0026hideUsage=true)\n* [Author's website](https://simson.net/clips/academic/2013.COSE.bulk_extractor.pdf)\n```\n@article{10.5555/2748150.2748581,\nauthor = {Garfinkel, Simson L.},\ntitle = {Digital Media Triage with Bulk Data Analysis and Bulk_extractor},\nyear = {2013},\nissue_date = {February 2013},\npublisher = {Elsevier Advanced Technology Publications},\naddress = {GBR},\nvolume = {32},\nnumber = {C},\nissn = {0167-4048},\njournal = {Comput. Secur.},\nmonth = feb,\npages = {56–72},\nnumpages = {17},\nkeywords = {Digital forensics, Bulk data analysis, bulk_extractor, Stream-based forensics, Windows hibernation files, Parallelized forensic analysis, Optimistic decompression, Forensic path, Margin, EnCase}\n}\n```\n\nENVIRONMENT VARIABLES\n=====================\nThe following environment variables can be set to change the operation of `bulk_extractor`:\n\n|Variable|Behavior|\n|--------|--------|\n|`DEBUG_BENCHMARK_CPU`|Include CPU benchmark information in the `report.xml` file.|\n|`DEBUG_NO_SCANNER_BYPASS`|Disable the scanner bypass logic that skips some scanners if an sbuf contains ngrams or does not have a high distinct character count.|\n|`DEBUG_HISTOGRAMS`|Print debugging information on file-based histograms.|\n|`DEBUG_HISTOGRAMS_NO_INCREMENTAL`|Do not use incremental, memory-based histograms.|\n|`DEBUG_PRINT_STEPS`|Print to stdout when each scanner is called for each sbuf.|\n|`DEBUG_DUMP_DATA`|Hex-dump each sbuf that is to be scanned.|\n|`DEBUG_SCANNERS_IGNORE`|A comma-separated list of scanners to ignore 
(not load). Useful for debugging unit tests.|\n\nOther hints for debugging:\n\n* Run -xall to run without any scanners.\n* Run with a random sampling of 0.001% to debug reading image size and a few quick seeks.\n\nBUILDING ON WINDOWS\n===================\nNote: Currently bulk_extractor 2.1 does not build on Windows, but 2.0 does.\n\nIf you wish to build for Windows, you should cross-compile from a Fedora system. Start with a clean VM and use these commands:\n\n```\n$ git clone --recurse-submodules https://github.com/simsong/bulk_extractor.git\n$ cd bulk_extractor/etc\n$ bash CONFIGURE_FEDORA36_win64.bash\n$ cd ..\n$ make win64\n```\n\n\n\n\nBULK_EXTRACTOR 2.0 RELEASE NOTES\n================================\n\n## Release 2.1.1 (April 26, 2024)\nRenamed the jpeg_carved feature recorder to jpeg, so that the jpeg carve mode can be set with -S jpeg_carve_mode=2, rather than -S jpeg_carved_carve_mode=2, which was confusing.\n\n\n## Release 2.0\n`bulk_extractor` 2.0 (BE2) is now operational. Although it works with the Java-based viewer, we do not currently have an installer that runs under Windows.\n\nBE2 requires C++17 to compile. It requires `https://github.com/simsong/be13_api.git` as a sub-module, which in turn requires `dfxml` as a sub-module.\n\nThe project took longer than anticipated. In addition to updating to C++17, it was used as an opportunity for massive code refactoring and a general increase in code quality, testability and reliability. 
An article about the experiment will appear in a forthcoming issue of [ACM Queue](https://queue.acm.org/)\n","funding_links":[],"categories":["File Carving","Uncategorized","IR Tools Collection","\u003ca id=\"9eee96404f868f372a6cbc6769ccb7f8\"\u003e\u003c/a\u003e新添加的","C++","Challenges","Tools","Other Lists","\u003ca id=\"9eee96404f868f372a6cbc6769ccb7f8\"\u003e\u003c/a\u003e工具","Forensics","\u003ca id=\"0d23b542d7b0b1069a91f6c500009c3a\"\u003e\u003c/a\u003ebulk_extractor","Digital Forensics \u0026 Ingest","IR tools Collection"],"sub_categories":["Other Resources","Uncategorized","Evidence Collection","\u003ca id=\"31185b925d5152c7469b963809ceb22d\"\u003e\u003c/a\u003e新添加的","Carving","Analysis / Gathering tool (Know your ennemies)","🛡️ DFIR:","Mobile Penetration Testing","Public Data API"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsimsong%2Fbulk_extractor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsimsong%2Fbulk_extractor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsimsong%2Fbulk_extractor/lists"}