https://github.com/ivan-sincek/file-scraper
Scrape files for sensitive information, and generate an interactive HTML report. Based on Rabin2.
bug-bounty desktop-penetration-testing ethical-hacking incident-response malware-analysis mobile-penetration-testing offensive-security penetration-testing python rabin2 radare2 red-team-engagement scraping secrets-finder secrets-management security sensitive-data sensitive-files strings web-penetration-testing
- Host: GitHub
- URL: https://github.com/ivan-sincek/file-scraper
- Owner: ivan-sincek
- License: mit
- Created: 2023-04-01T22:22:03.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2025-03-17T09:21:12.000Z (11 months ago)
- Last Synced: 2025-03-28T20:51:26.749Z (11 months ago)
- Topics: bug-bounty, desktop-penetration-testing, ethical-hacking, incident-response, malware-analysis, mobile-penetration-testing, offensive-security, penetration-testing, python, rabin2, radare2, red-team-engagement, scraping, secrets-finder, secrets-management, security, sensitive-data, sensitive-files, strings, web-penetration-testing
- Language: Python
- Homepage:
- Size: 907 KB
- Stars: 11
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# File Scraper
Scrape files for sensitive information, and generate an interactive HTML report. Based on Rabin2.
This tool is only as good as your [RegEx](https://github.com/ivan-sincek/file-scraper?tab=readme-ov-file#build-the-template--run) skills.
You can also style your own [report](https://github.com/ivan-sincek/file-scraper/blob/main/src/file_scraper/reports/default.html).
Tested on Kali Linux v2024.2 (64-bit).
Made for educational purposes. I hope it will help!
## Table of Contents
* [How to Install](#how-to-install)
    * [Install Radare2](#install-radare2)
    * [Standard Install](#standard-install)
    * [Build and Install From the Source](#build-and-install-from-the-source)
* [Build the Template & Run](#build-the-template--run)
* [Usage](#usage)
* [Images](#images)
## How to Install
### Install Radare2
On Kali Linux, run:
```bash
apt-get -y install radare2
```
---
On Windows, download and unpack [radareorg/radare2](https://github.com/radareorg/radare2/releases), then add the `bin` directory to the Windows `PATH` environment variable.
---
On macOS, run:
```bash
brew install radare2
```
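Because the tool is based on Rabin2, it can help to confirm that the `rabin2` executable is reachable on `PATH` before installing File Scraper. The snippet below is a minimal sketch using only the Python standard library; the helper name `find_tool` is hypothetical and is not part of File Scraper.

```python
import shutil


def find_tool(name: str = "rabin2"):
    """Return the full path of an executable on PATH, or None if it is missing.

    `find_tool` is a hypothetical helper for illustration only.
    """
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool()
    if path is None:
        print("rabin2 was not found on PATH; install Radare2 first")
    else:
        print(f"rabin2 found at: {path}")
```

If the check fails on Windows, the `bin` directory of the unpacked Radare2 release is most likely missing from `PATH`.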
### Standard Install
```bash
pip3 install --upgrade file-scraper
```
### Build and Install From the Source
```bash
git clone https://github.com/ivan-sincek/file-scraper && cd file-scraper
python3 -m pip install --upgrade build
python3 -m build
python3 -m pip install dist/file_scraper-4.6-py3-none-any.whl
```
## Build the Template & Run
Prepare a template such as [the default template](https://github.com/ivan-sincek/file-scraper/blob/main/src/file_scraper/templates/default.json):
```json
{
    "Auth.":{
        "query":"(?:basic|bearer)\\ ",
        "ignorecase":true,
        "search":true
    },
    "Variables":{
        "query":"(?:access|account|admin|auth|card|conf|cookie|cred|customer|email|history|ident|info|jwt|key|kyc|log|otp|pass|pin|priv|refresh|salt|secret|seed|session|setting|sign|token|transaction|transfer|user)[\\w\\d\\-\\_]*(?:\\\"\\ *\\:|\\ *\\=[^\\=]{1})",
        "ignorecase":true,
        "search":true
    },
    "Comments":{
        "query":"(?:(? [...]
```

## Usage

```
    [...] = decoded | files | test.exe | etc.
TEMPLATE
    File containing extraction details or a single RegEx to use
    Default: built-in JSON template file
    -t, --template = template.json | "secret\: [\w\d]+" | etc.
EXCLUDES
    Exclude all files ending with the specified extension
    Specify 'default' to load the built-in list
    Use comma-separated values
    -e, --excludes = mp3 | default,jpg,png | etc.
INCLUDES
    Include all files ending with the specified extension
    Overrides the excludes
    Use comma-separated values
    -i, --includes = java | json,xml,yaml | etc.
BEAUTIFY
    Beautify [minified] JavaScript (.js) files
    -b, --beautify
THREADS
    Number of parallel threads to run
    Default: 30
    -th, --threads = 10 | etc.
OUT
    Output file
    -o, --out = results.html | etc.
DEBUG
    Enable debug output
    -dbg, --debug
```
```
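Before running a template against real files, it can be useful to try its RegEx entries on a sample string. The sketch below loads the `Auth.` entry from the template in [Build the Template & Run](#build-the-template--run) and applies it with Python's `re` module. It is an approximation under stated assumptions: the function names `compile_template` and `scan` are hypothetical, the `search` field is ignored because its meaning is not described here, and the tool itself works on Rabin2 output, so its actual matching may differ.

```python
import json
import re

# One entry copied from the template; the raw string keeps the JSON
# escapes (\\) intact until json.loads decodes them.
TEMPLATE_JSON = r'''
{
    "Auth.": {
        "query": "(?:basic|bearer)\\ ",
        "ignorecase": true,
        "search": true
    }
}
'''


def compile_template(template: dict) -> dict:
    """Compile every entry's "query", honoring its "ignorecase" field.

    Hypothetical helper; the "search" field is intentionally ignored.
    """
    patterns = {}
    for name, entry in template.items():
        flags = re.IGNORECASE if entry.get("ignorecase") else 0
        patterns[name] = re.compile(entry["query"], flags)
    return patterns


def scan(text: str, patterns: dict) -> dict:
    """Return {entry name: [matched substrings]} for entries with matches."""
    hits = {}
    for name, pattern in patterns.items():
        found = [match.group(0) for match in pattern.finditer(text)]
        if found:
            hits[name] = found
    return hits


if __name__ == "__main__":
    patterns = compile_template(json.loads(TEMPLATE_JSON))
    print(scan("Authorization: Bearer abc123", patterns))
```

Testing entries this way makes it easier to spot a query that is too broad or too narrow before generating a full report.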
## Images

Figure 1 - Interactive Report (1)

Figure 2 - Interactive Report (2)

Figure 3 - Interactive Report (3)