https://github.com/zebbern/regex-crawler
Regex Web Crawler that searches each page it crawls against custom regexes to find the information you're looking for!
- Host: GitHub
- URL: https://github.com/zebbern/regex-crawler
- Owner: zebbern
- License: mit
- Created: 2025-02-12T09:02:25.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-02-26T20:13:06.000Z (4 months ago)
- Last Synced: 2025-04-13T16:51:09.347Z (2 months ago)
- Topics: bug-bounty, bugbounty, crawler, information-gathering, information-retrieval, osint, osint-tool, pentest, python, regex, regex-engine, regex-match, regex-pattern, regex-tool, toolkit, tools, website
- Language: Python
- Homepage:
- Size: 39.1 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
## README

# Regex Web Crawler


**An advanced web crawler built for bug bounty hunters!**
**The tool recursively crawls a target website, performs regex-based content searches, and saves the results in structured YAML files.**
**Includes optional security analysis for reconnaissance.**

---
### `Features:`
- Validate URLs before crawling to prevent errors.
- Extract all internal links recursively up to a specified depth.
- Perform regex-based searches on each page's content using a user-defined regex list.
- Optionally enable advanced security checks, such as scanning HTTP headers and HTML comments for potential leaks.
- Store all crawled URLs and results in structured YAML for easy analysis (see the sketch below).
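
For orientation, the crawl-and-match core might look roughly like the sketch below. This is a minimal illustration, not the tool's actual code: it assumes `requests` and `beautifulsoup4` (both in the requirements), restricts recursion to same-host links, and flags HTML comments only when a pattern matches them.

```python
import re
import yaml
import requests
from urllib.parse import urljoin, urlparse
from bs4 import BeautifulSoup, Comment

def crawl(url, patterns, depth, seen=None, results=None):
    """Recursively crawl same-host links, matching each page against the regex list."""
    seen = set() if seen is None else seen
    results = {} if results is None else results
    if depth < 0 or url in seen:
        return results
    seen.add(url)
    try:
        resp = requests.get(url, timeout=10)
    except requests.RequestException:
        return results  # skip unreachable pages instead of crashing
    # Regex-based content search on the raw page body
    hits = [m for p in patterns for m in p.findall(resp.text)]
    soup = BeautifulSoup(resp.text, "html.parser")
    # "Advanced" check: HTML comments sometimes leak credentials or internal notes
    for c in soup.find_all(string=lambda t: isinstance(t, Comment)):
        if any(p.search(c) for p in patterns):
            hits.append(f"comment: {c.strip()}")
    if hits:
        results[url] = hits
    # Follow internal links only, one level deeper per recursion
    host = urlparse(url).netloc
    for a in soup.find_all("a", href=True):
        link = urljoin(url, a["href"])
        if urlparse(link).netloc == host:
            crawl(link, patterns, depth - 1, seen, results)
    return results

if __name__ == "__main__":
    patterns = [re.compile(line.strip())
                for line in open("regex_patterns.txt") if line.strip()]
    found = crawl("https://example.com", patterns, depth=1)
    with open("results.yaml", "w") as f:
        yaml.safe_dump(found, f)  # structured YAML output, as the tool produces
```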
---
### `How To Run:`
**Step 1: Configure the `config.yaml` file to set up the target URL and crawling options.**
**Step 2: Run the Python script and let it crawl the target website while extracting valuable information.**
**Step 3: Review the structured results saved in `results.yaml`.**

## Requirements:
```txt
requests
beautifulsoup4
pyyaml
```
Install the required dependencies with:
```
pip install -r requirements.txt
```

## Usage:
1. Set up your configuration in `config.yaml`:
```yaml
base_url: "https://example.com"
crawl_depth: 1
advanced: true
regex_file: "regex_patterns.txt"
output_file: "results.yaml"
```
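
If you want to read the same settings from your own scripts, `pyyaml` (already in the requirements) loads them in a few lines. A sketch, assuming the key names shown above:

```python
import yaml

# Load crawl settings; key names follow the example config above
with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

base_url = cfg["base_url"]             # target site to crawl
crawl_depth = cfg["crawl_depth"]       # how many link levels to follow
advanced = cfg.get("advanced", False)  # enable header/comment checks
regex_file = cfg.get("regex_file", "regex_patterns.txt")
output_file = cfg.get("output_file", "results.yaml")
```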
2. Create or edit your regex patterns in `regex_patterns.txt` (one per line):
```txt
(?i)password\s*[:=]\s*['"][^'"]+['"]
(?i)secret\s*[:=]\s*['"][^'"]+['"]
```
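
Each line is compiled as an independent Python regex; the inline `(?i)` flag makes a pattern case-insensitive. A quick sketch of how such a file can be loaded and tested (the `#`-comment handling is an assumption, not necessarily the tool's behaviour):

```python
import re

# Compile one pattern per line, skipping blanks and '#' comment lines
with open("regex_patterns.txt") as f:
    patterns = [re.compile(line.strip()) for line in f
                if line.strip() and not line.startswith("#")]

sample = 'db_password = "hunter2"'
for p in patterns:
    for match in p.findall(sample):
        print(f"{p.pattern!r} matched: {match}")
```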
3. Run the script:
```bash
python para.py
```

## Contribute:
Feel free to suggest improvements or contribute by visiting [https://github.com/zebbern/regex-crawler](https://github.com/zebbern/regex-crawler).
> [!WARNING]
> This tool is intended for ethical hacking and bug bounty purposes only. Unauthorized scanning of third-party websites is illegal and unethical. Always obtain explicit permission before testing any target.