https://github.com/AhmedConstant/BlindCrawler
A tool for web crawling & content discovery
- Host: GitHub
- URL: https://github.com/AhmedConstant/BlindCrawler
- Owner: AhmedConstant
- License: MIT
- Created: 2020-09-22T12:28:18.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2020-09-22T14:26:21.000Z (over 4 years ago)
- Last Synced: 2024-08-05T17:33:51.330Z (9 months ago)
- Language: Python
- Size: 33.2 KB
- Stars: 9
- Watchers: 1
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-hacking-lists - AhmedConstant/BlindCrawler - A tool for web crawling & content discovery (Python)
README
# BlindCrawler - Beta v1.0

A tool for web crawling & content discovery.
# Installation
`git clone https://github.com/AhmedConstant/BlindCrawler.git`
`cd BlindCrawler`
`sudo pip3 install -r requirements.txt`
# Usage

### Domain
`python3 BlindCrawler.py -s https://domain.com`
### Subdomain
`python3 BlindCrawler.py -s https://sub.domain.com/path`
### Random user agents
`python3 BlindCrawler.py -s https://sub.domain.com/path --random-agents`
### With cookies
`python3 BlindCrawler.py -s https://sub.domain.com/path -c "key: value; key:value"`
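The cookie string passed with `-c` follows a `key: value; key: value` format. Below is a minimal sketch, assuming a hypothetical helper `parse_cookie_string` (not part of BlindCrawler's source), of how such a string could be turned into a dict for the `requests` library:

```python
# Illustration only: parse the "key: value; key: value" cookie string into a
# dict that the requests library accepts. parse_cookie_string is hypothetical.
import requests

def parse_cookie_string(raw: str) -> dict:
    """Split 'key: value; key: value' into a {key: value} dict."""
    cookies = {}
    for pair in raw.split(";"):
        if ":" in pair:
            key, value = pair.split(":", 1)
            cookies[key.strip()] = value.strip()
    return cookies

cookies = parse_cookie_string("session: abc123; theme: dark")
print(cookies)  # {'session': 'abc123', 'theme': 'dark'}
response = requests.get("https://sub.domain.com/path", cookies=cookies)
```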
# Features

* Process
  * Crawl subdomains to expand the discovery surface.
  * Crawl /robots.txt for more URLs to crawl.
  * Crawl /sitemap.xml for more URLs to crawl.
  * Use the Web Archive CDX API to collect more URLs to crawl (see the sketch after this list).
* Output
  * A file with all **crawled** URLs.
  * A file with all **paths** crawled.
  * A file with **subdomains** discovered.
  * A file with **schemes** discovered.
  * A file with **emails** discovered.
  * A file with **comments** discovered.

* Performance
  * Performance is being improved continuously to keep crawling as fast as possible.
* Design
  * **OOP** design.
  * Good **documentation**.
  * **Easy to edit** the script code.
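
As a minimal sketch of the Web Archive step referenced in the Process list above, the snippet below queries the Wayback Machine CDX API for archived URLs of a target domain. It illustrates the general idea only; the `cdx_urls` helper and its parameter choices are not taken from BlindCrawler's source.

```python
# Sketch: collect archived URLs for a domain from the Wayback Machine CDX API.
# This is an illustration of the technique, not BlindCrawler's implementation.
import requests

def cdx_urls(domain: str) -> list:
    """Return de-duplicated archived URLs for a domain from the CDX API."""
    api = "http://web.archive.org/cdx/search/cdx"
    params = {
        "url": f"{domain}/*",   # match every path under the domain
        "output": "json",       # JSON rows instead of plain text
        "fl": "original",       # return only the original URL field
        "collapse": "urlkey",   # de-duplicate by canonical URL key
    }
    rows = requests.get(api, params=params, timeout=30).json()
    return [row[0] for row in rows[1:]]  # first row is the header

if __name__ == "__main__":
    for url in cdx_urls("example.com")[:10]:
        print(url)
```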
# To-Do List
- [x] ~~Release beta version.~~
- [ ] Output in JSON, XML and CSV formats.
- [ ] Brute-force sensitive files and directories.
- [ ] Extract **strings with high entropy** from crawled pages (UUIDs, keys, etc.); see the sketch after this list.
- [ ] Recognize **static/repetitive** URLs to avoid crawling them and reduce time and resources.
- [ ] Let the user provide their own **pattern** to extract from crawled pages.
- [ ] Create a **custom wordlist** for directory brute-forcing.
- [ ] Search for potentially **DOM XSS** vulnerable functions.
- [ ] **Fuzz** GET parameters.
- [ ] .....
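
The high-entropy item above refers to flagging random-looking tokens (UUIDs, API keys, etc.) by their Shannon entropy. A minimal sketch of that idea follows; the token regex and the 4.0-bit threshold are illustrative assumptions, not project settings.

```python
# Sketch of high-entropy string extraction via Shannon entropy.
# The regex and threshold are illustrative choices, not BlindCrawler settings.
import math
import re

def shannon_entropy(s: str) -> float:
    """Entropy in bits per character of the string s."""
    probs = [s.count(c) / len(s) for c in set(s)]
    return -sum(p * math.log2(p) for p in probs)

def high_entropy_tokens(text: str, threshold: float = 4.0) -> list:
    """Return tokens from text whose per-character entropy exceeds threshold."""
    tokens = re.findall(r"[A-Za-z0-9+/=_-]{20,}", text)
    return [t for t in tokens if shannon_entropy(t) > threshold]

print(high_entropy_tokens("api_key = 'Xk9fLm2Qw8Rz3TbV7YcJ1NpA5sD0eGhU'"))
```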
# The Author
Ahmed Constant
[Twitter](https://twitter.com/a_Constant_)