Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/AhmedConstant/BlindCrawler
A tool for web crawling & content discovery
- Host: GitHub
- URL: https://github.com/AhmedConstant/BlindCrawler
- Owner: AhmedConstant
- License: mit
- Created: 2020-09-22T12:28:18.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2020-09-22T14:26:21.000Z (over 4 years ago)
- Last Synced: 2024-08-05T17:33:51.330Z (6 months ago)
- Language: Python
- Size: 33.2 KB
- Stars: 9
- Watchers: 1
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-hacking-lists - AhmedConstant/BlindCrawler - A tool for web crawling & content discovery (Python)
README
# BlindCrawler - Beta v1.0
![alt text](https://github.com/AhmedConstant/ImagesV/blob/master/blindcrawler-logo-github.png "BlindCrawler")
A tool for web crawling & content discovery.
# Installation
`git clone https://github.com/AhmedConstant/BlindCrawler.git`
`cd BlindCrawler`
`sudo pip3 install -r requirements.txt`
# Usage
![Runtime](https://github.com/AhmedConstant/ImagesV/blob/master/blindcrawler-usges-github.png)
### domain
`python3 BlindCrawler.py -s https://domain.com`
### subdomain
`python3 BlindCrawler.py -s https://sub.domain.com/path`
### random agents
`python3 BlindCrawler.py -s https://sub.domain.com/path --random-agents`
### with cookies
`python3 BlindCrawler.py -s https://sub.domain.com/path -c "key: value; key:value"`
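In HTTP terms, the `--random-agents` and `-c` options presumably translate into a `User-Agent` header and a `Cookie` header. The following is a minimal sketch of that idea, not BlindCrawler's actual code: `USER_AGENTS`, `parse_cookies` and `build_headers` are hypothetical names, and the colon-separated cookie syntax is taken from the example above.

```python
import random

# Hypothetical pool of User-Agent strings; a real tool would likely ship a longer list.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
]


def parse_cookies(raw):
    """Turn 'key: value; key:value' into a dict, skipping empty or malformed pairs."""
    cookies = {}
    for pair in raw.split(";"):
        key, sep, value = pair.partition(":")
        if sep and key.strip():
            cookies[key.strip()] = value.strip()
    return cookies


def build_headers(raw_cookies="", random_agent=False):
    """Build a request-header dict from the two optional settings."""
    headers = {}
    if random_agent:
        # Pick one agent per call so successive requests can differ.
        headers["User-Agent"] = random.choice(USER_AGENTS)
    cookies = parse_cookies(raw_cookies)
    if cookies:
        # HTTP itself expects 'name=value' pairs separated by '; '.
        headers["Cookie"] = "; ".join(f"{k}={v}" for k, v in cookies.items())
    return headers
```

The resulting dict can be passed as the `headers=` argument of an HTTP client such as `requests.get`.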
# Features
![Runtime](https://github.com/AhmedConstant/ImagesV/blob/master/blindcrawler-output.png)
* Process
  * Crawl the subdomains to expand the discovery surface.
  * Crawl /robots.txt for more URLs to crawl.
  * Crawl /sitemap.xml for more URLs to crawl.
  * Use the web archive CDX API to get more URLs to crawl.
* Output
![Runtime](https://github.com/AhmedConstant/ImagesV/blob/master/blindcrawler-runtime.png)
  * A file with all **crawled** URLs.
  * A file with all **paths** crawled.
  * A file with **subdomains** discovered.
  * A file with **schemes** discovered.
  * A file with **emails** discovered.
  * A file with **comments** discovered.
![Runtime](https://github.com/AhmedConstant/ImagesV/blob/master/blindcrawler-output-dirs.png)
* Performance
  * Performance work is ongoing, with the goal of making crawls **as fast as possible**.
* Design
  * **OOP** design.
  * Good **documentation**.
  * Script code that is **easy to edit**.
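The discovery sources and output categories listed above can be illustrated with a short sketch. This is an assumption about how such steps might look, not the project's implementation: every function name below is hypothetical, the functions work on text that has already been downloaded, and `cdx_query_url` only builds a query URL for the Wayback Machine CDX endpoint without sending it.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlencode, urljoin

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def urls_from_robots(base_url, robots_text):
    """Collect URLs from the Allow, Disallow and Sitemap lines of a robots.txt body."""
    found = []
    for line in robots_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        field, sep, value = line.partition(":")
        value = value.strip()
        if not sep or not value:
            continue
        # Wildcard rules (e.g. '/*.php$') are kept verbatim; a real crawler would filter them.
        if field.strip().lower() in ("allow", "disallow", "sitemap"):
            found.append(urljoin(base_url, value))
    return found


def urls_from_sitemap(sitemap_xml):
    """Return the text of every <loc> element, regardless of XML namespace."""
    root = ET.fromstring(sitemap_xml)
    return [
        el.text.strip()
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == "loc" and el.text
    ]


def cdx_query_url(domain):
    """Build a CDX query URL that lists archived URLs under a domain."""
    params = {"url": f"{domain}/*", "output": "json", "fl": "original", "collapse": "urlkey"}
    return "https://web.archive.org/cdx/search/cdx?" + urlencode(params)


def extract_emails(html):
    """Return the unique email-like strings in a page, sorted."""
    return sorted(set(EMAIL_RE.findall(html)))


def extract_comments(html):
    """Return the bodies of HTML comments, with surrounding whitespace stripped."""
    return [c.strip() for c in COMMENT_RE.findall(html)]
```

Keeping the parsers separate from the network layer makes each one easy to test on saved responses.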
# To-Do List
- [x] ~~Release beta version.~~
- [ ] Output in JSON, XML and CSV formats.
- [ ] Bruteforce for the sensitive files and directories.
- [ ] Extract **strings with high entropy** (UUIDs, keys, etc.) from crawled pages.
- [ ] Recognize **static/repetitive** URLs to avoid crawling them and reduce time and resources.
- [ ] Let users provide their own **pattern** to extract from crawled pages.
- [ ] Create a **custom wordlist** for directory bruteforcing.
- [ ] Search for potential **DOM XSS** vulnerable functions.
- [ ] **Fuzz** the GET parameters.
- [ ] .....
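
The planned high-entropy extraction is not implemented yet. As a rough illustration of the idea only, one common approach is to score candidate tokens by Shannon entropy. The token pattern, the `3.5` bits-per-character threshold and the function names below are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter

# Assumed candidate pattern: runs of 20+ characters typical of keys, hashes and UUIDs.
TOKEN_RE = re.compile(r"[A-Za-z0-9+/_-]{20,}")


def shannon_entropy(s):
    """Shannon entropy of a string, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def high_entropy_strings(text, threshold=3.5):
    """Return candidate tokens whose entropy meets the (assumed) threshold."""
    return [t for t in TOKEN_RE.findall(text) if shannon_entropy(t) >= threshold]
```

A threshold like this needs tuning per target: long natural-language identifiers can score high, and short secrets can score low.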
# The Author
Ahmed Constant
[Twitter](https://twitter.com/a_Constant_)