https://github.com/AhmedConstant/BlindCrawler
A tool for web crawling & content discovery
- Host: GitHub
- URL: https://github.com/AhmedConstant/BlindCrawler
- Owner: AhmedConstant
- License: MIT
- Created: 2020-09-22T12:28:18.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2020-09-22T14:26:21.000Z (over 4 years ago)
- Last Synced: 2024-08-05T17:33:51.330Z (9 months ago)
- Language: Python
- Size: 33.2 KB
- Stars: 9
- Watchers: 1
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-hacking-lists - AhmedConstant/BlindCrawler - A tool for web crawling & content discovery (Python)
README
# BlindCrawler - Beta v1.0

A tool for web crawling & content discovery.
# Installation
`git clone https://github.com/AhmedConstant/BlindCrawler.git`
`cd BlindCrawler`
`sudo pip3 install -r requirements.txt`
# Usage

### Domain
`python3 BlindCrawler.py -s https://domain.com`
### Subdomain
`python3 BlindCrawler.py -s https://sub.domain.com/path`
### Random user agents
`python3 BlindCrawler.py -s https://sub.domain.com/path --random-agents`
### With cookies
`python3 BlindCrawler.py -s https://sub.domain.com/path -c "key: value; key:value"`
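The cookie string passed with `-c` follows a `key: value; key: value` format. Below is a minimal sketch, assuming a hypothetical helper `parse_cookie_string` (not part of BlindCrawler's source), of how such a string could be turned into a dict for the `requests` library:

```python
# Illustration only: parse the "key: value; key: value" cookie string into a
# dict that the requests library accepts. parse_cookie_string is hypothetical.
import requests

def parse_cookie_string(raw: str) -> dict:
    """Split 'key: value; key: value' into a {key: value} dict."""
    cookies = {}
    for pair in raw.split(";"):
        if ":" in pair:
            key, value = pair.split(":", 1)
            cookies[key.strip()] = value.strip()
    return cookies

cookies = parse_cookie_string("session: abc123; theme: dark")
print(cookies)  # {'session': 'abc123', 'theme': 'dark'}
response = requests.get("https://sub.domain.com/path", cookies=cookies)
```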
# Features

* Process
  * Crawl subdomains to expand the discovery surface.
  * Crawl /robots.txt for more URLs to crawl.
  * Crawl /sitemap.xml for more URLs to crawl.
  * Use the Web Archive CDX API to collect more URLs to crawl (see the sketch after this list).
* Output
  * A file with all **crawled** URLs.
  * A file with all **paths** crawled.
  * A file with **subdomains** discovered.
  * A file with **schemes** discovered.
  * A file with **emails** discovered.
  * A file with **comments** discovered.

* Performance
  * Performance is being improved continuously to keep crawling as fast as possible.
* Design
  * **OOP** design.
  * Good **documentation**.
  * **Easy to edit** the script code.
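
As a minimal sketch of the Web Archive step referenced in the Process list above, the snippet below queries the Wayback Machine CDX API for archived URLs of a target domain. It illustrates the general idea only; the `cdx_urls` helper and its parameter choices are not taken from BlindCrawler's source.

```python
# Sketch: collect archived URLs for a domain from the Wayback Machine CDX API.
# This is an illustration of the technique, not BlindCrawler's implementation.
import requests

def cdx_urls(domain: str) -> list:
    """Return de-duplicated archived URLs for a domain from the CDX API."""
    api = "http://web.archive.org/cdx/search/cdx"
    params = {
        "url": f"{domain}/*",   # match every path under the domain
        "output": "json",       # JSON rows instead of plain text
        "fl": "original",       # return only the original URL field
        "collapse": "urlkey",   # de-duplicate by canonical URL key
    }
    rows = requests.get(api, params=params, timeout=30).json()
    return [row[0] for row in rows[1:]]  # first row is the header

if __name__ == "__main__":
    for url in cdx_urls("example.com")[:10]:
        print(url)
```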
# To-Do List
- [x] ~~Release beta version.~~
- [ ] Output in JSON, XML and CSV formats.
- [ ] Brute-force sensitive files and directories.
- [ ] Extract **strings with high entropy** from crawled pages (UUIDs, keys, etc.); see the sketch after this list.
- [ ] Recognize **static/repetitive** URLs to avoid crawling them and reduce time and resources.
- [ ] Let the user provide their own **pattern** to extract from crawled pages.
- [ ] Create a **custom wordlist** for directory brute-forcing.
- [ ] Search for potentially **DOM XSS** vulnerable functions.
- [ ] **Fuzz** GET parameters.
- [ ] .....
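
The high-entropy item above refers to flagging random-looking tokens (UUIDs, API keys, etc.) by their Shannon entropy. A minimal sketch of that idea follows; the token regex and the 4.0-bit threshold are illustrative assumptions, not project settings.

```python
# Sketch of high-entropy string extraction via Shannon entropy.
# The regex and threshold are illustrative choices, not BlindCrawler settings.
import math
import re

def shannon_entropy(s: str) -> float:
    """Entropy in bits per character of the string s."""
    probs = [s.count(c) / len(s) for c in set(s)]
    return -sum(p * math.log2(p) for p in probs)

def high_entropy_tokens(text: str, threshold: float = 4.0) -> list:
    """Return tokens from text whose per-character entropy exceeds threshold."""
    tokens = re.findall(r"[A-Za-z0-9+/=_-]{20,}", text)
    return [t for t in tokens if shannon_entropy(t) > threshold]

print(high_entropy_tokens("api_key = 'Xk9fLm2Qw8Rz3TbV7YcJ1NpA5sD0eGhU'"))
```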
# The Author
Ahmed Constant
[Twitter](https://twitter.com/a_Constant_)