https://github.com/gill-singh-a/crawler


crawler multithreading osint python python3 requests scraper


# Crawler
A program that crawls the web starting from a given page, follows the internal links it finds, and searches each crawled page for keywords.

## Requirements
Language Used = Python3

Modules/Packages used:
* requests
* pickle
* bs4
* datetime
* optparse
* colorama
* time

Install the dependencies:
```bash
pip install -r requirements.txt
```
## Input
* '-u', "--url" : URL to start Crawling from
* '-t', "--in-text" : Words to find in text (separated by ',')
* '-s', "--session-id" : Session ID (Cookie) for the Request Header (Optional)
* '-w', "--write" : Name of the File for the data to be dumped (default=current date and time)
* '-e', "--external" : Crawl on External URLs (True/False, default=False)
* '-T', "--timeout" : Request Timeout
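
The options above could be declared with `optparse` roughly as follows. This is a minimal sketch, not the project's actual code: the `dest` names, the helper names `build_parser` and `parse_arguments`, and the way `--external` is turned into a boolean are all assumptions made for illustration.

```python
from optparse import OptionParser


def build_parser():
    """Declare the flags listed in the Input section (dest names are assumed)."""
    parser = OptionParser()
    parser.add_option('-u', "--url", dest="url",
                      help="URL to start Crawling from")
    parser.add_option('-t', "--in-text", dest="in_text",
                      help="Words to find in text (separated by ',')")
    parser.add_option('-s', "--session-id", dest="session_id",
                      help="Session ID (Cookie) for the Request Header (Optional)")
    parser.add_option('-w', "--write", dest="write",
                      help="Name of the File for the data to be dumped")
    parser.add_option('-e', "--external", dest="external", default="False",
                      help="Crawl on External URLs (True/False, default=False)")
    parser.add_option('-T', "--timeout", dest="timeout",
                      help="Request Timeout")
    return parser


def parse_arguments(argv):
    """Parse argv and normalise the keyword list and the external flag."""
    options, _ = build_parser().parse_args(argv)
    # Split the comma-separated keyword string and drop empty entries.
    keywords = [w.strip() for w in (options.in_text or "").split(',') if w.strip()]
    # Assumption: the flag is passed as the literal text "True" or "False".
    external = str(options.external).strip().lower() == "true"
    return options.url, keywords, external, options
```

For example, `parse_arguments(["-u", "https://example.com", "-t", "login, admin"])` yields the start URL, the keyword list `["login", "admin"]`, and `external` set to `False`.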
## Output
It will stop when it has crawled all the internal links of the given URL or if the user presses CTRL+C.
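
The stopping behaviour can be illustrated with a small breadth-first crawl loop. This is a hedged sketch under several assumptions, not the project's implementation: it is single-threaded, it uses the standard library's `html.parser` in place of `bs4` so it runs without extra packages, and the names `LinkAndTextParser`, `crawl`, and the injected `fetch` callable are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkAndTextParser(HTMLParser):
    """Collects href values from <a> tags and all text nodes of a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text_parts.append(data)


def crawl(start_url, keywords, fetch):
    """Crawl internal links breadth-first; fetch(url) returns HTML or None."""
    start_host = urlparse(start_url).netloc
    queue = deque([start_url])
    internal = {start_url}   # internal URLs discovered (and queued) so far
    external = set()         # external URLs seen but not followed
    matches = []             # URLs whose text contains at least one keyword
    try:
        while queue:         # the loop ends once no internal links remain
            url = queue.popleft()
            html = fetch(url)
            if html is None:
                continue
            parser = LinkAndTextParser()
            parser.feed(html)
            text = " ".join(parser.text_parts).lower()
            if any(word.lower() in text for word in keywords):
                matches.append(url)
            for href in parser.links:
                # Resolve relative links and drop "#fragment" suffixes.
                link = urldefrag(urljoin(url, href)).url
                parts = urlparse(link)
                if parts.scheme not in ("http", "https"):
                    continue
                if parts.netloc != start_host:
                    external.add(link)
                elif link not in internal:
                    internal.add(link)
                    queue.append(link)
    except KeyboardInterrupt:
        pass  # CTRL+C: stop early and keep what was gathered so far
    return internal, external, matches
```

With `requests` installed, `fetch` could be something like `lambda url: requests.get(url, timeout=10).text`; passing the fetcher in as a parameter keeps the loop itself easy to try out against a dictionary of canned pages.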

It then displays the total number of URLs extracted, along with the number of internal and external URLs.

Finally, it gives a list of the URLs in which the keywords we're interested in were found.
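
A rough sketch of how such a report could be assembled and dumped with `pickle` is shown below. The function names `summarize` and `dump`, the shape of the dumped dictionary, the timestamp format used for the default file name, and the assumption that the total is the sum of internal and external URLs are all illustrative guesses; the README does not specify them.

```python
import pickle
from datetime import datetime


def summarize(internal_urls, external_urls, matched_urls):
    """Build the counts and the list of URLs where keywords were found."""
    return {
        # Assumption: total = internal + external (the two sets are disjoint).
        "total": len(internal_urls) + len(external_urls),
        "internal": len(internal_urls),
        "external": len(external_urls),
        "matched": sorted(matched_urls),
    }


def dump(data, path=None):
    """Pickle the data; the default file name is the current date and time."""
    if path is None:
        # Assumed timestamp format; the real default name may differ.
        path = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    with open(path, "wb") as file:
        pickle.dump(data, file)
    return path
```

A file written this way can be read back with `pickle.load` on a handle opened in `"rb"` mode. Only load pickle files you created yourself, since unpickling untrusted data can execute arbitrary code.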