https://github.com/tushortz/simple-scrapper

scrapper codes written in python

Host: GitHub
URL: https://github.com/tushortz/simple-scrapper
Owner: tushortz
License: mit
Created: 2018-04-11T14:26:31.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2018-04-16T12:14:07.000Z (over 7 years ago)
Last Synced: 2025-01-09T05:18:24.735Z (12 months ago)
Language: Python
Size: 22.5 KB
Stars: 1
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# simple scrapper
Scraper code written in Python 3. It searches most of the paths under a domain for a match and writes the results to a file.

> Just run the code in the `generic` folder. Alter the options in the `config.json` file as desired.

The options are:

* domain -> URL of the website that the program searches for data
* path_regex -> regex for the paths to search in. The program skips a URL if its path after the `domain` name does not match
* keyword_regex -> regex to look for in the page content. Each match found is written to the `output_filename`. Do not wrap the pattern in `(` and `)`, so that it matches the exact regex
* use_proxy -> boolean that determines whether the program uses a generated proxy (not yet implemented)
* login -> login credentials, given as `username` and `password` separated by a colon
* output_filename -> name of the file in which match results are stored
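
As a rough illustration of how `path_regex` and `keyword_regex` could be applied, here is a minimal sketch. The function names `path_matches` and `find_keywords` are hypothetical and are not taken from this repository; the real code may filter and match differently.

```python
import re
from urllib.parse import urlparse


def path_matches(url, domain, path_regex):
    """Return True if url is on the same host as domain and its path matches path_regex.

    Hypothetical helper: the repository's own filtering logic may differ.
    """
    if urlparse(url).netloc != urlparse(domain).netloc:
        return False
    return re.search(path_regex, urlparse(url).path) is not None


def find_keywords(page_text, keyword_regex):
    """Return every substring of page_text matched by keyword_regex.

    finditer with group(0) yields the whole match, even if the pattern
    happens to contain capturing groups.
    """
    return [m.group(0) for m in re.finditer(keyword_regex, page_text)]
```

For example, `path_matches("https://www.example.com/blog/post-1", "https://www.example.com", "^/blog/")` is true, while the same call with `/about` as the path is false.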

## Sample config.json

```json
{
    "domain": "https://www.example.com",
    "path_regex": ".*",
    "keyword_regex": ".*?@gmail.com",
    "use_proxy": false,
    "login": "username:password",
    "output_filename": "result.txt"
}
```
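
A configuration in this shape could be read and checked roughly as follows. This is only a sketch: `parse_config` and `REQUIRED_KEYS` are hypothetical names, and treating every option as mandatory is an assumption rather than documented behavior.

```python
import json

# Assumption: all six documented options must be present.
REQUIRED_KEYS = {
    "domain", "path_regex", "keyword_regex",
    "use_proxy", "login", "output_filename",
}


def parse_config(raw_json):
    """Parse the config text and split `login` into username and password."""
    config = json.loads(raw_json)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise ValueError(f"missing config keys: {sorted(missing)}")
    # Split on the first colon only, so a password may itself contain colons.
    username, _, password = config["login"].partition(":")
    config["username"] = username
    config["password"] = password
    return config
```

Reading the file would then be `parse_config(open("config.json").read())`, assuming the script runs from the folder that holds `config.json`.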

> The program may fail after a while with a `maximum recursion depth exceeded` error. If this happens, just rerun the code: the program resumes execution without overwriting the previous content of the `output_filename`.
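
One common way to avoid a recursion limit in a crawler is to replace recursive calls with an explicit queue. The sketch below is not this repository's implementation: `crawl`, `get_links`, `max_pages`, and `append_matches` are hypothetical names, and `get_links` stands in for whatever fetches a page and extracts its links. Opening the output file in append mode is one way to keep results from earlier runs.

```python
from collections import deque


def crawl(start_url, get_links, max_pages=1000):
    """Visit pages breadth-first with an explicit queue instead of recursion."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            # Skip links that are already queued or visited.
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


def append_matches(filename, matches):
    """Append matches to the file so that earlier results are preserved."""
    with open(filename, "a", encoding="utf-8") as handle:
        for match in matches:
            handle.write(match + "\n")
```

Because the pending URLs live in a `deque` rather than on the call stack, the number of pages visited is bounded by `max_pages` and memory, not by Python's recursion limit.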

## To be implemented

* [ ] use proxy

## Contributing
To contribute, simply fork this repository and create a pull request.
