https://github.com/gagan1510/greendeck-proxygrabber
A python library for scraping/checking/fetching/storing proxies. 🎭
- Host: GitHub
- URL: https://github.com/gagan1510/greendeck-proxygrabber
- Owner: gagan1510
- License: mit
- Created: 2019-08-10T14:26:21.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2021-01-26T06:09:04.000Z (almost 4 years ago)
- Last Synced: 2024-09-18T06:31:12.199Z (about 2 months ago)
- Topics: greendeck, http-proxy, https-proxies, mongodb, proxies, proxies-generator, proxies-http, proxies-scraper, proxy, proxy-server, proxy-service, proxychecker, proxyservice
- Language: Python
- Homepage:
- Size: 79.1 KB
- Stars: 20
- Watchers: 3
- Forks: 5
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
README
greendeck-proxygrabber 🎭
---
![Gd Logo](https://www.greendeck.co/images/logo/logo_full.png "Greendeck")

*This package is developed by [Greendeck](https://www.greendeck.co/)*
### Install from pip
https://pypi.org/project/greendeck-proxygrabber/

```pip install greendeck-proxygrabber```
---
**WHAT'S NEW?**

Added proxy grabbing support for 4 new regions to the proxy service, proxy grabber, and proxy scraper.
---
### 👉 What is proxy service?
The proxy service keeps a MongoDB database stocked with the latest working proxies and updates it periodically.
### 👉 How to use?
##### import the service class
```python
from greendeck_proxygrabber import ProxyService

service = ProxyService(
    MONGO_URI='mongodb://127.0.0.1:27017',
    update_time=300,
    pool_limit=1000,
    update_count=200,
    database_name='proxy_pool',
    collection_name_http='http',
    collection_name_https='https',
    country_code='ALL',
)
```
This creates a service object.
##### Args
* update_time = Time after which proxies will be updated (in seconds)
* pool_limit = Limit after which insertion will change to updating
* update_count = Number of proxies to request grabber at a time
* database_name = Mongo Database name to store proxies in
* collection_name_http = Collection name to store http proxies in
* collection_name_https = Collection name to store https proxies in
* country_code = ISO code of one of the supported regions

The supported regions are:
* Combined Regions: ALL
* United States: US
* Germany: DE
* Great Britain: GB
* France: FR
* Czech Republic: CZ
* Netherlands: NL
* India: IN

#### Starting the service
```python
service.start()
```
Starting the service gives the following output:
```
Starting proxy service with the following configuration
MONGO_URI: mongodb://127.0.0.1:27017
Database: proxy_pool
Collection names: http, https
Press Ctrl+C once to stop...
Running Proxy Service...
```
The service runs until it is stopped and pushes/updates proxies in MongoDB every ```update_time``` seconds.
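Because ```country_code``` must be one of the region codes listed in the Args, it can help to validate it before constructing the service. The following is a minimal sketch: ```SUPPORTED_REGIONS``` and ```normalize_country_code``` are hypothetical names written for this example and are not part of greendeck-proxygrabber.

```python
# Region codes copied from the list of supported regions in this README.
SUPPORTED_REGIONS = {
    'ALL': 'Combined Regions',
    'US': 'United States',
    'DE': 'Germany',
    'GB': 'Great Britain',
    'FR': 'France',
    'CZ': 'Czech Republic',
    'NL': 'Netherlands',
    'IN': 'India',
}


def normalize_country_code(code):
    """Return the upper-cased code, or raise ValueError if it is unsupported.

    Hypothetical helper for illustration; not provided by the package.
    """
    normalized = code.strip().upper()
    if normalized not in SUPPORTED_REGIONS:
        supported = ', '.join(sorted(SUPPORTED_REGIONS))
        raise ValueError(
            f'Unsupported country_code {code!r}; expected one of: {supported}'
        )
    return normalized
```

The result can then be passed along when creating the service, for example ```country_code=normalize_country_code('de')```.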
### 👉 What is proxy to mongo?
Proxy to mongo lets you grab a set of valid proxies from the Internet and store them in the MongoDB database of your choice. You can schedule it to insert or update a given set of proxies in your proxy pool database, e.g. by running it from Airflow or any other task scheduler.
### 👉 How to use?
##### import the ProxyToMongo class
```python
from greendeck_proxygrabber import ProxyToMongo

# MongoDB connection string; adjust it to your deployment
MONGO_URI = 'mongodb://127.0.0.1:27017'

service = ProxyToMongo(
    MONGO_URI=MONGO_URI,
    pool_limit=1000,
    length_proxy=200,
    database_name='proxy_pool',
    collection_name_http='http',
    collection_name_https='https',
    country_code='DE',
)
```
This creates a service object.
##### Args
* pool_limit = Total number of proxies to keep in MongoDB (pass ```None``` if you don't want to update)
* length_proxy = Number of proxies to fetch at once
* database_name = Mongo Database name to store proxies in
* collection_name_http = Collection name to store http proxies in
* collection_name_https = Collection name to store https proxies in
* country_code = ISO code of one of the supported regions

The supported regions are:
* Combined Regions: ALL
* United States: US
* Germany: DE
* Great Britain: GB
* France: FR
* Czech Republic: CZ
* Netherlands: NL
* India: IN

#### Calling the ProxyToMongo grabber
```python
service.get_quick_proxy()
```
Starting the grabber gives the following output:
```
Gathering proxies with the following configuration:
MONGO_URI: mongodb://127.0.0.1:27017
Database: proxy_pool
Collection names: http, https
Press Ctrl+C once to stop...
Running Proxy Grabber...
```
This will run forever and will push/update proxies in MongoDB after every ```update_time``` seconds.
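Where Airflow or another scheduler is not available, the scheduling idea mentioned for proxy to mongo can be approximated with a plain loop. This is a minimal sketch and not part of the package: ```run_periodically``` is a hypothetical helper, and it assumes the job passed in (for instance ```service.get_quick_proxy```) returns after a single grab.

```python
import time


def run_periodically(job, interval_seconds, max_runs=None, sleep=time.sleep):
    """Call job() repeatedly, waiting interval_seconds between calls.

    Hypothetical helper for illustration. max_runs=None loops until the
    process is interrupted; sleep is injectable so the loop can be tested
    without waiting. Returns the number of runs attempted.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            job()
        except Exception as exc:  # keep the schedule alive if one run fails
            print(f'Run {runs + 1} failed: {exc}')
        runs += 1
        # Skip the final wait once the requested number of runs is reached.
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return runs
```

A call such as ```run_periodically(service.get_quick_proxy, interval_seconds=300)``` would then refresh the pool every five minutes until interrupted.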
### 👉 How to use Proxy Grabber Class?
##### import ```ProxyGrabber``` class
```python
from greendeck_proxygrabber import ProxyGrabber
```
##### initialize ```ProxyGrabber``` object
```python
grabber = ProxyGrabber(len_proxy_list=10, country_code='ALL', timeout=2)
```
These arguments are optional. Their default values are:
```
len_proxy_list = 10
country_code = 'ALL'
timeout = 2
```
Currently the program only supports proxies of combined regions.

##### Getting checked, running proxies
The ```grab_proxy()``` method fetches proxies that have been checked and are running.
```python
grabber.grab_proxy()
```
This returns a dictionary of the following structure:
```python
{
'https': [< list of https proxies >],
'http': [< list of http proxies >],
'region': 'ALL' # default for now
}
```
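One way to use the returned dictionary is to pick one proxy per scheme and build the mapping that HTTP clients such as ```requests``` accept through their ```proxies``` argument. This is a minimal sketch: it assumes each list entry is an ```ip:port``` string (the README does not state the exact format), and ```pick_proxies``` is a hypothetical helper, not part of the package.

```python
import random


def pick_proxies(grabbed, rng=random):
    """Build a {'http': url, 'https': url} mapping from grab_proxy() output.

    Hypothetical helper. Assumes each list entry looks like 'ip:port';
    entries that already carry a scheme are used unchanged. Schemes whose
    list is missing or empty are left out of the result.
    """
    mapping = {}
    for scheme in ('http', 'https'):
        candidates = grabbed.get(scheme) or []
        if not candidates:
            continue
        proxy = rng.choice(candidates)
        if '://' not in proxy:
            # Assumption: the proxy itself is reached over plain HTTP,
            # even when it forwards HTTPS traffic.
            proxy = 'http://' + proxy
        mapping[scheme] = proxy
    return mapping
```

The mapping can be passed straight to a client, e.g. ```requests.get(url, proxies=pick_proxies(grabber.grab_proxy()), timeout=5)```.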
##### Getting an unchecked list of proxies
The ```proxy_scraper()``` method of ```ScrapeProxy``` fetches proxies without checking them.
It returns 200 proxies, covering both http and https types.
```python
from greendeck_proxygrabber import ScrapeProxy
proxies_http, proxies_https = ScrapeProxy.proxy_scraper()
```
The call returns the list of http proxies followed by the list of https proxies.
```
http_proxies = [< list of http proxies >]
https_proxies = [< list of https proxies >]
```
##### Filtering invalid proxies from a list of proxies
The ```proxy_checker_https``` and ```proxy_checker_http``` methods of the ```ProxyChecker``` class validate proxies. Given a list of proxies, each method checks every proxy and returns a list containing only the valid ones.
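```python
from greendeck_proxygrabber import ProxyChecker, ScrapeProxy

# Unchecked proxies to validate, e.g. from the scraper shown above
proxy_list_http, proxy_list_https = ScrapeProxy.proxy_scraper()
valid_proxies_http = ProxyChecker.proxy_checker_http(proxy_list = proxy_list_http, timeout = 2)
valid_proxies_https = ProxyChecker.proxy_checker_https(proxy_list = proxy_list_https, timeout = 2)
```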
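With a per-proxy timeout, checking a long list one proxy at a time can be slow. The sketch below shows one way to run such checks concurrently using only the standard library. It is not how ```ProxyChecker``` is implemented: ```check_http_proxy``` and ```filter_valid``` are hypothetical names, the ```ip:port``` proxy format is an assumption, and the test URL is a placeholder. A call such as ```filter_valid(proxy_list_http)``` would return the proxies that answered in time, in their original order.

```python
import http.client
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def check_http_proxy(proxy, timeout=2, test_url='http://example.com'):
    """Return True if a request through the proxy succeeds within timeout."""
    handler = urllib.request.ProxyHandler({'http': 'http://' + proxy})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(test_url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, timeouts and connection errors are OSErrors; malformed
        # replies from a broken proxy surface as HTTPException.
        return False


def filter_valid(proxy_list, check=check_http_proxy, max_workers=20):
    """Return the proxies for which check(proxy) is true, preserving order."""
    if not proxy_list:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(check, proxy_list))
    return [proxy for proxy, ok in zip(proxy_list, results) if ok]
```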

---
### 👉 How to build your own pip package
* Open an account at https://pypi.org/

In the parent directory, run:
* ```python setup.py sdist bdist_wheel```
* ```twine upload dist/*```

References
* https://medium.com/small-things-about-python/lets-talk-about-python-packaging-6d84b81f1bb5

MADE WITH 🐍 BY Gagan