https://github.com/Jiramew/spoon


# Spoon - A package for building site-specific proxy pools
Spoon is a library for building a distributed proxy pool for each site you specify.
It runs on Python 3 only.

## Install
Run `pip install spoonproxy`, or clone the repo and add it to your PYTHONPATH.

## Run

### Spoon-server
Make sure Redis is running. The default configuration is `host: localhost, port: 6379`; you can change the Redis connection settings.
As in `example.py` under `spoon_server/example`, you can assign several proxy providers:
```python
from spoon_server.proxy.fetcher import Fetcher
from spoon_server.main.proxy_pipe import ProxyPipe
from spoon_server.proxy.kuai_provider import KuaiProvider
from spoon_server.proxy.xici_provider import XiciProvider
from spoon_server.database.redis_config import RedisConfig
from spoon_server.main.checker import CheckerBaidu

def main_run():
    # Redis connection used by the pool (host and port of your Redis instance).
    redis = RedisConfig("127.0.0.1", 21009)
    # One pipe per target site, with its own providers and checker.
    p1 = ProxyPipe(url_prefix="https://www.baidu.com",
                   fetcher=Fetcher(use_default=False),
                   database=redis,
                   checker=CheckerBaidu()).set_fetcher([KuaiProvider()]).add_fetcher([XiciProvider()])
    p1.start()


if __name__ == '__main__':
    main_run()
```

With a different checker, you can validate the results more precisely.
```python
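import re

from spoon_server.main.checker import Checker


class CheckerBaidu(Checker):
    def checker_func(self, html=None):
        # Decode raw bytes before matching.
        if isinstance(html, bytes):
            html = html.decode('utf-8')
        # Accept the response only if it contains the Baidu home page title.
        if re.search(r".*百度一下，你就知道.*", html):
            return True
        else:
            return False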
```
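
The same pattern can be adapted to other sites. The following is a minimal sketch, assuming a checker only needs to override `checker_func(html)` as `CheckerBaidu` does. The `Checker` class here is a stand-in so the snippet runs without Spoon installed, and `CheckerTitle` and its `keyword` argument are hypothetical names, not part of Spoon.

```python
import re


class Checker:
    """Stand-in for Spoon's Checker base class (illustration only)."""

    def checker_func(self, html=None):
        return True


class CheckerTitle(Checker):
    """Hypothetical checker: accept a page only if its <title> contains a keyword."""

    def __init__(self, keyword):
        self.keyword = keyword

    def checker_func(self, html=None):
        if html is None:
            return False
        # Decode raw bytes before matching, tolerating bad byte sequences.
        if isinstance(html, bytes):
            html = html.decode("utf-8", errors="replace")
        match = re.search(r"<title[^>]*>(.*?)</title>", html,
                          re.IGNORECASE | re.DOTALL)
        # A missing title usually means the proxy returned an error or block page.
        return bool(match) and self.keyword in match.group(1)
```

With the real library, the stand-in would be replaced by the base class from `spoon_server.main.checker`, whose constructor may take arguments not shown here.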

As `spoon_server/example/example_multi.py` shows, you can use multiprocessing to run several queues that fetch and validate proxies.
You can also assign different providers to different URLs.
The default proxy providers are listed below; you can also write your own.



| name | description |
| --- | --- |
| WebProvider | Get proxy from http api |
| FileProvider | Get proxy from file |
| GouProvider | http://www.goubanjia.com |
| KuaiProvider | http://www.kuaidaili.com |
| SixProvider | http://m.66ip.cn |
| UsProvider | https://www.us-proxy.org |
| WuyouProvider | http://www.data5u.com |
| XiciProvider | http://www.xicidaili.com |
| IP181Provider | http://www.ip181.com |
| XunProvider | http://www.xdaili.cn |
| PlpProvider | https://list.proxylistplus.com |
| IP3366Provider | http://www.ip3366.net |
| BusyProvider | https://proxy.coderbusy.com |
| NianProvider | http://www.nianshao.me |
| PdbProvider | http://proxydb.net |
| ZdayeProvider | http://ip.zdaye.com |
| YaoProvider | http://www.httpsdaili.com/ |
| FeilongProvider | http://www.feilongip.com/ |
| IP31Provider | https://31f.cn/http-proxy/ |
| XiaohexiaProvider | http://www.xiaohexia.cn/ |
| CoolProvider | https://www.cool-proxy.net/ |
| NNtimeProvider | http://nntime.com/ |
| ListendeProvider | https://www.proxy-listen.de/ |
| IhuanProvider | https://ip.ihuan.me/ |
| IphaiProvider | http://www.iphai.com/ |
| MimvpProvider(@NeedCaptcha) | https://proxy.mimvp.com/ |
| GPProvider(@NeedProxy if you're in China) | http://www.gatherproxy.com |
| FPLProvider(@NeedProxy if you're in China) | https://free-proxy-list.net |
| SSLProvider(@NeedProxy if you're in China) | https://www.sslproxies.org |
| NordProvider(@NeedProxy if you're in China) | https://nordvpn.com |
| PremProvider(@NeedProxy if you're in China) | https://premproxy.com |
| YouProvider(@Deprecated) | http://www.youdaili.net |
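
A custom provider mostly has to turn a source's raw text into `ip:port` entries. Spoon's provider base class and its method names are not shown in this README, so the sketch below covers only that parsing step. `parse_proxies` and `PROXY_PATTERN` are hypothetical names, not part of Spoon.

```python
import re

# Matches dotted-quad IPv4 addresses followed by a port, e.g. "1.2.3.4:8080".
PROXY_PATTERN = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})\b")


def parse_proxies(text):
    """Return the unique, valid 'ip:port' strings in text, in order of appearance."""
    seen = set()
    proxies = []
    for ip, port in PROXY_PATTERN.findall(text):
        # Reject octets above 255 and ports outside 1..65535.
        octets_ok = all(int(part) <= 255 for part in ip.split("."))
        port_ok = 0 < int(port) <= 65535
        if not (octets_ok and port_ok):
            continue
        proxy = f"{ip}:{port}"
        if proxy not in seen:
            seen.add(proxy)
            proxies.append(proxy)
    return proxies
```

A real provider would wrap a function like this in whatever interface the bundled providers implement; check one of them, such as `KuaiProvider`, for the actual method names.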

### Spoon-web
A simple Django web API demo. You can use any web server and write your own API.
Run `python manager.py runserver **.**.**.**:*****`, replacing the placeholders with your host and port.
The demo APIs include:



| name | description |
| --- | --- |
| http://127.0.0.1:21010/api/v1/get_keys | Get all keys from Redis. |
| http://127.0.0.1:21010/api/v1/fetchone_from?target=www.google.com&filter=65 | Get one usable proxy.<br>target: the target URL<br>filter: number of successful revalidations |
| http://127.0.0.1:21010/api/v1/fetchall_from?target=www.google.com&filter=65 | Get all usable proxies. |
| http://127.0.0.1:21010/api/v1/fetch_hundred_recent?target=www.baidu.com&filter=5 | Get recently added full-score proxies.<br>target: the target URL<br>filter: time in seconds |
| http://127.0.0.1:21010/api/v1/fetch_stale?num=100 | Get recent proxies without a check.<br>num: the number of proxies you want |
| http://127.0.0.1:21010/api/v1/fetch_recent?target=www.baidu.com | Get recent proxies that were successfully validated.<br>target: the target URL |
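
The endpoints above can be called from any HTTP client. The following is a minimal sketch using only the standard library, assuming the demo server listens on `127.0.0.1:21010` as in the URLs above. `build_api_url` and `call_api` are hypothetical helper names, and the response format is not documented here, so the body is returned as raw text.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL, taken from the example endpoints in this README.
API_BASE = "http://127.0.0.1:21010/api/v1"


def build_api_url(endpoint, **params):
    """Build the full URL for an endpoint, appending query parameters if any."""
    query = urlencode(params)
    url = f"{API_BASE}/{endpoint}"
    return f"{url}?{query}" if query else url


def call_api(endpoint, timeout=5, **params):
    """Call an endpoint and return the raw response body as text."""
    with urlopen(build_api_url(endpoint, **params), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example (requires the Spoon-web demo to be running):
# print(call_api("fetchone_from", target="www.google.com", filter=65))
```

Keeping URL construction separate from the network call makes it easy to check the generated URLs without a running server.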