https://github.com/Jiramew/spoon
🥄 A package for building specific Proxy Pool for different Sites.
- Host: GitHub
- URL: https://github.com/Jiramew/spoon
- Owner: Jiramew
- License: gpl-3.0
- Created: 2017-07-21T01:30:26.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2023-05-22T21:33:39.000Z (over 1 year ago)
- Last Synced: 2024-07-31T14:48:53.464Z (3 months ago)
- Topics: crawler, distributed, ip, proxies, proxy, proxy-provider, proxypool, python, redis, spider, spoon
- Language: Python
- Homepage:
- Size: 85 KB
- Stars: 173
- Watchers: 8
- Forks: 23
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- starred-awesome - spoon - A package for building specific Proxy Pool for different Sites. (Python)
# Spoon - A package for building specific Proxy Pool for different Sites.
Spoon is a library for building a distributed proxy pool for each site you assign.
It runs only on Python 3.

## Install
Run `pip install spoonproxy`, or clone the repo and add it to your `PYTHONPATH`.

## Run

### Spoon-server
Make sure Redis is running. The default configuration is "host:localhost, port:6379"; you can change the Redis connection settings if needed.
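
Before starting the server, it can help to confirm that Redis is reachable. The following is a minimal sketch, not part of Spoon: `redis_is_up` is a hypothetical helper that opens a plain TCP connection and sends Redis's inline `PING` command. It assumes an unauthenticated Redis instance; an instance that requires a password will not answer `+PONG` and is reported as unavailable.

```python
import socket


def redis_is_up(host="localhost", port=6379, timeout=2.0):
    """Return True if the service at host:port answers PING with +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")  # Redis inline command
            return conn.recv(64).startswith(b"+PONG")
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    print("Redis reachable:", redis_is_up())
```

Pass the host and port you give to `RedisConfig` if they differ from the defaults.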
As `example.py` in `spoon_server/example` shows, you can assign many different proxy providers:
```python
from spoon_server.proxy.fetcher import Fetcher
from spoon_server.main.proxy_pipe import ProxyPipe
from spoon_server.proxy.kuai_provider import KuaiProvider
from spoon_server.proxy.xici_provider import XiciProvider
from spoon_server.database.redis_config import RedisConfig
from spoon_server.main.checker import CheckerBaidu


def main_run():
    # Redis connection (host, port) used by the pipeline.
    redis = RedisConfig("127.0.0.1", 21009)
    # One pipeline for one target site, fed by two providers.
    p1 = ProxyPipe(url_prefix="https://www.baidu.com",
                   fetcher=Fetcher(use_default=False),
                   database=redis,
                   checker=CheckerBaidu()).set_fetcher([KuaiProvider()]).add_fetcher([XiciProvider()])
    p1.start()


if __name__ == '__main__':
    main_run()
```

With a different checker, you can validate the result more precisely.
```python
import re

# Assumption: the Checker base class lives in the same module as CheckerBaidu.
from spoon_server.main.checker import Checker


class CheckerBaidu(Checker):
    def checker_func(self, html=None):
        if isinstance(html, bytes):
            html = html.decode('utf-8')
        # Accept the response only if it contains Baidu's page title text.
        if re.search(r".*百度一下，你就知道.*", html):
            return True
        else:
            return False
```

As `spoon_server/example/example_multi.py` shows, you can also use multiple processes to run several queues that fetch and validate proxies.
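
To illustrate the checker idea shown earlier, here is a minimal, self-contained sketch of a pattern-based checker for an arbitrary site. `BaseChecker` and `PatternChecker` are hypothetical stand-ins written for this example; only the `checker_func(html)` method name comes from the snippet above, and Spoon's real `Checker` base class may expect more than this.

```python
import re


class BaseChecker:
    """Hypothetical stand-in for Spoon's Checker base class."""

    def checker_func(self, html=None):
        raise NotImplementedError


class PatternChecker(BaseChecker):
    """Accept a response only if its body matches a site-specific regex."""

    def __init__(self, pattern):
        self.pattern = re.compile(pattern)

    def checker_func(self, html=None):
        if html is None:
            return False
        if isinstance(html, bytes):
            # Responses fetched through bad proxies may contain broken bytes.
            html = html.decode("utf-8", errors="replace")
        return self.pattern.search(html) is not None


checker = PatternChecker(r"<title>Example Domain</title>")
print(checker.checker_func(b"<html><title>Example Domain</title></html>"))  # True
print(checker.checker_func("<html>403 Forbidden</html>"))  # False
```

Matching on content that only the real target page contains helps reject proxies that answer with an error page or an injected placeholder.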
You can also assign different providers to different URLs.
The default proxy providers are listed below; you can also write your own.

| Name | Description |
| --- | --- |
| WebProvider | Get proxy from http api |
| FileProvider | Get proxy from file |
| GouProvider | http://www.goubanjia.com |
| KuaiProvider | http://www.kuaidaili.com |
| SixProvider | http://m.66ip.cn |
| UsProvider | https://www.us-proxy.org |
| WuyouProvider | http://www.data5u.com |
| XiciProvider | http://www.xicidaili.com |
| IP181Provider | http://www.ip181.com |
| XunProvider | http://www.xdaili.cn |
| PlpProvider | https://list.proxylistplus.com |
| IP3366Provider | http://www.ip3366.net |
| BusyProvider | https://proxy.coderbusy.com |
| NianProvider | http://www.nianshao.me |
| PdbProvider | http://proxydb.net |
| ZdayeProvider | http://ip.zdaye.com |
| YaoProvider | http://www.httpsdaili.com/ |
| FeilongProvider | http://www.feilongip.com/ |
| IP31Provider | https://31f.cn/http-proxy/ |
| XiaohexiaProvider | http://www.xiaohexia.cn/ |
| CoolProvider | https://www.cool-proxy.net/ |
| NNtimeProvider | http://nntime.com/ |
| ListendeProvider | https://www.proxy-listen.de/ |
| IhuanProvider | https://ip.ihuan.me/ |
| IphaiProvider | http://www.iphai.com/ |
| MimvpProvider(@NeedCaptcha) | https://proxy.mimvp.com/ |
| GPProvider(@NeedProxy if you're in China) | http://www.gatherproxy.com |
| FPLProvider(@NeedProxy if you're in China) | https://free-proxy-list.net |
| SSLProvider(@NeedProxy if you're in China) | https://www.sslproxies.org |
| NordProvider(@NeedProxy if you're in China) | https://nordvpn.com |
| PremProvider(@NeedProxy if you're in China) | https://premproxy.com |
| YouProvider(@Deprecated) | http://www.youdaili.net |

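
The provider interface itself is not shown in this README, so the sketch below covers only the parsing step a file-based provider would need. `parse_proxy_lines` is a hypothetical helper, and the one-`ip:port`-per-line format is an assumption for this example rather than Spoon's documented file format.

```python
import re

# Assumed format: one "ip:port" entry per line, with a dotted IPv4 address.
PROXY_LINE = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})$")


def parse_proxy_lines(lines):
    """Return unique, well-formed "ip:port" strings in first-seen order."""
    seen = set()
    proxies = []
    for raw in lines:
        line = raw.strip()
        match = PROXY_LINE.match(line)
        if not match:
            continue  # skip blanks, comments and malformed entries
        octets = [int(part) for part in match.group(1).split(".")]
        port = int(match.group(2))
        if any(octet > 255 for octet in octets) or not 0 < port < 65536:
            continue  # reject out-of-range addresses and ports
        if line not in seen:
            seen.add(line)
            proxies.append(line)
    return proxies


sample = ["1.2.3.4:8080", "# comment", "1.2.3.4:8080", "999.1.1.1:80", "10.0.0.1:3128\n"]
print(parse_proxy_lines(sample))  # ['1.2.3.4:8080', '10.0.0.1:3128']
```

To read from a file, pass the open file object directly, for example `parse_proxy_lines(open("proxies.txt"))`, where `proxies.txt` is a placeholder name.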
### Spoon-web
A simple Django web API demo. You can use any web server and write your own API.
Run `python manager.py runserver **.**.**.**:*****` (substitute your own host and port).
The demo APIs include:

| Name | Description |
| --- | --- |
| `http://127.0.0.1:21010/api/v1/get_keys` | Get all keys from Redis. |
| `http://127.0.0.1:21010/api/v1/fetchone_from?target=www.google.com&filter=65` | Get one usable proxy.<br>`target`: the target URL<br>`filter`: number of successful revalidations |
| `http://127.0.0.1:21010/api/v1/fetchall_from?target=www.google.com&filter=65` | Get all usable proxies. |
| `http://127.0.0.1:21010/api/v1/fetch_hundred_recent?target=www.baidu.com&filter=5` | Get recently added proxies with a full score.<br>`target`: the target URL<br>`filter`: time in seconds |
| `http://127.0.0.1:21010/api/v1/fetch_stale?num=100` | Get recent proxies that have not been checked.<br>`num`: the number of proxies you want |
| `http://127.0.0.1:21010/api/v1/fetch_recent?target=www.baidu.com` | Get recent proxies that were successfully validated.<br>`target`: the target URL |
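
The endpoints listed above can be called from any HTTP client. The following is a minimal sketch using only the standard library; `build_url` and `call_api` are hypothetical helpers, the base URL is taken from the examples above, and the response body is returned as raw text because its format is not documented here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:21010/api/v1"  # host and port from the examples above


def build_url(endpoint, **params):
    """Build an API URL such as .../fetchone_from?target=...&filter=..."""
    query = urlencode(params)
    return f"{BASE_URL}/{endpoint}?{query}" if query else f"{BASE_URL}/{endpoint}"


def call_api(endpoint, timeout=5.0, **params):
    """Return the raw response body as text (requires the demo server to be running)."""
    with urlopen(build_url(endpoint, **params), timeout=timeout) as response:
        return response.read().decode("utf-8")


print(build_url("get_keys"))
print(build_url("fetchone_from", target="www.google.com", filter=65))
```

With the demo server running, `call_api("fetch_stale", num=100)` would request the same URL as the `fetch_stale` example in the table.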