Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/leeyis/ip_proxy_pool
Dynamically generates Scrapy spiders to crawl and check free proxy IPs found on the internet.
dynamic proxy proxy-ip scrapy spider
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/leeyis/ip_proxy_pool
- Owner: leeyis
- Created: 2016-12-05T06:43:20.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2018-10-06T13:23:26.000Z (about 6 years ago)
- Last Synced: 2024-04-24T12:19:36.744Z (7 months ago)
- Topics: dynamic, proxy, proxy-ip, scrapy, spider
- Language: Python
- Homepage: http://jinbitou.net/crawler
- Size: 41 KB
- Stars: 42
- Watchers: 4
- Forks: 18
- Open Issues: 1
- Metadata Files:
  - Readme: README.md
Awesome Lists containing this project
- awesome-network-stuff - **38** stars
README
# ip_proxy_pool
A dynamically configurable proxy-IP crawler based on Scrapy. It makes it easy to crawl hundreds of thousands of proxy IPs in a short time: by maintaining a single spider and a few groups of per-site data-extraction rules, you can easily grab large numbers of proxy IPs from those sites. See the [blog post](http://jinbitou.net/2016/12/05/2244.html) for more details.
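The idea of one generic spider driven by per-site extraction rules can be sketched roughly as below. This is a minimal illustration, not the project's actual code: the site name, URL, and XPath expressions are made-up placeholders, and real rules would be loaded from the project's own configuration.

```python
import scrapy

# Hypothetical per-site extraction rules; in practice these would be loaded
# from the project's configuration (e.g. a database table or settings file).
# The site name, URL, and XPath expressions are illustrative placeholders.
SITE_RULES = {
    "example-proxy-site": {
        "start_urls": ["http://www.example.com/free-proxy-list"],
        "row_xpath": "//table[@id='proxylist']//tr[position()>1]",
        "ip_xpath": "./td[1]/text()",
        "port_xpath": "./td[2]/text()",
    },
}


def make_spider(site_name, rules):
    """Build a Spider subclass for one site from its extraction rules."""
    class ProxySpider(scrapy.Spider):
        start_urls = rules["start_urls"]

        def parse(self, response):
            # Apply the configured XPaths instead of hard-coding them per site.
            for row in response.xpath(rules["row_xpath"]):
                yield {
                    "ip": row.xpath(rules["ip_xpath"]).extract_first(),
                    "port": row.xpath(rules["port_xpath"]).extract_first(),
                }

    ProxySpider.name = site_name
    return ProxySpider


# One spider class per configured site; supporting a new proxy site only
# requires adding a new entry to SITE_RULES.
SPIDERS = [make_spider(name, rules) for name, rules in SITE_RULES.items()]
```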
## Main Requirements
For more details, see `requirements.txt`; a pinned sketch follows this list.
- Scrapy 1.2.1
- MySQL-python 1.2.5
- Redis 2.10.5
- SQLAlchemy 1.1.4
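Based on the versions listed above, a pinned `requirements.txt` would look roughly like the following (a sketch; the repository's actual file may pin additional packages):

```text
Scrapy==1.2.1
MySQL-python==1.2.5
redis==2.10.5
SQLAlchemy==1.1.4
```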
## Install in development

**CentOS**
```bash
# System packages needed to build Scrapy's compiled dependencies
$ sudo yum install python-devel
$ sudo yum install gcc libffi-devel openssl-devel
# Python packages (see requirements.txt for pinned versions)
$ pip install scrapy
$ pip install SQLAlchemy
$ pip install redis
```