https://github.com/m9rco/mangocawler
🐟 A Swoole-based crawler solution 🔥
- Host: GitHub
- URL: https://github.com/m9rco/mangocawler
- Owner: m9rco
- License: mit
- Created: 2017-05-02T03:27:30.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2017-05-04T07:22:09.000Z (about 9 years ago)
- Last Synced: 2025-04-11T21:51:42.659Z (about 1 year ago)
- Topics: php, php7, python, swoole
- Language: PHP
- Size: 300 KB
- Stars: 9
- Watchers: 2
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
MangoCawler - A multi-process crawler solution built on Swoole
===============
> Unite knowledge with action; put what you learn to use
## What I Do
- A multi-process crawler solution based on Swoole

## Requirement
- [PHP 7+](http://php.net/manual/zh/migration71.new-features.php)
- [Swoole](https://www.zhihu.com/question/41832866)
- [Composer 1.0](http://pkg.phpcomposer.com/)
## SQL
```
CREATE TABLE `damai_list` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`url` varchar(200) DEFAULT '' COMMENT 'Url',
`province` varchar(20) DEFAULT '' COMMENT 'Province',
`city` varchar(20) DEFAULT '' COMMENT 'City',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5915 DEFAULT CHARSET=utf8;
```
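To illustrate how a crawled record could be written into this table, here is a minimal sketch using PDO with a prepared statement. The function name `saveVenue` is hypothetical and not taken from the project; the project's own storage code may look different.

```php
<?php

/**
 * Insert one crawled venue into damai_list and return its new id.
 * saveVenue() is a hypothetical helper, not part of MangoCawler.
 */
function saveVenue(PDO $pdo, string $url, string $province, string $city): int
{
    // Named placeholders keep crawled text out of the SQL string itself.
    $stmt = $pdo->prepare(
        'INSERT INTO damai_list (url, province, city) VALUES (:url, :province, :city)'
    );
    $stmt->execute([':url' => $url, ':province' => $province, ':city' => $city]);

    return (int) $pdo->lastInsertId();
}

// Example wiring (left commented out because it needs a running MySQL server).
// It assumes the M_DB_* constants from the configuration file are defined:
//
// $dsn = 'mysql:host=' . M_DB_HOST . ';dbname=' . M_DB_NAME . ';charset=utf8';
// $pdo = new PDO($dsn, M_DB_USER, M_DB_PWD, [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
// saveVenue($pdo, 'https://example.com/venue/1', 'Zhejiang', 'Hangzhou');
```

The `id` column is left out of the insert on purpose so that `AUTO_INCREMENT` assigns it.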
## Edit the configuration file
```
# \drive\CrawlerInit.php
define('M_CRAWLER_URL', 'https://venue.damai.cn/search.aspx?cityID=0&k=0&keyword=&pageIndex=\d');
define('M_DB_HOST', '127.0.0.1');
define('M_DB_NAME', '');
define('M_DB_USER', '');
define('M_DB_PWD' , '');
```
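The trailing `\d` in `M_CRAWLER_URL` looks like a placeholder for the page index. As a minimal sketch, assuming that reading is correct, the helper below expands the template into concrete page URLs. The name `buildPageUrls` and the page range are hypothetical and not taken from the project.

```php
<?php

/**
 * Expand a URL template into concrete page URLs.
 * Assumption: the literal two characters "\d" mark where the page index goes.
 * buildPageUrls() is a hypothetical helper, not part of MangoCawler.
 */
function buildPageUrls(string $template, int $firstPage, int $lastPage): array
{
    $urls = [];
    for ($page = $firstPage; $page <= $lastPage; $page++) {
        // Single quotes keep '\d' as a literal backslash followed by "d".
        $urls[] = str_replace('\d', (string) $page, $template);
    }

    return $urls;
}

$template = 'https://venue.damai.cn/search.aspx?cityID=0&k=0&keyword=&pageIndex=\d';
foreach (buildPageUrls($template, 1, 3) as $url) {
    echo $url, PHP_EOL;
}
```

Plain string replacement is used rather than a regular expression because the template contains `?` and `.`, which would otherwise need escaping.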
## Usage
```
composer install
# First, start the server to enable the connection pool
php drive/worker/Server.php
# Then start crawling
php index.php
```
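The README does not show what `index.php` does internally. Purely as an illustration of the multi-process idea, the sketch below splits a page range across several workers and, when the Swoole extension is loaded, runs each batch in its own `Swoole\Process`. The helper `partitionPages`, the page range, and the worker count are all hypothetical and not taken from the project.

```php
<?php

/**
 * Split page numbers into one batch per worker (round-robin).
 * $workers must be >= 1.
 * partitionPages() is a hypothetical helper, not part of MangoCawler.
 */
function partitionPages(int $firstPage, int $lastPage, int $workers): array
{
    $batches = array_fill(0, $workers, []);
    for ($page = $firstPage; $page <= $lastPage; $page++) {
        $batches[($page - $firstPage) % $workers][] = $page;
    }

    return $batches;
}

// Only fork when the Swoole extension is loaded (CLI only).
if (extension_loaded('swoole')) {
    foreach (partitionPages(1, 10, 4) as $index => $pages) {
        $process = new Swoole\Process(function () use ($index, $pages) {
            // A real worker would fetch and parse each page here.
            echo "worker {$index}: pages " . implode(',', $pages) . PHP_EOL;
        });
        $process->start();
    }

    // Reap every child so no zombie processes are left behind.
    while (Swoole\Process::wait() !== false) {
    }
}
```

Round-robin assignment keeps the batches within one page of each other in size, so no single worker ends up with most of the work.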
## Corrections
If you find anything that is wrong, please open an [issue](https://github.com/PuShaoWei/Mango16/issues) or a [pull request](https://github.com/PuShaoWei/Mango16/pulls), and I will fix it promptly.
> Note: for the commit messages in a pull request, please follow the article [Commit message and change log writing guide](http://www.ruanyifeng.com/blog/2016/01/commit_message_change_log.html)