https://github.com/tower1229/crawler
Nodejs crawler for cnbeta.com
- Host: GitHub
- URL: https://github.com/tower1229/crawler
- Owner: tower1229
- License: gpl-3.0
- Created: 2017-06-09T05:50:30.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2017-07-15T06:03:24.000Z (almost 9 years ago)
- Last Synced: 2025-04-13T10:01:24.890Z (about 1 year ago)
- Topics: crawler, nodejs
- Language: JavaScript
- Size: 37.1 KB
- Stars: 19
- Watchers: 2
- Forks: 10
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# crawler
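A Node.js crawler for [cnbeta.com](http://www.cnbeta.com/). The source code is on [GitHub](https://github.com/tower1229/crawler).
- Crawls and saves cnbeta news articles and their images
- Starts from a given article, asynchronously fetches the ID of the previous article, and repeats
- Supports a limit on the total number of articles crawled (default: 50)
- Follows 301 redirects
- Intended only for learning Node.js; no offense intended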
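
The "fetch the previous article ID and repeat, up to a limit" behavior in the list above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: `crawl`, `fetchArticle`, and the `{ id, prevId }` shape are all hypothetical names, and the fetcher is passed in so the loop can be tried without touching the network.

```javascript
// Hypothetical sketch of the crawl loop; none of these names come from app.js.
// fetchArticle(id) is assumed to resolve to { id, prevId }, where prevId is
// the ID of the previous article (or null when there is none).
async function crawl(startId, limit, fetchArticle) {
  const visited = [];
  let id = startId;
  // Stop when the limit is reached or there is no previous article to follow.
  while (id && visited.length < limit) {
    const article = await fetchArticle(id);
    visited.push(article.id);
    id = article.prevId;
  }
  return visited;
}
```

Injecting the fetcher keeps the loop logic separate from HTTP concerns such as redirect handling, which a real fetcher would have to deal with on its own.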
## Usage
- Install dependencies: `npm install`
- Set the `startId` variable in app.js to the ID of the article to start from
- Run the crawler: `node app [limitNumber=50]`
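
One way the optional `limitNumber` argument could be read is sketched below. The helper name `parseLimit` and the validation rules are assumptions for illustration; the real app.js may parse its arguments differently.

```javascript
// Hypothetical helper: reads the limit from the third argv entry
// (`node app 10` gives argv[2] === '10') and falls back to 50 when it is
// missing or not a positive integer.
function parseLimit(argv, fallback = 50) {
  const parsed = Number.parseInt(argv[2], 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

const limit = parseLimit(process.argv);
console.log(`Crawling up to ${limit} articles`);
```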
## Example
- For example, to start crawling from `http://www.cnbeta.com/articles/tech/620719.htm`, set `startId="620719"`.
- To crawl 10 articles, run `node app 10`.
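
The article ID is the numeric part of the article URL. As a convenience, it could be extracted with a small helper like the one below; `extractArticleId` is a hypothetical name and is not part of the repository, which expects `startId` to be edited by hand.

```javascript
// Hypothetical helper: pulls the numeric ID out of a cnbeta article URL
// whose path ends in "/<digits>.htm". Returns null when the URL does not
// match that pattern.
function extractArticleId(url) {
  const match = /\/(\d+)\.htm$/.exec(new URL(url).pathname);
  return match ? match[1] : null;
}

console.log(extractArticleId('http://www.cnbeta.com/articles/tech/620719.htm'));
// prints "620719"
```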

## More
> [前端路上](http://refined-x.com)