https://github.com/xanke/nscan
Node.js web scraper
crawler javascript nodejs
- Host: GitHub
- URL: https://github.com/xanke/nscan
- Owner: xanke
- Created: 2018-06-03T12:35:57.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2018-06-03T16:56:39.000Z (almost 8 years ago)
- Last Synced: 2025-01-29T19:25:35.425Z (over 1 year ago)
- Topics: crawler, javascript, nodejs
- Language: JavaScript
- Homepage:
- Size: 21.5 KB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# NScan
A web scraper built on Node.js, as easy to configure as Vue
## Quick Start
**Database configuration:**
/app/config/default.js
```javascript
module.exports = {
  // MongoDB connection URI (left incomplete in the README; supply your own)
  mongodb: 'mongodb://'
}
```
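The connection string above is left incomplete, so the scraper cannot connect until a full URI is supplied. As a minimal sketch of a startup check, assuming a local MongoDB instance: `isUsableMongoUri` is a hypothetical helper and not part of NScan, and the host, port, and database name are placeholders.

```javascript
// Hypothetical helper (not part of NScan): checks that a MongoDB connection
// string has a recognized scheme and at least one host after it.
function isUsableMongoUri(uri) {
  // Accept both the standard scheme and the DNS seed list (+srv) scheme.
  return typeof uri === 'string' && /^mongodb(\+srv)?:\/\/[^\s/]+/.test(uri);
}

// Placeholder config: host, port, and database name are assumptions.
const config = {
  mongodb: 'mongodb://localhost:27017/nscan'
};

if (!isUsableMongoUri(config.mongodb)) {
  throw new Error('Set a full MongoDB URI in /app/config/default.js');
}
```

A check like this fails fast with a readable message instead of surfacing a driver error later, once scraping has already started.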
**For scraping examples and configuration details, see:**
[/app/demo/](/app/demo/)
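The real option names live in the demo directory and are not reproduced in this README. Purely to illustrate what a declarative, Vue-style task description could look like, here is a sketch in which every field name (`name`, `method`, `url`, `pages`) and the `expandPageUrls` helper are hypothetical and may not match NScan's actual format.

```javascript
// Hypothetical task definition: none of these option names come from NScan.
const task = {
  name: 'example-list',
  method: 'GET',
  // {page} is a placeholder that is replaced with each page number.
  url: 'https://example.com/articles?page={page}',
  pages: { from: 1, to: 3 }
};

// Expand a task into the concrete list-page URLs it describes.
function expandPageUrls({ url, pages }) {
  const urls = [];
  for (let page = pages.from; page <= pages.to; page += 1) {
    urls.push(url.replace('{page}', String(page)));
  }
  return urls;
}

console.log(expandPageUrls(task));
```

Keeping the task as plain data and the expansion as a separate function means new sites only need a new object, not new control flow.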
**Start scraping:**
```shell
npm run start
```
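## Milestones
- [x] Paginated scraping
- [x] Detail page scraping
- [x] Static page scraping
- [x] Automatic encoding detection
- [ ] Complex pagination scraping
- [x] GET scraping
- [x] POST scraping
- [ ] API scraping
- [ ] Scheduled scraping
- [ ] Scrape state persistence
- [ ] Scrape logs
- [ ] Automatic task scanning
- [ ] Task manager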
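
The list above marks automatic encoding detection as done but does not say how it works. One common approach, sketched here under that caveat, is to read the charset from the `Content-Type` header, fall back to a `<meta>` tag near the top of the page, and default to UTF-8. `detectCharset` and `decodeBody` are hypothetical names and this is not NScan's implementation.

```javascript
// Hypothetical helper: choose a charset from the Content-Type header, then
// from a <meta> tag in the first bytes of the body, then default to UTF-8.
function detectCharset(contentType, body) {
  const fromHeader = /charset\s*=\s*["']?([\w-]+)/i.exec(contentType || '');
  if (fromHeader) return fromHeader[1].toLowerCase();

  // latin1 maps every byte to one character, so ASCII markup stays readable
  // whatever the page's real encoding is.
  const head = body.subarray(0, 1024).toString('latin1');
  const fromMeta = /<meta[^>]*charset\s*=\s*["']?([\w-]+)/i.exec(head);
  if (fromMeta) return fromMeta[1].toLowerCase();

  return 'utf-8';
}

// Decode a response body (a Buffer) using the detected charset.
function decodeBody(contentType, body) {
  return new TextDecoder(detectCharset(contentType, body)).decode(body);
}

const sample = Buffer.from('<meta charset="utf-8"><p>hello</p>');
console.log(decodeBody('text/html', sample));
```

`TextDecoder` throws a `RangeError` for labels it does not recognize, and legacy encodings such as GBK are only available when Node.js is built with full ICU data, so a production version would need a fallback for both cases.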