https://github.com/wangshibiaoflytiger/spider_puppeteer
nodejs puppeteer爬虫开发脚手架
https://github.com/wangshibiaoflytiger/spider_puppeteer
crawl nodejs puppeteer spider
Last synced: 8 months ago
JSON representation
nodejs puppeteer爬虫开发脚手架
- Host: GitHub
- URL: https://github.com/wangshibiaoflytiger/spider_puppeteer
- Owner: wangshibiaoFlytiger
- Created: 2019-12-18T08:40:58.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2019-12-18T08:41:26.000Z (almost 6 years ago)
- Last Synced: 2025-01-06T09:45:43.924Z (9 months ago)
- Topics: crawl, nodejs, puppeteer, spider
- Language: JavaScript
- Size: 21.5 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: ReadMe
Awesome Lists containing this project
README
项目原始仓库地址: https://github.com/wangshibiaoFlytiger/spider_puppeteer
#若本项目给您带来收获, 还请您动动小拇指,右上角给点个赞哈,万分感谢哈哈!!!nodejs puppeteer爬虫开发脚手架
puppeteer是google开发的headless chrome浏览器, 用于爬取动态渲染的网页
本项目是爬取4399小游戏的数据一. 命令详解
1. pkg
yarn pkg, 将工程打包为独立的可执行文件
2. runPkg
运行独立的可执行文件
3. dev
调试或运行工程代码, 且支持es6的import和export语法二. 使用pkg打包node应用为独立的可执行文件
npm install pkg
若执行pkg ./spider.js -t node10-linux后, 提示链接超时, 则解决方法如下:
从https://github.com/zeit/pkg-fetch/releases下载对应版本(如uploaded-v2.6-node-v10.15.3-linux-x64)到本地, 放到本地对应目录下(如~/.pkg-cache/v2.6/)
注意,需要重命名,把前缀改为fetched(如fetched-v10.15.3-linux-x64)
pkg命令指定输入支持多种方式,详见https://jingsam.github.io/2018/03/02/pkg.html三. 爬虫框架puppeteer
有2个包选择一个即可:pupeteer和puppeteer-core
建议使用puppeteer-core, 因为pupeteer会默认下载一个浏览器,需要耗费下载时间