Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mmqnym/pyppeteer-use-case
Show how to do web crawl via pyppeteer
https://github.com/mmqnym/pyppeteer-use-case
crawl crawler pyppeteer python
Last synced: about 4 hours ago
JSON representation
Show how to do web crawl via pyppeteer
- Host: GitHub
- URL: https://github.com/mmqnym/pyppeteer-use-case
- Owner: mmqnym
- License: mit
- Created: 2022-07-30T10:23:45.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2022-08-04T21:17:07.000Z (over 2 years ago)
- Last Synced: 2024-05-16T04:24:48.778Z (6 months ago)
- Topics: crawl, crawler, pyppeteer, python
- Language: Python
- Homepage:
- Size: 14.6 KB
- Stars: 1
- Watchers: 2
- Forks: 2
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Pyppeteer usecase
## Set up
``` sh
pip install pyppeteer
```
This package is used to hide some webdrive attributes
``` sh
pip install pyppeteer_stealth
```## Experience
- You can use https://bot.sannysoft.com/ to check if you are being detected as a bot.
- There are several Chromium arguments for reference in the `Reference` section below.
- Most of the problems can be solved by the discussions in `puppetter` or the related docs.
- **Some of the situations detected as bot may not be caused by the code, just don't use the default Chronium download the latest version of chromium, copy-paste that package in your main file and use `executablePath = f'{os.getcwd()}\chromium\chrome.exe'` in lunch, most of them can be solved.**
- You can refer to the `troubleshooting.md` of [Nodejs puppeteer docs](https://github.com/puppeteer/puppeteer/tree/main/docs), which has a record of some common problems.
## Reference
- ##### [Python Asyncio Guide](https://docs.python.org/3/library/asyncio-dev.html)
- ##### [Chromium Command Line Switches](https://peter.sh/experiments/chromium-command-line-switches/)
- ##### [Nodejs puppeteer docs](https://github.com/puppeteer/puppeteer/tree/main/docs)
- ##### [Pyppeteer](https://github.com/pyppeteer/pyppeteer)
- ##### [Pyppeteer Docs](https://miyakogi.github.io/pyppeteer/)
- ##### [Hide webdrive attributes](https://github.com/MeiK2333/pyppeteer_stealth)
> ### 中文補充資料 (Guide in Mandarin)
- ##### [Asyncio 從頭入門](https://ithelp.ithome.com.tw/articles/10199385)
- ##### [Asyncio 架構探討](https://www.ithome.com.tw/voice/138875)
## License
[MIT](LICENSE)