Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jglchen/web-scrape
This is a next.js framework site to demonstrate web scraping cases and my expertise in web scraping.
cheerio docker nextjs nodejs puppeteer reactjs
- Host: GitHub
- URL: https://github.com/jglchen/web-scrape
- Owner: jglchen
- Created: 2023-01-30T13:00:27.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-02-15T13:44:10.000Z (almost 2 years ago)
- Last Synced: 2023-03-09T21:51:45.653Z (almost 2 years ago)
- Topics: cheerio, docker, nextjs, nodejs, puppeteer, reactjs
- Language: TypeScript
- Homepage: https://web-scrape.vercel.app
- Size: 434 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
## Web Scraping Demonstrations
This is a **[Next.js](https://nextjs.org/)** site that demonstrates web scraping cases and my expertise in web scraping. Nine scraping cases are currently presented, all handled in API routes with **[Node.js](https://nodejs.org/en/)**.
There are two main approaches to scraping the web:
1. HTTP clients that request pages and extract data from the returned markup
2. Headless browsers

For the first approach, we use [Cheerio](https://www.npmjs.com/package/cheerio), a library that provides a jQuery-like API on the server side, to parse and traverse web pages. Sites, however, are becoming increasingly complex, and regular HTTP crawling often no longer suffices; a full-fledged browser engine is needed to get the necessary information from a site. This is particularly true for single-page applications, which rely heavily on JavaScript and on dynamic, asynchronous resources. Browser automation and headless browsers address these issues, so for the second approach we use [Puppeteer](https://pptr.dev/) to control the browser programmatically. For the cases in this demonstration, we use whichever approach suits the target page.
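
The difference between the two approaches can be sketched as follows. This is a minimal illustration, not the code used in this repository: the function names (`extractHeadings`, `scrapeStatic`, `scrapeRendered`) and the `h1, h2` selector are assumptions chosen for the example, and it assumes the `cheerio` and `puppeteer` packages are installed and that the runtime provides a global `fetch` (Node.js 18 or later).

```typescript
import * as cheerio from "cheerio";
import puppeteer from "puppeteer";

// Parsing step shared by both approaches: collect heading text from HTML.
// The "h1, h2" selector is only an example target.
export function extractHeadings(html: string): string[] {
  const $ = cheerio.load(html);
  return $("h1, h2")
    .map((_, el) => $(el).text().trim())
    .get();
}

// Approach 1: a plain HTTP request, then parse the markup the server returned.
// Works when the data is already present in the initial HTML.
export async function scrapeStatic(url: string): Promise<string[]> {
  const res = await fetch(url);
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }
  return extractHeadings(await res.text());
}

// Approach 2: a headless browser loads the page and runs its JavaScript,
// so content rendered on the client is present before parsing.
export async function scrapeRendered(url: string): Promise<string[]> {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Wait until network activity has mostly settled before reading the DOM.
    await page.goto(url, { waitUntil: "networkidle2" });
    return extractHeadings(await page.content());
  } finally {
    // Always release the browser process, even if navigation fails.
    await browser.close();
  }
}
```

In this sketch only the way the HTML is obtained differs; the extraction logic is the same in both cases. A call such as `await scrapeStatic("https://example.com")` is cheap and fast, while `scrapeRendered` starts a full browser and is better reserved for pages that build their content with JavaScript.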
**iOS** and **Android** mobile apps are also available for the scraping demonstrations. The apps are developed with **React Native**; anyone interested can try them through the [Expo Publish Link](https://exp.host/@jglchen/web-scrape) with the [Expo Go](https://expo.dev/client) app.
### [View the App](https://web-scrape.vercel.app)
### [App GitHub](https://github.com/jglchen/web-scrape)
### Docker: `docker run -p 3000:3000 jglchen/web-scrape`
### [React Native Expo Publish](https://expo.dev/@jglchen/web-scrape)
### [React Native GitHub](https://github.com/jglchen/react-native-web-scrape)