https://github.com/jglchen/web-scrape

This is a next.js framework site to demonstrate web scraping cases and my expertise in web scraping.
cheerio docker nextjs nodejs puppeteer reactjs

Host: GitHub
URL: https://github.com/jglchen/web-scrape
Owner: jglchen
Created: 2023-01-30T13:00:27.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2023-03-23T09:58:48.000Z (over 2 years ago)
Last Synced: 2025-01-20T16:25:29.935Z (6 months ago)
Topics: cheerio, docker, nextjs, nodejs, puppeteer, reactjs
Language: TypeScript
Homepage: https://web-scrape.vercel.app
Size: 434 KB
Stars: 1
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

## Web Scraping Demonstrations

This is a **[next.js](https://nextjs.org/)** framework site that demonstrates web scraping cases and my expertise in web scraping. Nine scraping cases are currently presented, each handled in an API route with **[node.js](https://nodejs.org/en/)**.

There are two main approaches to scraping the web:
1. HTTP clients that request pages, followed by data extraction from the response
2. Headless browsers
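
As a rough illustration of the first approach, the sketch below fetches a page over plain HTTP and extracts its links with Cheerio. It is not taken from this repository: the names `ScrapedLink`, `extractLinks`, and `scrapeLinks` are hypothetical, and the global `fetch` assumes Node.js 18 or later.

```typescript
import * as cheerio from "cheerio";

// Hypothetical result shape: visible link text plus absolute URL.
export interface ScrapedLink {
  text: string;
  href: string;
}

// Parsing step: pull every <a href> out of an HTML string.
// `baseUrl` is used to resolve relative links to absolute ones.
export function extractLinks(html: string, baseUrl: string): ScrapedLink[] {
  const $ = cheerio.load(html);
  const links: ScrapedLink[] = [];
  $("a[href]").each((_, el) => {
    const rawHref = $(el).attr("href");
    if (!rawHref) return; // skip empty href attributes
    links.push({
      text: $(el).text().trim(),
      href: new URL(rawHref, baseUrl).toString(),
    });
  });
  return links;
}

// Fetch step: plain HTTP GET (assumes global fetch, Node.js 18+), then parse.
export async function scrapeLinks(url: string): Promise<ScrapedLink[]> {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return extractLinks(await response.text(), url);
}
```

Keeping the parsing step separate from the HTTP request makes it possible to exercise the selectors against saved HTML without touching the network.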

For the first approach, we use [Cheerio](https://www.npmjs.com/package/cheerio), a library that brings a jQuery-like API to the server side, to parse and query the fetched pages. Sites, however, are becoming increasingly complex, and regular HTTP crawling often no longer suffices: a full-fledged browser engine is needed to get the necessary information from a site. This is particularly true for single-page applications, which rely heavily on JavaScript and on dynamic, asynchronous resources. Browser automation with a headless browser addresses these issues, so we use [Puppeteer](https://pptr.dev/) to control the browser programmatically. For the cases in this demonstration, we use whichever approach suits the target page.
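
The headless-browser approach might look like the following minimal sketch, which lets the page's JavaScript run before reading the DOM. The function name `scrapeRendered` and its parameters are hypothetical and do not come from this repository. The sketch assumes Puppeteer can launch its bundled Chromium with default options.

```typescript
import puppeteer from "puppeteer";

// Hypothetical helper: render a page in headless Chromium and return the
// trimmed text of every element matching `selector`, read after
// client-side JavaScript has had a chance to populate the DOM.
export async function scrapeRendered(
  url: string,
  selector: string
): Promise<string[]> {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Wait until network activity has mostly settled.
    await page.goto(url, { waitUntil: "networkidle2" });
    // Then wait for the target elements to actually exist.
    await page.waitForSelector(selector);
    return await page.$$eval(selector, (elements) =>
      elements.map((el) => (el.textContent ?? "").trim())
    );
  } finally {
    // Always release the browser process, even if scraping fails.
    await browser.close();
  }
}
```

Launching a browser is far more expensive than a plain HTTP request, so this route is usually worth taking only when the data is absent from the initial HTML.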

**iOS** and **Android** mobile apps are also available for the scraping demonstrations. The apps are developed with **React Native**; anyone interested can test them through the [Expo Publish Link](https://exp.host/@jglchen/web-scrape) with the [Expo Go](https://expo.dev/client) app.

### [View the App](https://web-scrape.vercel.app)
### [App GitHub](https://github.com/jglchen/web-scrape)
### Docker: `docker run -p 3000:3000 jglchen/web-scrape`
### [React Native Expo Publish](https://expo.dev/@jglchen/web-scrape)
### [React Native GitHub](https://github.com/jglchen/react-native-web-scrape)
