https://github.com/fahimfba/simple-web-scrapper
Extract data from websites using the web scraper. Made with Node.js, Express.js, axios & cheerio.
- Host: GitHub
- URL: https://github.com/fahimfba/simple-web-scrapper
- Owner: FahimFBA
- License: MIT
- Created: 2021-09-27T16:51:08.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2025-05-02T05:41:16.000Z (9 months ago)
- Last Synced: 2025-05-02T06:24:16.833Z (9 months ago)
- Topics: axios, cheerio, cheeriojs, javascript, js, npm, npm-package, webscrape, webscraping, webscraping-data, webscraping-search, webscrapper
- Language: JavaScript
- Homepage: https://fahimfba.github.io/Web-Scraper/
- Size: 1.12 MB
- Stars: 4
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Web Scraper
A simple Node.js application to scrape article titles and URLs from The Guardian's international news section.
## Description
This project uses `axios` to fetch the HTML content of `https://www.theguardian.com/international` and `cheerio` to parse it, extracting article titles and URLs via the CSS selector `.dcr-5rptw1`.
Currently, the scraped data is logged to the console when the application starts. An Express server is initialized on port 8000 but does not yet serve any data or provide API endpoints.
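The flow described above can be sketched roughly as follows. This is an illustrative sketch, not the repository's actual code: the function names are assumptions, and `.dcr-5rptw1` is a generated class name from The Guardian's markup that may change at any time.

```javascript
// Sketch of the scraping flow: fetch the page with axios, parse it
// with cheerio, and collect article titles and URLs.
const BASE_URL = 'https://www.theguardian.com/international';

// Guardian article links are typically relative, so resolve them
// against the site origin using Node's built-in URL class.
function toAbsoluteUrl(href, base = BASE_URL) {
  return new URL(href, base).toString();
}

async function scrape() {
  // Loaded lazily so the pure helper above works even when the
  // scraping dependencies are not installed.
  const axios = require('axios');
  const cheerio = require('cheerio');

  const { data: html } = await axios.get(BASE_URL);
  const $ = cheerio.load(html);

  const articles = [];
  $('.dcr-5rptw1').each((_, el) => {
    const link = $(el);
    articles.push({
      title: link.text().trim(),
      url: toAbsoluteUrl(link.attr('href') || ''),
    });
  });
  return articles;
}
```

Running `scrape().then(console.log)` would reproduce the console output the README describes.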
## Prerequisites
- Node.js and npm (or yarn) installed on your system.
## Installation
1. Clone the repository:
```bash
git clone https://github.com/FahimFBA/Web-Scraper.git
cd Web-Scraper
```
2. Install the dependencies:
```bash
npm install
```
or
```bash
yarn install
```
## Usage
To run the scraper, use the following command:
```bash
npm start
```
This starts the application with `nodemon`, which automatically restarts it on file changes. The scraped article titles and URLs are printed to your terminal.
## Future Enhancements (Potential)
- Implement API endpoints using Express to serve the scraped data.
- Add error handling for network requests and parsing.
- Make the target URL and CSS selectors configurable.
- Store the scraped data in a database or file.
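The first two enhancements could be sketched as below. This is a hypothetical design, not code from the repository: the route path `/articles` and the names `makeArticlesHandler`/`fetchArticles` are assumptions, and the handler is built as a standalone function so it can be exercised without a running server.

```javascript
// Hypothetical sketch: serve the scraped results over HTTP with basic
// error handling, instead of only logging them at startup.
function makeArticlesHandler(fetchArticles) {
  return async function handler(_req, res) {
    try {
      // fetchArticles would wrap the project's existing scraping logic.
      res.json(await fetchArticles());
    } catch (err) {
      // Report scraping/network failures instead of crashing the server.
      res.status(502).json({ error: 'failed to scrape articles' });
    }
  };
}

// Wiring (assumes `express` is installed and `scrape` is the scraping
// function):
//
//   const express = require('express');
//   const app = express();
//   app.get('/articles', makeArticlesHandler(scrape));
//   app.listen(8000);
```

Separating the handler from the Express app keeps the scraping and serving concerns testable in isolation.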