https://github.com/fahimfba/simple-web-scrapper
Extract data from websites using the web scraper. Made with Node.js, Express.js, axios & cheerio.
- Host: GitHub
- URL: https://github.com/fahimfba/simple-web-scrapper
- Owner: FahimFBA
- License: MIT
- Created: 2021-09-27T16:51:08.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2025-05-02T05:41:16.000Z (9 months ago)
- Last Synced: 2025-05-02T06:24:16.833Z (9 months ago)
- Topics: axios, cheerio, cheeriojs, javascript, js, npm, npm-package, webscrape, webscraping, webscraping-data, webscraping-search, webscrapper
- Language: JavaScript
- Homepage: https://fahimfba.github.io/Web-Scraper/
- Size: 1.12 MB
- Stars: 4
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Web Scraper
A simple Node.js application to scrape article titles and URLs from The Guardian's international news section.
## Description
This project uses `axios` to fetch the HTML content of `https://www.theguardian.com/international` and `cheerio` to parse it, extracting article titles and URLs via the CSS selector `.dcr-5rptw1`.
Currently, the scraped data is logged to the console when the application starts. An Express server is initialized on port 8000 but does not yet serve any data or provide API endpoints.
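The flow described above can be sketched roughly as follows. This is an illustrative sketch, not the repository's actual code: the function names are assumptions, and `.dcr-5rptw1` is a generated class name from The Guardian's markup that may change at any time.

```javascript
// Sketch of the scraping flow: fetch the page with axios, parse it
// with cheerio, and collect article titles and URLs.
const BASE_URL = 'https://www.theguardian.com/international';

// Guardian article links are typically relative, so resolve them
// against the site origin using Node's built-in URL class.
function toAbsoluteUrl(href, base = BASE_URL) {
  return new URL(href, base).toString();
}

async function scrape() {
  // Loaded lazily so the pure helper above works even when the
  // scraping dependencies are not installed.
  const axios = require('axios');
  const cheerio = require('cheerio');

  const { data: html } = await axios.get(BASE_URL);
  const $ = cheerio.load(html);

  const articles = [];
  $('.dcr-5rptw1').each((_, el) => {
    const link = $(el);
    articles.push({
      title: link.text().trim(),
      url: toAbsoluteUrl(link.attr('href') || ''),
    });
  });
  return articles;
}
```

Running `scrape().then(console.log)` would reproduce the console output the README describes.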
## Prerequisites
- Node.js and npm (or yarn) installed on your system.
## Installation
1. Clone the repository:
```bash
git clone https://github.com/FahimFBA/Web-Scraper.git
cd Web-Scraper
```
2. Install the dependencies:
```bash
npm install
```
or
```bash
yarn install
```
## Usage
To run the scraper, use the following command:
```bash
npm start
```
This starts the application with `nodemon`, which automatically restarts it on file changes. The scraped article titles and URLs are printed to your terminal.
## Future Enhancements (Potential)
- Implement API endpoints using Express to serve the scraped data.
- Add error handling for network requests and parsing.
- Make the target URL and CSS selectors configurable.
- Store the scraped data in a database or file.
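The first two enhancements could be sketched as below. This is a hypothetical design, not code from the repository: the route path `/articles` and the names `makeArticlesHandler`/`fetchArticles` are assumptions, and the handler is built as a standalone function so it can be exercised without a running server.

```javascript
// Hypothetical sketch: serve the scraped results over HTTP with basic
// error handling, instead of only logging them at startup.
function makeArticlesHandler(fetchArticles) {
  return async function handler(_req, res) {
    try {
      // fetchArticles would wrap the project's existing scraping logic.
      res.json(await fetchArticles());
    } catch (err) {
      // Report scraping/network failures instead of crashing the server.
      res.status(502).json({ error: 'failed to scrape articles' });
    }
  };
}

// Wiring (assumes `express` is installed and `scrape` is the scraping
// function):
//
//   const express = require('express');
//   const app = express();
//   app.get('/articles', makeArticlesHandler(scrape));
//   app.listen(8000);
```

Separating the handler from the Express app keeps the scraping and serving concerns testable in isolation.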