https://github.com/igorantun/news-scraper

📰 Scraper for Brazilian newspapers Estadão, Folha de S.Paulo, g1 and VEJA

agenda-setting brazil newspaper scraping

Last synced: 7 months ago

Host: GitHub
URL: https://github.com/igorantun/news-scraper
Owner: igorantun
License: mit
Created: 2022-05-18T16:19:18.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2023-01-10T01:29:32.000Z (almost 3 years ago)
Last Synced: 2025-02-27T00:50:17.622Z (8 months ago)
Topics: agenda-setting, brazil, newspaper, scraping
Language: JavaScript
Homepage:
Size: 3.49 MB
Stars: 3
Watchers: 2
Forks: 0
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# 📰 News Scraper

## Description

An automated script that scrapes the homepages of five major Brazilian newspapers (Estadão, Folha, g1, UOL and VEJA) and extracts each story's headline, link, summary and more. It then exports the resulting report to HTML, JSON, PDF and/or image files.
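To make the extract-then-export flow concrete, here is a minimal sketch in plain Node.js. It is not the project's actual implementation: the function names (`extractHeadlines`, `buildReport`), the report shape, and the sample HTML are all assumptions made for illustration, and the regex-based anchor matching stands in for whatever parser and per-site selectors the real scraper uses.

```javascript
// Hypothetical helper: collect headline text and absolute links from homepage HTML.
// Assumption: headlines are anchor tags with a double-quoted href. A real scraper
// would more likely use an HTML parser and selectors tuned to each newspaper.
function extractHeadlines(html, baseUrl) {
  const anchorPattern = /<a\b[^>]*\bhref\s*=\s*"([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi;
  const items = [];
  for (const match of html.matchAll(anchorPattern)) {
    // Drop nested tags and collapse whitespace to get the visible text.
    const title = match[2].replace(/<[^>]*>/g, '').replace(/\s+/g, ' ').trim();
    if (!title) continue; // Skip image-only or empty links.
    items.push({ title, link: new URL(match[1], baseUrl).href });
  }
  return items;
}

// Hypothetical report shape: one object per source, ready for JSON export.
function buildReport(source, html, baseUrl) {
  return {
    source,
    scrapedAt: new Date().toISOString(),
    headlines: extractHeadlines(html, baseUrl),
  };
}

// Made-up sample markup standing in for a fetched homepage.
const sampleHtml = `
  <main>
    <a href="/politica/exemplo"><h2>Example  headline</h2></a>
    <a href="https://example.com/economia">Another headline</a>
    <a href="/empty"><img src="x.png"></a>
  </main>`;

const report = buildReport('example', sampleHtml, 'https://example.com');
console.log(JSON.stringify(report, null, 2));
```

The JSON string printed at the end is the simplest of the export formats listed above; HTML, PDF and image output would need a rendering step that this sketch does not attempt.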

## Getting started

### Prerequisites

- Docker
- Docker Compose

### Cloning the repository and copying the example .env file

```sh
$ git clone git@github.com:igorantun/news-scraper.git
$ cd news-scraper
$ cp .env.example .env
```

#### Other requirements

You should also copy your Firebase `serviceAccountKey.json` file to the `src/config` folder.
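Since the setup depends on two files being in place (`.env` and `src/config/serviceAccountKey.json`), a small pre-flight check can catch a missed step before starting the worker. The sketch below is a hypothetical helper, not something the repository is known to ship; only the two file paths come from the steps above.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Paths come from the setup steps above; the check itself is a hypothetical helper.
const REQUIRED_FILES = [
  '.env',
  path.join('src', 'config', 'serviceAccountKey.json'),
];

// Returns the required files that do not exist under rootDir.
function findMissingSetupFiles(rootDir) {
  return REQUIRED_FILES.filter(
    (relativePath) => !fs.existsSync(path.join(rootDir, relativePath))
  );
}

// Assumes the script is run from the repository root.
const missing = findMissingSetupFiles(process.cwd());
if (missing.length > 0) {
  console.error(`Missing setup files: ${missing.join(', ')}`);
} else {
  console.log('All setup files are in place.');
}
```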

## Make commands

```sh
$ make news-scraper # Starts production news scraper worker, with Logflare and Firebase integration enabled
$ make news-scraper-dev # Starts development news scraper worker, with nodemon
$ make clean # Deletes all generated files under ./reports
$ make stop # Stops all services
```

## License

Released under the MIT License. See the [LICENSE](LICENSE) file for details.
