https://github.com/igorantun/news-scraper
📰 Scraper for Brazilian newspapers Estadão, Folha de S.Paulo, g1 and VEJA
agenda-setting brazil newspaper scraping
- Host: GitHub
- URL: https://github.com/igorantun/news-scraper
- Owner: igorantun
- License: mit
- Created: 2022-05-18T16:19:18.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-01-10T01:29:32.000Z (almost 3 years ago)
- Last Synced: 2025-02-27T00:50:17.622Z (8 months ago)
- Topics: agenda-setting, brazil, newspaper, scraping
- Language: JavaScript
- Homepage:
- Size: 3.49 MB
- Stars: 3
- Watchers: 2
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# 📰 News Scraper
## Description
This is an automated script that scrapes the websites of five major Brazilian newspapers (Estadão, Folha, g1, UOL and VEJA). It scrapes each newspaper's homepage, extracts the news headlines, links, summaries and other details, and then exports the report data to HTML, JSON, PDF and/or image files.
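The extraction and export steps described above can be sketched in JavaScript, the project's language. This is a minimal, hypothetical illustration, not the repository's code: the function names `extractHeadlines` and `toJsonReport`, the report fields, and the regex-based link matching are all assumptions made for the example.

```javascript
// Hypothetical sketch (not the project's implementation): pull headline text
// and links out of homepage HTML, then serialize them as a JSON report.

// Collects { title, link } pairs from <a href="..."> elements.
// Regex matching keeps the sketch dependency-free; HTML entities are not decoded.
function extractHeadlines(html, baseUrl) {
  const headlines = [];
  const anchorPattern = /<a\b[^>]*\bhref="([^"]+)"[^>]*>([\s\S]*?)<\/a>/gi;
  for (const match of html.matchAll(anchorPattern)) {
    // Strip nested tags and collapse whitespace to get plain headline text.
    const title = match[2].replace(/<[^>]+>/g, '').replace(/\s+/g, ' ').trim();
    if (title.length === 0) continue; // skip image-only or empty links
    // Resolve relative links against the homepage URL.
    headlines.push({ title, link: new URL(match[1], baseUrl).href });
  }
  return headlines;
}

// Wraps the extracted headlines in a report object (field names are assumptions).
function toJsonReport(newspaper, headlines, scrapedAt = new Date()) {
  return JSON.stringify(
    { newspaper, scrapedAt: scrapedAt.toISOString(), headlines },
    null,
    2
  );
}

// Usage with a made-up snippet of homepage markup:
const html = '<h2><a href="/politica/materia">Headline <b>one</b></a></h2>';
const headlines = extractHeadlines(html, 'https://www.example.com/');
console.log(toJsonReport('Example', headlines));
```

Regex matching is used only to keep the sketch self-contained; it breaks on unusual markup, so a real scraper would normally rely on a DOM parser and per-site selectors.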
## Getting started
### Prerequisites
- Docker
- Docker Compose

### Cloning the repository and copying the `.env` example
```sh
$ git clone git@github.com:igorantun/news-scraper.git
$ cd news-scraper
$ cp .env.example .env
```

#### Other requirements
You should also copy your Firebase `serviceAccountKey.json` file to the `src/config` folder.
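A hypothetical preflight check, not part of the repository, can confirm that both setup files are in place before running the Make targets. `missingPrerequisites` and `REQUIRED_FILES` are illustrative names; only the two paths come from the setup steps above.

```javascript
const fs = require('fs');
const path = require('path');

// Files the setup steps ask for, relative to the repository root.
const REQUIRED_FILES = ['.env', path.join('src', 'config', 'serviceAccountKey.json')];

// Returns the required files that are missing under repoRoot (empty array when ready).
function missingPrerequisites(repoRoot) {
  return REQUIRED_FILES.filter((file) => !fs.existsSync(path.join(repoRoot, file)));
}
```

Calling `missingPrerequisites(process.cwd())` from the repository root should return an empty array once both files are in place. It only checks that the files exist, not that `.env` is filled in or that the key is valid.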
## Make commands
```sh
$ make news-scraper # Starts production news scraper worker, with Logflare and Firebase integration enabled
$ make news-scraper-dev # Starts development news scraper worker, with nodemon
$ make clean # Deletes all generated files under ./reports
$ make stop # Stops all services
```

## License
Released under the MIT License. See the [LICENSE](LICENSE) file
for details.