https://github.com/okfn-brasil/querido-diario

📰 Diários oficiais brasileiros acessíveis a todos | 📰 Brazilian government gazettes, accessible to everyone.

civic-tech data-science digital-public-goods dpg governments-gazettes govtech hacktoberfest open-data politics scraping sdg-16 spider

Host: GitHub
URL: https://github.com/okfn-brasil/querido-diario
Owner: okfn-brasil
License: mit
Created: 2018-04-01T05:01:21.000Z (about 7 years ago)
Default Branch: main
Last Pushed: 2025-03-30T05:43:08.000Z (3 months ago)
Last Synced: 2025-04-05T10:01:40.387Z (3 months ago)
Topics: civic-tech, data-science, digital-public-goods, dpg, governments-gazettes, govtech, hacktoberfest, open-data, politics, scraping, sdg-16, spider
Language: Python
Homepage: https://queridodiario.ok.org.br/
Size: 17.3 MB
Stars: 1,165
Watchers: 63
Forks: 416
Open Issues: 226
Metadata Files:
- Readme: docs/README-en-US.md
- Contributing: docs/CONTRIBUTING-en-US.md
- Funding: docs/FUNDING.yml
- License: LICENSE.md
- Code of conduct: docs/CODE_OF_CONDUCT-en-US.md
- Support: docs/SUPPORT-en-US.md

Awesome Lists containing this project

awesome-govtech - Querido-diario - A project to scrape and make available government gazettes. (Others / Data Visualization)
awesome-made-by-brazilians - querido-diario - brasil](https://github.com/okfn-brasil) (Uncategorized / Uncategorized)

README

**English (US)** | [Português (BR)](/docs/README.md)

# Querido Diário

Within the [Querido Diário ecosystem](https://docs.queridodiario.ok.org.br/en/latest/contributing/contribution-guide.html#ecosystem), this repository is responsible for **scraping the sites that publish official gazettes**.

Find out more about the [technologies](https://queridodiario.ok.org.br/tecnologia) and [history](https://queridodiario.ok.org.br/sobre) of the project on the [Querido Diário website](https://queridodiario.ok.org.br).

# Summary

- [How to contribute](#how-to-contribute)
- [Development Environment](#development-environment)
- [How to run](#how-to-run)
- [Troubleshooting](#troubleshooting)
- [Support](#support)
- [Thanks](#thanks)
- [Open Knowledge Brazil](#open-knowledge-brazil)
- [License](#license)

# How to contribute

Thank you for considering contributing to Querido Diário! :tada:

See [CONTRIBUTING-en-US.md](/docs/CONTRIBUTING-en-US.md) to learn how to contribute!

The [Querido Diário documentation](https://docs.queridodiario.ok.org.br/en/latest/) is also available to help you.

# Development Environment

You need to have [Python](https://docs.python.org/3/) (3.0 or later) and the [Scrapy](https://scrapy.org) framework installed.

The commands below set up the environment on Linux. They create a [virtual Python environment](https://docs.python.org/3/library/venv.html), install the requirements listed in `requirements-dev`, and set up the code standardization tool `pre-commit`.

``` console
python3 -m venv .venv
source .venv/bin/activate
pip install -r data_collection/requirements-dev.txt
pre-commit install
```

> Configuration on other operating systems is available at ["how to setup the development environment"](/docs/CONTRIBUTING-en-US.md#how-to-setup-the-development-environment), including more details for those who want to contribute to the repository.
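
To confirm that the setup above took effect, a quick check of the running interpreter can help. The sketch below is only an illustration and is not part of the repository: the function names `in_virtualenv` and `python_is_at_least` are hypothetical, and it relies solely on the standard `sys` module.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside an environment created with `python3 -m venv`, sys.prefix points
    # at the environment while sys.base_prefix points at the base installation.
    return sys.prefix != sys.base_prefix


def python_is_at_least(major: int, minor: int = 0) -> bool:
    """Return True when the running interpreter meets the minimum version."""
    return sys.version_info >= (major, minor)


if __name__ == "__main__":
    # Hypothetical helper names; run this with the interpreter you intend to use.
    print("virtual environment active:", in_virtualenv())
    print("Python 3.0 or later:", python_is_at_least(3, 0))
```

If the first line reports `False`, the environment is probably not activated in the current shell.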

# How to run

To run a scraper already integrated into the project, or to test one you are developing, follow these steps:

1. If you haven't already done so, activate the virtual environment in the `/querido-diario` directory:

``` console
source .venv/bin/activate
```

2. Go to the `data_collection` directory:

```console
cd data_collection
```

3. List the available scrapers:

```console
scrapy list
```

4. Run a listed scraper:

```console
scrapy crawl <scraper_name>  # example: scrapy crawl ba_acajutiba
```

5. The collected official gazettes are saved in the `data_collection/data` folder.

6. The command in step 4 collects every official gazette from that municipality's publishing site, going back to the first digital edition. For smaller runs, pass arguments with `-a` in the run command:

- `start_date=YYYY-MM-DD`: sets the collection start date.

```console
scrapy crawl <scraper_name> -a start_date=<YYYY-MM-DD>
```

- `end_date=YYYY-MM-DD`: sets the collection end date. If omitted, the end date defaults to the day the scraper is run.

```console
scrapy crawl <scraper_name> -a end_date=<YYYY-MM-DD>
```
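
When runs are scripted, it can be handy to assemble the `scrapy crawl` command with the `start_date` and `end_date` arguments described above and to reject malformed dates before anything is launched. The following is a minimal sketch under that assumption: `build_crawl_command` and `_parse_date` are hypothetical helpers, not part of Querido Diário or Scrapy, and the only things taken from this README are the `scrapy crawl` form, the `-a` arguments and the `ba_acajutiba` example name.

```python
import shlex
from datetime import date, datetime
from typing import List, Optional


def _parse_date(value: str) -> date:
    """Parse a strict YYYY-MM-DD string, raising ValueError otherwise."""
    parsed = datetime.strptime(value, "%Y-%m-%d").date()
    # strptime tolerates unpadded fields such as "2024-1-5"; reject them.
    if parsed.isoformat() != value:
        raise ValueError(f"expected YYYY-MM-DD, got {value!r}")
    return parsed


def build_crawl_command(
    scraper_name: str,
    start_date: Optional[str] = None,
    end_date: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `scrapy crawl` with optional date limits."""
    command = ["scrapy", "crawl", scraper_name]
    start = _parse_date(start_date) if start_date is not None else None
    end = _parse_date(end_date) if end_date is not None else None
    if start is not None and end is not None and start > end:
        raise ValueError("start_date must not be after end_date")
    if start is not None:
        command += ["-a", f"start_date={start.isoformat()}"]
    if end is not None:
        command += ["-a", f"end_date={end.isoformat()}"]
    return command


if __name__ == "__main__":
    # Only prints the command; run it yourself from the data_collection
    # directory with the virtual environment active.
    args = build_crawl_command(
        "ba_acajutiba", start_date="2024-01-01", end_date="2024-01-31"
    )
    print(shlex.join(args))
```

The returned list can be handed to `subprocess.run(args, cwd="data_collection")` if you prefer to launch the scraper from Python rather than copy the printed command.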

# Troubleshooting

Check out the [troubleshooting](/docs/TROUBLESHOOTING-en-US.md) file to resolve the most common issues with project environment setup.

# Support

Join our [community server](https://go.ok.org.br/discord) to discuss projects, ask questions, request help with contributions, and talk about civic innovation in general.

# Thanks

This project is maintained by Open Knowledge Brazil and made possible by the technical communities, the [Ambassadors of Civic Innovation](https://embaixadoras.ok.org.br/), volunteers, and financial donors, as well as partner universities, supporting companies, and funders.

Meet [the supporters of Querido Diário](https://queridodiario.ok.org.br/apoie#quem-apoia).

# Open Knowledge Brazil

[Open Knowledge Brazil](https://ok.org.br/) is a non-profit civil society organization whose mission is to use and develop civic tools, projects, public policy analysis and data journalism to promote free knowledge in the various fields of society.

All work produced by OKBR is openly and freely available.

# License

Code licensed under the [MIT License](/LICENSE.md).
