Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/okfn-brasil/querido-diario
📰 Diários oficiais brasileiros acessíveis a todos | 📰 Brazilian government gazettes, accessible to everyone.
civic-tech data-science governments-gazettes govtech hacktoberfest open-data politics scraping spider
Last synced: 27 days ago
JSON representation
- Host: GitHub
- URL: https://github.com/okfn-brasil/querido-diario
- Owner: okfn-brasil
- License: mit
- Created: 2018-04-01T05:01:21.000Z (over 6 years ago)
- Default Branch: main
- Last Pushed: 2024-09-25T20:29:31.000Z (about 2 months ago)
- Last Synced: 2024-10-01T15:42:20.116Z (about 1 month ago)
- Topics: civic-tech, data-science, governments-gazettes, govtech, hacktoberfest, open-data, politics, scraping, spider
- Language: Python
- Homepage: https://queridodiario.ok.org.br/
- Size: 16.8 MB
- Stars: 1,081
- Watchers: 64
- Forks: 393
- Open Issues: 232
Metadata Files:
- Readme: docs/README-en-US.md
- Contributing: docs/CONTRIBUTING-en-US.md
- Funding: docs/FUNDING.yml
- License: LICENSE.md
- Code of conduct: docs/CODE_OF_CONDUCT-en-US.md
- Support: docs/SUPPORT-en-US.md
Awesome Lists containing this project
- awesome-govtech - Querido-diario - A project to scrape and make available government gazettes. (Others / Data Visualization)
- awesome-made-by-brazilians - querido-diario - by [okfn-brasil](https://github.com/okfn-brasil) (Uncategorized / Uncategorized)
README
**English (US)** | [Português (BR)](/docs/README.md)
# Querido Diário
Within the [Querido Diário ecosystem](https://docs.queridodiario.ok.org.br/en/latest/contributing/contribution-guide.html#ecosystem), this repository is responsible for **scraping official gazette publishing sites**. Find out more about the [technologies](https://queridodiario.ok.org.br/tecnologia) and [history](https://queridodiario.ok.org.br/sobre) of the project on the [Querido Diário website](https://queridodiario.ok.org.br).
# Summary
- [How to contribute](#how-to-contribute)
- [Development Environment](#development-environment)
- [How to run](#how-to-run)
- [Troubleshooting](#troubleshooting)
- [Support](#support)
- [Thanks](#thanks)
- [Open Knowledge Brazil](#open-knowledge-brazil)
- [License](#license)

# How to contribute
Thank you for considering contributing to Querido Diário! :tada:
You can find how to do it at [CONTRIBUTING-en-US.md](/docs/CONTRIBUTING-en-US.md)!
Also, check the [Querido Diário documentation](https://docs.queridodiario.ok.org.br/en/latest/) to help you.
# Development Environment
You need to have [Python](https://docs.python.org/3/) (3.0+) and the [Scrapy](https://scrapy.org) framework installed. The commands below set them up on a Linux operating system. They create a [virtual Python environment](https://docs.python.org/3/library/venv.html), install the requirements listed in `requirements-dev`, and install the code standardization tool `pre-commit`.
```console
python3 -m venv .venv
source .venv/bin/activate
pip install -r data_collection/requirements-dev.txt
pre-commit install
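# Optional sanity check, not part of the official setup: both tools
# expose a version command once the install above succeeds.
scrapy version
pre-commit --version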
```

> Configuration on other operating systems is available at ["how to setup the development environment"](/docs/CONTRIBUTING-en-US.md#how-to-setup-the-development-environment), including more details for those who want to contribute to the repository.
# How to run
To try running a scraper already integrated into the project, or to test one you are developing, follow these steps:

1. If you haven't already done so, activate the virtual environment in the `/querido-diario` directory:
``` console
source .venv/bin/activate
```
2. Go to the `data_collection` directory:
```console
cd data_collection
```
3. Check the available scrapers list:
```console
scrapy list
```
4. Run a listed scraper:
```console
scrapy crawl <spider_name>  # example: scrapy crawl ba_acajutiba
```
5. The official gazettes collected by the scraper will be saved in the `data_collection/data` directory.
6. When executed as in step 4, the scraper collects all official gazettes from that municipality's publishing site since its first digital edition. For smaller runs, use flags in the run command:
- `start_date=YYYY-MM-DD`: sets the start date for collection.
```console
scrapy crawl <spider_name> -a start_date=YYYY-MM-DD
```
- `end_date=YYYY-MM-DD`: sets the end date for collection. If omitted, it defaults to the date on which the command is executed.
```console
scrapy crawl <spider_name> -a end_date=YYYY-MM-DD
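# The two flags can be combined to bound a run on both ends
# (spider name and dates below are illustrative):
scrapy crawl ba_acajutiba -a start_date=2024-10-01 -a end_date=2024-10-31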
```

# Troubleshooting
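One thing worth checking when a bounded run fails is the date format: the `start_date`/`end_date` flags must be `YYYY-MM-DD` strings. They reach the spider as plain strings and are converted to `date` objects, roughly like this standard-library sketch (illustrative, not the project's actual code; `parse_crawl_dates` is a hypothetical helper):

```python
from datetime import date, datetime


def parse_crawl_dates(start_date, end_date=None):
    """Convert -a start_date/end_date strings (YYYY-MM-DD) to date objects.

    end_date defaults to today when omitted, mirroring the CLI behaviour
    described above. Malformed strings raise ValueError via strptime.
    """
    start = datetime.strptime(start_date, "%Y-%m-%d").date()
    end = datetime.strptime(end_date, "%Y-%m-%d").date() if end_date else date.today()
    if start > end:
        raise ValueError("start_date must not be after end_date")
    return start, end
```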
Check out the [troubleshooting](/docs/TROUBLESHOOTING-en-US.md) file to resolve the most common issues with project environment setup.

# Support
Join our [community server](https://go.ok.org.br/discord) to exchange ideas about projects, ask questions, request help with contributions, and talk about civic innovation in general.
# Thanks
This project is maintained by Open Knowledge Brazil and made possible thanks to technical communities, the [Ambassadors of Civic Innovation](https://embaixadoras.ok.org.br/), volunteers, and financial donors, as well as partner universities, supporting companies, and funders. Meet [those who support Querido Diário](https://queridodiario.ok.org.br/apoie#quem-apoia).
# Open Knowledge Brazil
[Open Knowledge Brazil](https://ok.org.br/) is a non-profit civil society organization whose mission is to use and develop civic tools, projects, public policy analysis and data journalism to promote free knowledge in the various fields of society.
All work produced by OKBR is openly and freely available.
# License
Code licensed under the [MIT License](/LICENSE.md).