Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/beatrizmilz/noticiasgov
Raspagem de dados de portais de noticias governamentais
https://github.com/beatrizmilz/noticiasgov
git-scraping r rstats web-scraping
Last synced: 4 days ago
JSON representation
Raspagem de dados de portais de noticias governamentais
- Host: GitHub
- URL: https://github.com/beatrizmilz/noticiasgov
- Owner: beatrizmilz
- License: gpl-3.0
- Created: 2021-11-24T14:37:13.000Z (almost 3 years ago)
- Default Branch: master
- Last Pushed: 2024-03-07T18:03:23.000Z (8 months ago)
- Last Synced: 2024-08-01T13:29:09.810Z (3 months ago)
- Topics: git-scraping, r, rstats, web-scraping
- Language: R
- Homepage:
- Size: 45.8 MB
- Stars: 12
- Watchers: 2
- Forks: 0
- Open Issues: 2
-
Metadata Files:
- Readme: README.Rmd
- License: LICENSE.md
Awesome Lists containing this project
README
---
output: github_document
---```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```# noticiasgov
[![atualiza_dados](https://github.com/beatrizmilz/noticiasgov/actions/workflows/atualizar_dados.yaml/badge.svg)](https://github.com/beatrizmilz/noticiasgov/actions/workflows/atualizar_dados.yaml)
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)O objetivo deste repositório/pacote é raspar as notícias de portais de noticias governamentais, e disponibilizar em `.csv`.
```{r echo=FALSE}
library(magrittr)
tibble::tribble(
~estado, ~nome_site, ~url_site, ~url_csv, ~cod_r, ~freq_action,
"SP",
"Portal do Governo do Estado de São Paulo",
"https://www.saopaulo.sp.gov.br/ultimas-noticias/",
"https://raw.githubusercontent.com/beatrizmilz/noticiasgov/master/inst/base_noticias_gov_sp.csv",
'`base_noticias_gov_sp <- readr::read_delim("https://raw.githubusercontent.com/beatrizmilz/noticiasgov/master/inst/base_noticias_gov_sp.csv", delim = ";")` ',
"A cada 6 horas"
) %>%
dplyr::transmute(
Estado = estado,
Fonte = glue::glue("[{nome_site}]({url_site})"),
`Freq. de atualização` = freq_action,
`Baixar base` = glue::glue("[`.csv`]({url_csv})"),
`Código para importar no R` = cod_r
) %>%
knitr::kable()
```
## Exemplo dos dados disponíveis```{r echo=TRUE, message=FALSE, warning=FALSE}
base_noticias_gov_sp <- readr::read_delim("https://raw.githubusercontent.com/beatrizmilz/noticiasgov/master/inst/base_noticias_gov_sp.csv", delim = ";")dplyr::glimpse(base_noticias_gov_sp)
```Pesquisar as notícias que contém algum termo ao longo do tempo:
```{r grafico-sp-covid-noticias, echo=TRUE, dpi=300}
library(ggplot2)noticias_sp_filtradas <- base_noticias_gov_sp %>%
dplyr::mutate(titulo_clean = stringr::str_to_lower(titulo),
titulo_clean = abjutils::rm_accent(titulo_clean)) %>%
dplyr::filter(
stringr::str_detect(titulo_clean, "rio pinheiros")
)noticias_sp_filtradas |>
dplyr::mutate(titulo_url = glue::glue("[{titulo}]({url_noticia})")) |>
dplyr::select(data, titulo_url) |>
knitr::kable()
```