An open API service indexing awesome lists of open source software.

https://github.com/openwashdata/portawaterperu

Data about Peruvian Portable Water System
https://github.com/openwashdata/portawaterperu

Last synced: 4 months ago
JSON representation

Data about Peruvian Portable Water System

Awesome Lists containing this project

README

          

---
output: github_document
always_allow_html: true
editor_options:
markdown:
wrap: 72
chunk_output_type: console
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%",
message = FALSE,
warning = FALSE,
fig.retina = 2,
fig.align = 'center'
)
```

# portawaterperu

[![License: CC BY
4.0](https://img.shields.io/badge/License-CC_BY_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)
[![R-CMD-check](https://github.com/openwashdata/portawaterperu/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/openwashdata/portawaterperu/actions/workflows/R-CMD-check.yaml)

The goal of the package `portawaterperu` is to provide access to data about community portable water systems in Peru. The data is collected from SIASAR database consisted of information and surveys about water catchments, storage system, treatment, distribution networks and maintainance.

## Installation

You can install the development version of portawaterperu from
[GitHub](https://github.com/) with:

``` r
# install.packages("devtools")
devtools::install_github("openwashdata/portawaterperu")
```

```{r}
## Run the following code in console if you don't have the packages
## install.packages(c("dplyr", "knitr", "readr", "stringr", "gt", "kableExtra"))
library(dplyr)
library(knitr)
library(readr)
library(stringr)
library(gt)
library(kableExtra)
```

Alternatively, you can download the individual datasets as a CSV or XLSX
file from the table below.

```{r, echo=FALSE, message=FALSE, warning=FALSE}

extdata_path <- "https://github.com/openwashdata/portawaterperu/raw/main/inst/extdata/"

read_csv("data-raw/dictionary.csv") |>
distinct(file_name) |>
dplyr::mutate(file_name = str_remove(file_name, ".rda")) |>
dplyr::rename(dataset = file_name) |>
mutate(
CSV = paste0("[Download CSV](", extdata_path, dataset, ".csv)"),
XLSX = paste0("[Download XLSX](", extdata_path, dataset, ".xlsx)")
) |>
knitr::kable()

```

## Data

The package provides access to one dataset `portawaterperu`.

```{r}
library(portawaterperu)
```

### portawaterperu

The dataset `portawaterperu` contains data abour portable water system from 32 communities in Peru. It has
`r nrow(portawaterperu)` observations and `r ncol(portawaterperu)`
variables

```{r}
portawaterperu |>
head(3) |>
gt::gt() |>
gt::as_raw_html()
```

For an overview of the variable names, see the following table.

```{r echo=FALSE, message=FALSE, warning=FALSE}
readr::read_csv("data-raw/dictionary.csv") |>
dplyr::filter(file_name == "portawaterperu.rda") |>
dplyr::select(variable_name:description) |>
knitr::kable() |>
kableExtra::kable_styling("striped") |>
kableExtra::scroll_box(height = "200px")
```

## Example

```{r}
library(portawaterperu)
library(ggplot2)
# Provide some example code here
portawaterperu |>
#dplyr::filter(stringr::str_starts(divisiones, "AMAZONAS")) |>
#dplyr::group_by(divisiones) |>
#dplyr::summarise(mean = mean(pob_servida)) |>
ggplot(aes(y = pop_serviced, color = type_gravity))+
geom_boxplot(outliers = F)+
labs(title = "Population served given different gravity types",
y= "Population") +
theme_classic()

```

## Capstone Project
This dataset is shared as part of a capstone project in Data Science for openwashdata. For more information about the project and to explore further insights, please visit the project page at https://ds4owd-001.github.io/project-laurenjudah/ (to be public available)

## Methodology
The data was obtained from @SIASAR, an information system containing data on rural water supply and sanitation services. Using SIASAR's "download data by country" tool, all available data for Peru (10 excel files) were downloaded. After examining the 10 excel files, only 5 pertained to potable water systems. Those 5 data sets were imported into R and subsequently empty values and unnecessary columns were deleted from them. Finally, the 5 data sets were combined into 1 data frame based on community ID. The combined, cleaned data set contains data from 32 communities.

SIASAR does not provide a matching data dictionary. openwashdata developer went through the attachements of the original questionnaire:

- https://globalsiasar.org/es/content/documentacion-tecnica
- https://globalsiasar.org/en/content/technical-documentation

The variable description is written with our best guess with the information from the attachements.

## License

Data are available as
[CC-BY](https://github.com/openwashdata/portawaterperu/blob/main/LICENSE.md).

## Citation

Please cite this package using:

```{r}
citation("portawaterperu")
```