https://github.com/openwashdata/portawaterperu

Data about Peruvian Portable Water System
https://github.com/openwashdata/portawaterperu

Last synced: 6 months ago
JSON representation

Data about Peruvian Portable Water System

Host: GitHub
URL: https://github.com/openwashdata/portawaterperu
Owner: openwashdata
License: cc-by-4.0
Created: 2024-06-18T12:59:23.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2024-07-30T13:10:38.000Z (almost 2 years ago)
Last Synced: 2025-09-04T20:42:18.840Z (10 months ago)
Language: HTML
Homepage: https://openwashdata.github.io/portawaterperu/
Size: 6.05 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.Rmd
- License: LICENSE.md
- Citation: CITATION.cff
- Zenodo: .zenodo.json

Awesome Lists containing this project

README

          ---

output: github_document

always_allow_html: true

editor_options: 

  markdown: 

    wrap: 72

  chunk_output_type: console

---

```{r, include = FALSE}

knitr::opts_chunk$set(

  collapse = TRUE,

  comment = "#>",

  fig.path = "man/figures/README-",

  out.width = "100%",

  message = FALSE,

  warning = FALSE,

  fig.retina = 2,

  fig.align = 'center'

)

```

# portawaterperu

[![License: CC BY

4.0](https://img.shields.io/badge/License-CC_BY_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)

[![R-CMD-check](https://github.com/openwashdata/portawaterperu/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/openwashdata/portawaterperu/actions/workflows/R-CMD-check.yaml)

The goal of the package `portawaterperu` is to provide access to data about community portable water systems in Peru. The data is collected from SIASAR database consisted of information and surveys about water catchments, storage system, treatment, distribution networks and maintainance. 

## Installation

You can install the development version of portawaterperu from

[GitHub](https://github.com/) with:

``` r

# install.packages("devtools")

devtools::install_github("openwashdata/portawaterperu")

```

```{r}

## Run the following code in console if you don't have the packages

## install.packages(c("dplyr", "knitr", "readr", "stringr", "gt", "kableExtra"))

library(dplyr)

library(knitr)

library(readr)

library(stringr)

library(gt)

library(kableExtra)

```

Alternatively, you can download the individual datasets as a CSV or XLSX

file from the table below.

```{r, echo=FALSE, message=FALSE, warning=FALSE}

extdata_path <- "https://github.com/openwashdata/portawaterperu/raw/main/inst/extdata/"

read_csv("data-raw/dictionary.csv") |> 

  distinct(file_name) |> 

  dplyr::mutate(file_name = str_remove(file_name, ".rda")) |> 

  dplyr::rename(dataset = file_name) |> 

  mutate(

    CSV = paste0("[Download CSV](", extdata_path, dataset, ".csv)"),

    XLSX = paste0("[Download XLSX](", extdata_path, dataset, ".xlsx)")

  ) |> 

  knitr::kable()

```

## Data

The package provides access to one dataset `portawaterperu`.

```{r}

library(portawaterperu)

```

### portawaterperu

The dataset `portawaterperu` contains data abour portable water system from 32 communities in Peru. It has

`r nrow(portawaterperu)` observations and `r ncol(portawaterperu)`

variables

```{r}

portawaterperu |> 

  head(3) |> 

  gt::gt() |>

  gt::as_raw_html()

```

For an overview of the variable names, see the following table.

```{r echo=FALSE, message=FALSE, warning=FALSE}

readr::read_csv("data-raw/dictionary.csv") |>

  dplyr::filter(file_name == "portawaterperu.rda") |>

  dplyr::select(variable_name:description) |> 

  knitr::kable() |> 

  kableExtra::kable_styling("striped") |> 

  kableExtra::scroll_box(height = "200px")

```

## Example

```{r}

library(portawaterperu)

library(ggplot2)

# Provide some example code here

portawaterperu |> 

  #dplyr::filter(stringr::str_starts(divisiones, "AMAZONAS")) |>

  #dplyr::group_by(divisiones) |> 

  #dplyr::summarise(mean = mean(pob_servida)) |> 

  ggplot(aes(y = pop_serviced, color = type_gravity))+

  geom_boxplot(outliers = F)+

  labs(title = "Population served given different gravity types",

       y= "Population") +

  theme_classic()

  

```

## Capstone Project

This dataset is shared as part of a capstone project in Data Science for openwashdata. For more information about the project and to explore further insights, please visit the project page at https://ds4owd-001.github.io/project-laurenjudah/ (to be public available)

## Methodology

The data was obtained from @SIASAR, an information system containing data on rural water supply and sanitation services. Using SIASAR's "download data by country" tool, all available data for Peru (10 excel files) were downloaded. After examining the 10 excel files, only 5 pertained to potable water systems. Those 5 data sets were imported into R and subsequently empty values and unnecessary columns were deleted from them. Finally, the 5 data sets were combined into 1 data frame based on community ID. The combined, cleaned data set contains data from 32 communities.

SIASAR does not provide a matching data dictionary. openwashdata developer went through the attachements of the original questionnaire:

- https://globalsiasar.org/es/content/documentacion-tecnica

- https://globalsiasar.org/en/content/technical-documentation

The variable description is written with our best guess with the information from the attachements.

## License

Data are available as

[CC-BY](https://github.com/openwashdata/portawaterperu/blob/main/LICENSE.md).

## Citation

Please cite this package using:

```{r}

citation("portawaterperu")

```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/openwashdata/portawaterperu

Awesome Lists containing this project

README