https://github.com/reconhub/linelist

An R package to import, clean, and store case data

---
output: github_document
---

```{r setup, echo = FALSE, message = FALSE, results = 'hide'}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 8,
  fig.height = 6,
  fig.path = "figs/",
  echo = TRUE
)
```

# Welcome to the *linelist* package!

[![Travis build status](https://travis-ci.org/reconhub/linelist.svg?branch=master)](https://travis-ci.org/reconhub/linelist)
[![Codecov test coverage](https://codecov.io/gh/reconhub/linelist/branch/master/graph/badge.svg)](https://codecov.io/gh/reconhub/linelist?branch=master)

This package is dedicated to simplifying the cleaning and standardisation of
linelist data. Given a case linelist stored as a `data.frame`, it aims to:

- standardise the variable names, replacing all non-ASCII characters with their
closest Latin equivalent, removing blank spaces and other separators,
enforcing lower case, and using a single separator between
words

- standardise the labels used in all variables of type `character` and `factor`,
as above

- convert `POSIXct` and `POSIXlt` variables to `Date` objects

- extract dates from a messy variable, automatically detecting formats, allowing
inconsistent formats, and dates flanked by other text
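
To make the first three aims concrete, here is a minimal base R sketch. It is not the package's implementation: `standardise_text_sketch()` is a hypothetical helper written for this illustration, and it does not transliterate non-ASCII characters (they are simply treated as separators here).

```{r name_sketch}
## hypothetical helper, NOT part of linelist: lower case, single "_" separator
standardise_text_sketch <- function(x) {
  x <- tolower(as.character(x))
  x <- gsub("[^a-z0-9]+", "_", x)   # any run of other characters -> one "_"
  gsub("^_+|_+$", "", x)            # drop leading and trailing separators
}

## small made-up example; check.names = FALSE keeps the messy names as typed
messy <- data.frame(
  "Date of Onset." = as.Date(c("2018-01-02", "2018-01-05")),
  "Admitted" = as.POSIXct(c("2018-01-03 10:30:00", "2018-01-06 08:00:00"),
                          tz = "UTC"),
  "SeX_ " = c("FEMALE", "Male"),
  "Case type" = c("Not.a.Case", "suspected "),
  check.names = FALSE, stringsAsFactors = FALSE
)

## 1. standardise the variable names
names(messy) <- standardise_text_sketch(names(messy))

## 2. standardise the labels of character variables
is_text <- vapply(messy, is.character, logical(1))
messy[is_text] <- lapply(messy[is_text], standardise_text_sketch)

## 3. convert date-time variables (POSIXct / POSIXlt) to Date
is_datetime <- vapply(messy, inherits, logical(1), what = "POSIXt")
messy[is_datetime] <- lapply(messy[is_datetime], as.Date)

messy
```

The same helper is reused for names and labels because both aims describe the same transformation; the package itself may handle more cases than this sketch does.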

## Installing the package

To install the current stable, CRAN version of the package, type:
```{r install, eval = FALSE}
install.packages("linelist")
```

To benefit from the latest features and bug fixes, install the development version of the package from GitHub using:
```{r install2, eval = FALSE}
devtools::install_github("reconhub/linelist")
```

Note that this requires the *devtools* package to be installed.

# What does it do?

## Data cleaning

Procedures to clean data, first and foremost aimed at `data.frame` formats,
include:

- `clean_data()`: the main function, taking a `data.frame` as input and applying
all the variable name, label, and date processing described above

- `clean_variable_names()`: like `clean_data()`, but only cleans the variable names

- `clean_variable_labels()`: like `clean_data()`, but only cleans the variable labels

- `clean_variable_spelling()`: provided with a dictionary, corrects the
spelling of values in a variable and can globally correct commonly misspelled
words

- `clean_dates()`: like `clean_data()`, but only processes the dates

- `guess_dates()`: finds dates in various, unspecified formats in a messy
`character` vector
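
The idea behind date guessing can be illustrated with a short base R sketch. This is an assumption-laden illustration, not how `guess_dates()` is implemented: `extract_dates_sketch()` is a hypothetical function that only recognises year-month-day and day-month-year patterns with a four-digit year, separated by any non-digit characters.

```{r date_sketch}
## hypothetical helper, NOT part of linelist
extract_dates_sketch <- function(x) {
  x <- as.character(x)
  out <- rep(as.Date(NA), length(x))

  ## candidate formats, tried in order; names are strptime formats,
  ## values are regular expressions locating a matching substring
  patterns <- c(
    "%Y-%m-%d" = "[0-9]{4}[^0-9]+[0-9]{1,2}[^0-9]+[0-9]{1,2}",
    "%d-%m-%Y" = "[0-9]{1,2}[^0-9]+[0-9]{1,2}[^0-9]+[0-9]{4}"
  )

  for (fmt in names(patterns)) {
    ## only look at entries not parsed yet that contain this pattern
    hit <- is.na(out) & !is.na(x) & grepl(patterns[[fmt]], x)
    found <- regmatches(x[hit], regexpr(patterns[[fmt]], x[hit]))
    found <- gsub("[^0-9]+", "-", found)   # normalise separators to "-"
    out[hit] <- as.Date(found, format = fmt)
  }
  out
}

messy <- c("01-12-2001", "male", "2018_10_17", "2018 10 19",
           "// 24//12//1989", NA, "that's 24/12/1989!")
extract_dates_sketch(messy)
```

Entries with no recognisable date, such as `"male"` or `NA`, stay `NA`. Ambiguous day and month orders are not resolved here; the sketch always reads a leading two-digit field as the day.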

# Worked example

Let us consider some messy `data.frame` as a toy example:

```{r toy_data}
## make toy data: onset dates, discharge dates stored as text, inconsistently
## capitalised gender and case-type labels, and a column mixing dates with junk
onsets <- as.Date("2018-01-01") + sample(1:10, 20, replace = TRUE)
discharge <- format(onsets + 10, "%d/%m/%Y")
genders <- c("male", "female", "FEMALE", "Male", "Female", "MALE")
gender <- sample(genders, 20, replace = TRUE)
case_types <- c("confirmed", "probable", "suspected", "not a case",
                "Confirmed", "PROBABLE", "suspected ", "Not.a.Case")
messy_dates <- sample(
  c("01-12-2001", "male", "female", "2018-10-18", "2018_10_17",
    "2018 10 19", "// 24//12//1989", NA, "that's 24/12/1989!"),
  20, replace = TRUE)
case <- factor(sample(case_types, 20, replace = TRUE))
toy_data <- data.frame("Date of Onset." = onsets,
                       "DisCharge.." = discharge,
                       "SeX_ " = gender,
                       "Épi.Case_définition" = case,
                       "messy/dates" = messy_dates)

## show data
toy_data
```

We start by cleaning these data:

```{r clean_data}
## load library
library(linelist)

## clean data with defaults
x <- clean_data(toy_data)
x

```

## Getting help online

Bug reports and feature requests should be posted on GitHub using the [*issue*](http://github.com/reconhub/linelist/issues) system. All other questions should be posted on the **RECON forum**:

[http://www.repidemicsconsortium.org/forum/](http://www.repidemicsconsortium.org/forum/)

Contributions are welcome via **pull requests**.

Please note that this project is released with a [Contributor Code of Conduct](CONDUCT.md). By participating in this project you agree to abide by its terms.