https://github.com/reconhub/linelist
An R package to import, clean, and store case data
https://github.com/reconhub/linelist
Last synced: 6 months ago
JSON representation
An R package to import, clean, and store case data
- Host: GitHub
- URL: https://github.com/reconhub/linelist
- Owner: reconhub
- License: other
- Created: 2018-10-16T10:24:50.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2022-12-09T12:38:57.000Z (over 2 years ago)
- Last Synced: 2024-08-13T07:15:04.585Z (10 months ago)
- Language: R
- Homepage: https://www.repidemicsconsortium.org/linelist
- Size: 977 KB
- Stars: 26
- Watchers: 5
- Forks: 5
- Open Issues: 17
-
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
Awesome Lists containing this project
- jimsghstars - reconhub/linelist - An R package to import, clean, and store case data (R)
README
---
output: github_document
---```{r setup, echo = FALSE, message = FALSE, results = 'hide'}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.width = 8,
fig.height = 6,
fig.path = "figs/",
echo = TRUE
)```
# Welcome to the *linelist* package!
[](https://travis-ci.org/reconhub/linelist)
[](https://codecov.io/gh/reconhub/linelist?branch=master)This package is dedicated to simplifying the cleaning and standardisation of
linelist data. Considering a case linelist `data.frame`, it aims to:- standardise the variables names, replacing all non-ascii characters with their
closest latin equivalent, removing blank spaces and other separators,
enforcing lower case capitalisation, and using a single separator between
words
- standardise the labels used in all variables of type `character` and `factor`,
as above- set `POSIXct` and `POSIXlt` to `Date` objects
- extract dates from a messy variable, automatically detecting formats, allowing
inconsistent formats, and dates flanked by other text
## Installing the package
To install the current stable, CRAN version of the package, type:
```{r install, eval = FALSE}
install.packages("linelist")
```To benefit from the latest features and bug fixes, install the development, *github* version of the package using:
```{r install2, eval = FALSE}
devtools::install_github("reconhub/linelist")
```Note that this requires the package *devtools* installed.
# What does it do?
## Data cleaning
Procedures to clean data, first and foremost aimed at `data.frame` formats,
include:- `clean_data()`: the main function, taking a `data.frame` as input, and doing all
the variable names, internal labels, and date processing described above
- `clean_variable_names()`: like `clean_data`, but only the variable names- `clean_variable_labels()`: like `clean_data`, but only the variable labels
- `clean_variable_spelling()`: provided with a dictionary, will correct the
spelling of values in a variable and can globally correct commonly mis-spelled
words.- `clean_dates()`: like `clean_data`, but only the dates
- `guess_dates()`: find dates in various, unspecified formats in a messy
`character` vector
# Worked example
Let us consider some messy `data.frame` as a toy example:
```{r toy_data}
## make toy data
onsets <- as.Date("2018-01-01") + sample(1:10, 20, replace = TRUE)
discharge <- format(as.Date(onsets) + 10, "%d/%m/%Y")
genders <- c("male", "female", "FEMALE", "Male", "Female", "MALE")
gender <- sample(genders, 20, replace = TRUE)
case_types <- c("confirmed", "probable", "suspected", "not a case",
"Confirmed", "PROBABLE", "suspected ", "Not.a.Case")
messy_dates <- sample(
c("01-12-2001", "male", "female", "2018-10-18", "2018_10_17",
"2018 10 19", "// 24//12//1989", NA, "that's 24/12/1989!"),
20, replace = TRUE)
case <- factor(sample(case_types, 20, replace = TRUE))
toy_data <- data.frame("Date of Onset." = onsets,
"DisCharge.." = discharge,
"SeX_ " = gender,
"Épi.Case_définition" = case,
"messy/dates" = messy_dates)
## show data
toy_data```
We start by cleaning these data:
```{r clean_data}
## load library
library(linelist)## clean data with defaults
x <- clean_data(toy_data)
x```
## Getting help online
Bug reports and feature requests should be posted on *github* using the [*issue*](http://github.com/reconhub/linelist/issues) system. All other questions should be posted on the **RECON forum**:
[http://www.repidemicsconsortium.org/forum/](http://www.repidemicsconsortium.org/forum/)Contributions are welcome via **pull requests**.
Please note that this project is released with a [Contributor Code of Conduct](CONDUCT.md). By participating in this project you agree to abide by its terms.