Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/irworkshop/campfin

R package to help wrangle campaign finance data 💸
https://github.com/irworkshop/campfin

campaign-finance data-journalism

Last synced: about 2 months ago
JSON representation

R package to help wrangle campaign finance data 💸

Awesome Lists containing this project

README

        

---
output: github_document
always_allow_html: yes
editor_options:
chunk_output_type: console
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-"
)
if (!interactive()) {
options(width = 99)
}
```

# campfin

[![Lifecycle: maturing][life_badge]][life_link]
[![CRAN status][cran_badge]][cran_link]
![Downloads][dl_badge]
[![Codecov test coverage][cov_badge]][cov_link]
[![R build status][gh_badge]][gh_link]

The campfin package was created to facilitate the work being done on the
[The Accountability Project][tap], a tool created by
[The Investigative Reporting Workshop][irw] in Washington, DC. The
Accountability Project curates, cleans, and indexes public data to give
journalists, researchers and others a simple way to search across otherwise
siloed records.

The data focuses on people, organizations and locations. This package was
created specifically to help with state-level campaign finance data, although
the tools included are useful in general database exploration and normalization.

## Installation

You can install the released version of campfin from [CRAN][cran] with:

```{r install_cran, eval=FALSE}
install.packages("campfin")
```

The development version can be installed from [GitHub][gh] with:

```{r install_github, eval=FALSE}
# install.packages("remotes")
remotes::install_github("irworkshop/campfin")
```

## Normalize

The package was originally built to normalize geographic data using the
`normal_*()` functions, which take the messy self-reported geographic data of a
contributor, vendor, candidate, or committee and return [normalized text][txt]
that is more searchable. They are largely wrappers around the [stringr] package,
and can call other sub-functions to streamline normalization.

* `normal_address()` takes a _street_ address and reduces inconsistencies.
* `normal_zip()` takes [ZIP Codes][zip] and aims to return a valid 5-digit code.
* `normal_state()` takes US states and returns a [2 digit abbreviation][abbs].
* `normal_city()` takes cities and reduces inconsistencies.
* `normal_phone()` consistently formats US telephone numbers.

Please see the vignette on normalization for an example of how these functions
are used to fix a wide variety of string inconsistencies and make campaign
finance data more consistent.

## Data

```{r library, message=FALSE, warning=FALSE}
library(campfin)
library(tidyverse)
```

The campfin package contains a number of built in data frames and strings used
to help wrangle campaign finance data.

The `/data-raw` directory contains the code used to create the objects.

### zipcodes

The `zipcodes` (plural) table is a new version of the `zipcode` (singular) table
from the archived [zipcode] R package.

> This database was composed using ZIP code gazetteers from the US Census Bureau
from 1999 and 2000, augmented with additional ZIP code information The database
is believed to contain over 98% of the ZIP Codes in current use in the United
States. The remaining ZIP Codes absent from this database are entirely PO Box or
Firm ZIP codes added in the last five years, which are no longer published by
the Census Bureau, but in any event serve a very small minority of the
population (probably on the order of .1% or less). Although every attempt has
been made to filter them out, this data set may contain up to .5% false
positives, that is, ZIP codes that do not exist or are no longer in use but are
included due to erroneous data sources.

The included `valid_city` and `valid_zip` vectors are sorted, unique columns
from the `zipcodes` data frame.

```{r geo_df, collapse=TRUE, warning=FALSE, message=FALSE, error=FALSE}
sample_frac(zipcodes)
```

### `usps_*` and `valid_*`

The `usps_*` data frames were scraped from the official United States Postal
Service (USPS) [Postal Addressing Standards][pub]. These data frames are
designed to work with the abbreviation functionality of `normal_address()` and
`normal_city()` to replace common abbreviations with their full equivalent.

`usps_city` is a curated subset of `usps_state`, whose full version appear at
least once in the `valid_city` vector from `zipcodes`. The `valid_state` and
`valid_name` vectors contain the columns from `usps_state` and include
territories not found in R's build in `state.abb` and `state.name` vectors.

```{r usps}
sample_n(usps_street, 3)
sample_n(usps_state, 3)
setdiff(valid_state, state.abb)
```

*****

The campfin project is released with a [Contributor Code of Conduct][coc]. By
contributing, you agree to abide by its terms.

[life_badge]: https://img.shields.io/badge/lifecycle-maturing-blue.svg
[life_link]: https://lifecycle.r-lib.org/articles/stages.html
[cran_badge]: https://www.r-pkg.org/badges/version/campfin
[cran_link]: https://CRAN.R-project.org/package=campfin
[dl_badge]: https://cranlogs.r-pkg.org/badges/grand-total/campfin
[cov_badge]: https://img.shields.io/codecov/c/github/irworkshop/campfin/master.svg
[cov_link]: https://app.codecov.io/gh/irworkshop/campfin?branch=master
[gh_badge]: https://github.com/irworkshop/campfin/workflows/R-CMD-check/badge.svg
[gh_link]: https://github.com/irworkshop/campfin/actions
[tap]: https://www.publicaccountability.org/
[irw]: https://investigativereportingworkshop.org/
[fin]: https://en.wikipedia.org/wiki/Campaign_finance
[cran]: https://cran.r-project.org/package=campfin
[gh]: https://github.com/irworkshop/campfin
[tidyverse]: https://www.tidyverse.org/
[txt]: https://en.wikipedia.org/wiki/Text_normalization
[stringr]: https://github.com/tidyverse/stringr
[zip]: https://en.wikipedia.org/wiki/ZIP_Code
[abbs]: https://en.wikipedia.org/wiki/List_of_U.S._state_abbreviations
[zipcode]: https://cran.r-project.org/src/contrib/Archive/zipcode/
[pub]: https://pe.usps.com/text/pub28/28apc_002.htm
[coc]: https://www.contributor-covenant.org/version/2/0/code_of_conduct.html