https://github.com/asenetcky/distiller

Distill your wrangled data down to the CDC's EPHT XML format

cdc epht r r-package rstats rstats-package xml

Host: GitHub
URL: https://github.com/asenetcky/distiller
Owner: asenetcky
License: other
Created: 2024-10-29T13:21:50.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-03-18T20:25:20.000Z (over 1 year ago)
Last Synced: 2025-03-18T21:26:36.794Z (over 1 year ago)
Topics: cdc, epht, r, r-package, rstats, rstats-package, xml
Language: R
Homepage: https://asenetcky.github.io/distiller/
Size: 1.78 MB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.Rmd
- Changelog: NEWS.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md

README

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# distiller

[![R-CMD-check](https://github.com/asenetcky/distiller/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/asenetcky/distiller/actions/workflows/R-CMD-check.yaml)

[![Codecov test coverage](https://codecov.io/gh/asenetcky/distiller/graph/badge.svg)](https://app.codecov.io/gh/asenetcky/distiller)

## Motivation

As a newbie who has to submit data to the CDC's EPHT program, I was
dismayed to find that the documentation is buried under many layers
inside their SharePoint. It is also highly fragmented, convoluted and, in many
cases, in conflict with itself.

My goal is to make this process easier and more reproducible for myself and others.

So who is this highly specific package for?

  *  Do you submit data to the CDC's EPHT program?

  *  Do you use R, or are you interested in incorporating R into your workflow?

  *  Do you struggle with the CDC's EPHT documentation and/or tooling?

  *  Do you want to make your submission process more reproducible?

  

If you answered yes to the first question and any of the others,
then this package might be for you.

  

## What does this package do?

I think it's important to state up front what this package _doesn't_ do: it
will not wrangle your data for you. There are a few helpers, and a whole slew
of checks that `distiller` will run on your data and metadata to ensure that
everything is reasonably close to the correct format for submission to the
CDC's EPHT program.

`distiller` still expects your data to have specific variable names, and
to have the required variables for each type of data. However, if you've
ever wondered why the EPHT requires different _variable names_
in a _different order_ for the same types of data, even for the _same disease_,
you'll be pleased to know that `distiller` takes care of the
facility-type-specific naming conventions and the ordering for you. Users just
need to bring the data, and then they can spend less time worrying about
XML semantics and more time polishing their data products.

`distiller` is __no__ replacement for the CDC EPHT Test Submission portal.
However, creating the XML, shuffling files around, dropping them into the
portal and then waiting an indeterminate amount of time for feedback eats up
time and is a pain.

`distiller` aims to provide feedback on your data and metadata
before you send it off to the CDC. This way, you can fix any obvious issues
before you sink 20+ minutes into waiting, only to find out you forgot to
replace your `NA`s with "U".
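
As a minimal sketch of the kind of fix mentioned above, the base R snippet below replaces missing values with "U" before a submission is checked. The toy data frame, its placeholder codes and the choice of columns (`sex`, `race`, `ethnicity`) are assumptions for illustration only; which variables actually accept "U" is defined by the EPHT documentation, not by this example.

```r
# Toy data with missing demographic values (placeholder codes, illustrative only)
toy <- data.frame(
  sex = c("M", NA, "F"),
  race = c("W", "B", NA),
  ethnicity = c(NA, "H", "NH"),
  monthly_count = c(3L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Assumption: these are the columns where "U" stands for an unknown value
unknown_cols <- c("sex", "race", "ethnicity")

# Replace NA with "U" in those columns only; counts are left untouched
toy[unknown_cols] <- lapply(
  toy[unknown_cols],
  function(x) ifelse(is.na(x), "U", x)
)
```

Restricting the replacement to a named set of columns keeps numeric counts from being coerced to character by accident.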

## What's in the box?

`distiller` contains the following core functions:

  *  `check_submission()` - a function that checks your data and metadata and
  provides quick feedback

  *  `make_xml_document()` - a function that creates an XML document for
  submission based on your data and the metadata you provide

`distiller` also contains functions for:

  * collapsing race and ethnicity values into the CDC's required format

  * converting month integers to 0-padded character strings

  * returning the proper health outcome identifier for a given content group identifier

  * Starting from scratch? Most of the mini-functions that make up the two core
  ones are exposed to the user, so you can check your work in pieces as you make
  progress with your data wrangling
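
The month conversion listed above amounts to zero-padding. The sketch below uses base R's `sprintf()` to show the idea; it is not `distiller`'s own helper, whose name and signature are not given in this README.

```r
# Integer months as they might come out of a date column (illustrative values)
month <- c(1L, 7L, 10L, 12L)

# Zero-pad each month to a two-character string
month_chr <- sprintf("%02d", month)

month_chr
#> [1] "01" "07" "10" "12"
```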

## `distiller` expectations and scope

`distiller` works for the following content group identifiers:

  -  AS-HOSP

  -  AS-ED

  -  CO-HOSP

  -  CO-ED

  -  MI-HOSP

  -  HEAT-HOSP

  -  HEAT-ED

  -  COPD-HOSP

  -  COPD-ED

  

  `distiller` expects the following variables in your data:

  

  For every content group identifier:

  

  -  agegroup

  -  county

  -  sex

  -  ethnicity

  -  race

  -  health_outcome_id

  -  monthly_count

  -  month

  -  year

  For content group identifiers CO-HOSP and CO-ED, the above plus the following:

  

  -  fire_count

  -  nonfire_count

  -  unknown_count
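
Before calling the package's checks, it can be handy to confirm that the columns listed above are present. The following base R sketch does that; `missing_vars()` is a hypothetical helper written for this example and is not part of `distiller`, and the toy data frame is made up.

```r
# Variables listed above for every content group identifier
base_vars <- c(
  "agegroup", "county", "sex", "ethnicity", "race",
  "health_outcome_id", "monthly_count", "month", "year"
)

# Extra variables listed above for CO-HOSP and CO-ED
co_vars <- c("fire_count", "nonfire_count", "unknown_count")

# Hypothetical helper (not a distiller function): returns the names of the
# expected columns that are absent from `data`
missing_vars <- function(data, content_group_id) {
  required <- base_vars
  if (content_group_id %in% c("CO-HOSP", "CO-ED")) {
    required <- c(required, co_vars)
  }
  setdiff(required, names(data))
}

# Toy data frame with only three of the expected columns
toy <- data.frame(agegroup = 1, county = "09001", sex = "F")

missing_vars(toy, "AS-HOSP")
missing_vars(toy, "CO-ED")
```

An empty result means every expected column name is present; it says nothing about whether the values themselves are valid, which is what `check_submission()` is for.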

  

## Installation

You can install the development version of distiller from [GitHub](https://github.com/) with:

``` r
# install.packages("pak")
pak::pak("asenetcky/distiller")
```

## Example

Here is a basic example of how to use it:

```{r example}
library(distiller)

# Take your already-wrangled data
# (mtcars is only a stand-in here; note the specific variable names)
data <-
  mtcars |>
  dplyr::rename(
    month = mpg,
    agegroup = cyl,
    county = disp,
    ethnicity = hp,
    health_outcome_id = drat,
    monthly_count = wt,
    race = qsec,
    sex = vs,
    year = am
  ) |>
  dplyr::select(-c(gear, carb))

# And your metadata
content_group_id <- "AS-HOSP"
mcn <- "1234-1234-1234-1234-1234"
jurisdiction_code <- "two_letter_code"
state_fips_code <- "1234"
submitter_email <- "submitter@email.com"
submitter_name <- "Submitter Name"
submitter_title <- "Submitter Title"

# Optionally check your submission data structure and metadata
check_submission(
  data,
  content_group_id,
  mcn,
  jurisdiction_code,
  state_fips_code,
  submitter_email,
  submitter_name,
  submitter_title
)

# This can also be checked with `check_first = TRUE` in `make_xml_document()`

# And then make your XML document
make_xml_document(
  data,
  content_group_id,
  mcn,
  jurisdiction_code,
  state_fips_code,
  submitter_email,
  submitter_name,
  submitter_title
)
```
