{"id":41239648,"url":"https://github.com/asenetcky/distiller","last_synced_at":"2026-01-23T01:15:41.029Z","repository":{"id":260129096,"uuid":"880293927","full_name":"asenetcky/distiller","owner":"asenetcky","description":"Distill your wrangled data down to the CDC's EPHT XML format","archived":false,"fork":false,"pushed_at":"2025-03-18T20:25:20.000Z","size":1867,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-18T21:26:36.794Z","etag":null,"topics":["cdc","epht","r","r-package","rstats","rstats-package","xml"],"latest_commit_sha":null,"homepage":"https://asenetcky.github.io/distiller/","language":"R","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/asenetcky.png","metadata":{"files":{"readme":"README.Rmd","changelog":"NEWS.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-29T13:21:50.000Z","updated_at":"2024-11-21T21:15:50.000Z","dependencies_parsed_at":"2024-11-20T13:51:55.097Z","dependency_job_id":null,"html_url":"https://github.com/asenetcky/distiller","commit_stats":null,"previous_names":["asenetcky/distiller"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/asenetcky/distiller","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/asenetcky%2Fdistiller","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/asenetcky%2Fdistiller/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/asenetcky%2Fdistiller/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/
asenetcky%2Fdistiller/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/asenetcky","download_url":"https://codeload.github.com/asenetcky/distiller/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/asenetcky%2Fdistiller/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28676952,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-23T01:00:35.747Z","status":"ssl_error","status_checked_at":"2026-01-23T01:00:19.529Z","response_time":144,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cdc","epht","r","r-package","rstats","rstats-package","xml"],"created_at":"2026-01-23T01:15:39.891Z","updated_at":"2026-01-23T01:15:41.022Z","avatar_url":"https://github.com/asenetcky.png","language":"R","funding_links":[],"categories":[],"sub_categories":[],"readme":"---\noutput: github_document\n---\n\n\u003c!-- README.md is generated from README.Rmd. 
Please edit that file --\u003e\n\n```{r, include = FALSE}\nknitr::opts_chunk$set(\n  collapse = TRUE,\n  comment = \"#\u003e\",\n  fig.path = \"man/figures/README-\",\n  out.width = \"100%\"\n)\n```\n\n# distiller\n\n\u003c!-- badges: start --\u003e\n[![R-CMD-check](https://github.com/asenetcky/distiller/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/asenetcky/distiller/actions/workflows/R-CMD-check.yaml)\n[![Codecov test coverage](https://codecov.io/gh/asenetcky/distiller/graph/badge.svg)](https://app.codecov.io/gh/asenetcky/distiller)\n\u003c!-- badges: end --\u003e\n\n## Motivation\nAs a newbie who has to submit data to the CDC's EPHT program, I was\ndismayed to find out that the documentation is buried under many layers \ninside their SharePoint.  It is also highly fragmented, convoluted and in many\ncases, conflicts with itself.  \n\nMy goal is to make this process easier and more reproducible for myself and others.\n\nSo who is this highly specific package for?\n\n  *  Do you submit data to the CDC's EPHT program?\n  *  Do you use R? Or are you interested in incorporating R into your workflow?\n  *  Do you struggle with the CDC's EPHT documentation and/or tooling?\n  *  Do you want to make your submission process more reproducible?\n  \n  If you answered yes to the first question and any of the others, \n  then this package might be for you.\n  \n\n## What does this package do?\nI think it's important to state up front what this package _doesn't_ do - and\nthat is, it will not wrangle your data for you.  There are a few helpers, and\na whole slew of checks `distiller` will run on your data and metadata to\nensure that everything is reasonably close to the correct format \nfor submission to the CDC's EPHT program.  \n\n`distiller` still expects your data to have specific variable names, and\nto have the required variables for each type of data.  
However, if you've\never wondered why the EPHT requires different _variable names_ \nin a _different order_ for the same types of data, even for the _same disease_,\nyou'll be pleased to know that `distiller` takes care of the \nfacility-type-specific naming conventions and the ordering for you. Users just\nneed to bring the data and now they can spend less time worrying about \nXML semantics and more time polishing their data products.\n\n`distiller` is __no__ replacement for the CDC EPHT Test Submission portal. \nHowever, creating the XML, shuffling files around, \ndropping them into the portal, and waiting an indeterminate amount of time for \nfeedback eats up time and is a pain. \n`distiller` aims to provide feedback on your data and metadata\nbefore you send it off to the CDC.  This way, you can fix any obvious issues \nbefore you sink 20+ minutes waiting to find out you forgot to replace your `NA`s\nwith \"U\".\n\n## What's in the box?\n`distiller` contains the following core functions:\n\n  *  `check_submission()` - a function that checks your data and metadata and\n  provides quick feedback\n  *  `make_xml_document()` - a function that creates an XML document for \n  submission based on your data and the metadata you provide\n  \n  `distiller` also contains functions for:\n  \n  * collapsing race and ethnicity values into the CDC's required format\n  * converting month integers to 0-padded character strings\n  * returning the proper health outcome identifier for a given content group identifier\n  * Starting from scratch? 
Most of the mini-functions that make up the two core \n  ones are exposed to the user, so you can check your work in pieces as you make\n  progress with your data wrangling\n\n## `distiller` expectations and scope\n\n`distiller` works for the following content group identifiers:\n\n  -  AS-HOSP\n  -  AS-ED\n  -  CO-HOSP\n  -  CO-ED\n  -  MI-HOSP\n  -  HEAT-HOSP\n  -  HEAT-ED\n  -  COPD-HOSP\n  -  COPD-ED\n  \n  `distiller` expects the following variables in your data:\n  \n  For every content group identifier:\n  \n  -  agegroup\n  -  county\n  -  sex\n  -  ethnicity\n  -  race\n  -  health_outcome_id\n  -  monthly_count\n  -  month\n  -  year\n\n  For content group identifiers CO-HOSP and CO-ED, the above plus the following:\n  \n  -  fire_count\n  -  nonfire_count\n  -  unknown_count\n  \n## Installation\n\nYou can install the development version of distiller from [GitHub](https://github.com/) with:\n\n``` r\n# install.packages(\"pak\")\npak::pak(\"asenetcky/distiller\")\n```\n\n## Example\n\nHere is a basic example of how to use it:\n\n```{r example}\nlibrary(distiller)\n\n# Take your already-wrangled data\n# note the specific variable names\ndata \u003c-\n  mtcars |\u003e\n  dplyr::rename(\n    month = mpg,\n    agegroup = cyl,\n    county = disp,\n    ethnicity = hp,\n    health_outcome_id = drat,\n    monthly_count = wt,\n    race = qsec,\n    sex = vs,\n    year = am\n  ) |\u003e\n  dplyr::select(-c(gear, carb))\n\n# And your metadata\ncontent_group_id \u003c- \"AS-HOSP\"\nmcn \u003c- \"1234-1234-1234-1234-1234\"\njurisdiction_code \u003c- \"two_letter_code\"\nstate_fips_code \u003c- \"1234\"\nsubmitter_email \u003c- \"submitter@email.com\"\nsubmitter_name \u003c- \"Submitter Name\"\nsubmitter_title \u003c- \"Submitter Title\"\n\n# Optionally check your submission data structure and metadata\ncheck_submission(\n  data,\n  content_group_id,\n  mcn,\n  jurisdiction_code,\n  state_fips_code,\n  submitter_email,\n  submitter_name,\n  submitter_title\n)\n# This 
can also be checked with `check_first = TRUE` in `make_xml_document()`\n\n\n# And then make your xml document\nmake_xml_document(\n  data,\n  content_group_id,\n  mcn,\n  jurisdiction_code,\n  state_fips_code,\n  submitter_email,\n  submitter_name,\n  submitter_title\n)\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fasenetcky%2Fdistiller","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fasenetcky%2Fdistiller","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fasenetcky%2Fdistiller/lists"}