https://github.com/mhairi/medicalclaims
Medical Claims Data R Package
- Host: GitHub
- URL: https://github.com/mhairi/medicalclaims
- Owner: mhairi
- Created: 2020-05-12T16:06:24.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2020-05-19T11:44:53.000Z (almost 5 years ago)
- Last Synced: 2024-08-13T07:13:03.933Z (6 months ago)
- Language: R
- Size: 9.2 MB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.Rmd
README
---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# medicalclaims

A data package with a sample of 100,000 anonymised medical claims from New Hampshire’s Comprehensive Health Information System (https://nhchis.com/).
## Installation
You can install the package from GitHub with:
``` r
# install.packages("devtools")
devtools::install_github("mhairi/medicalclaims")
```
## Example

Once you've loaded the package, the data is in an object called `claims`. The data frame has 100,000 rows and 57 variables.
```{r}
library(medicalclaims)
head(claims)
```

Here is how to find the procedures with the highest average cost, counting only procedures that appear more than 10 times in the data.
```{r, message=FALSE, warning=FALSE}
library(tidyverse)

claims %>%
  group_by(cpt_desc) %>%
  summarise(
    avg_cost = mean(total_by_n),
    n = n()
  ) %>%
  filter(n > 10) %>%
  arrange(desc(avg_cost)) %>%
  top_n(10, avg_cost)
```

If you want to look at how expensive different diagnoses are, you first need to summarise over `imputed_service_key` and `icd_diag_01_primary`. This gives the total spending for each patient and each diagnosis.
```{r}
by_individual <-
  claims %>%
  group_by(new_diag_desc, imputed_service_key) %>%
  summarise(spending = sum(total)) %>%
  ungroup()
```

Then we can summarise to find the most expensive diagnoses.
```{r}
by_individual %>%
  group_by(new_diag_desc) %>%
  summarise(
    avg_cost = mean(spending),
    n = n()
  ) %>%
  filter(n > 10) %>%
  arrange(desc(avg_cost)) %>%
  top_n(10, avg_cost)
```
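
To see why the diagnosis example sums spending per patient before averaging, here is a minimal sketch on a toy data frame. The rows and values are invented for illustration and are not taken from `claims`; only the column names `new_diag_desc`, `imputed_service_key` and `total` are borrowed from the code above, and `toy_claims`, `per_claim` and `per_patient` are hypothetical names.

```r
library(dplyr)

# Toy data, invented for illustration only (not rows from `claims`).
# Patient "p1" has two claim lines for the same diagnosis.
toy_claims <- data.frame(
  new_diag_desc       = c("Asthma", "Asthma", "Asthma", "Fracture"),
  imputed_service_key = c("p1", "p1", "p2", "p3"),
  total               = c(100, 300, 200, 500)
)

# One step: average over individual claim lines.
per_claim <- toy_claims %>%
  group_by(new_diag_desc) %>%
  summarise(avg_cost = mean(total))

# Two steps: total spending per patient and diagnosis,
# then the average of those per-patient totals.
per_patient <- toy_claims %>%
  group_by(new_diag_desc, imputed_service_key) %>%
  summarise(spending = sum(total)) %>%
  ungroup() %>%
  group_by(new_diag_desc) %>%
  summarise(
    avg_cost = mean(spending),
    n = n()
  )

per_claim
per_patient
```

In this toy data the per-claim average for "Asthma" is 200, while the per-patient average is 300, because the two claim lines for "p1" are added together first. In the two-step version `n` counts patients rather than claim lines, so a filter on `n` applies to the number of patients.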