https://github.com/tylerlittlefield/openfda-extract

🩺 Convert openFDA device data from JSON files to tables and store in a database

Host: GitHub
URL: https://github.com/tylerlittlefield/openfda-extract
Owner: tylerlittlefield
Created: 2020-09-07T20:19:01.000Z (almost 5 years ago)
Default Branch: master
Last Pushed: 2021-01-14T05:17:39.000Z (over 4 years ago)
Last Synced: 2025-04-04T01:32:10.349Z (2 months ago)
Topics: openfda, rstats
Language: R
Homepage:
Size: 23.4 KB
Stars: 6
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.Rmd

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# openfda-extract

This repository collects [adverse events data](https://open.fda.gov/apis/device/event/download/) from openFDA. A single [script](/data-raw/loop.R) attempts to:

1. Convert JSON files to tabular format using `jsonlite::fromJSON` and `tidyr::unnest`.

2. Save the data to a database.
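As a rough illustration of step 1, the sketch below parses a tiny, made-up JSON document and flattens one nested field. The sample records and their values are invented for the example and are far smaller than a real openFDA file; this is not the code from `loop.R`, only a minimal sketch of the two functions named above.

```r
library(jsonlite)
library(tidyr)

# Hypothetical, heavily trimmed stand-in for one openFDA device event file
json <- '{
  "results": [
    {"report_number": "A-1", "event_type": "Malfunction",
     "mdr_text": [{"text_type_code": "Description", "text": "Device failed."},
                  {"text_type_code": "Narrative", "text": "Returned for analysis."}]},
    {"report_number": "B-2", "event_type": "Injury",
     "mdr_text": [{"text_type_code": "Description", "text": "Patient injured."}]}
  ]
}'

# fromJSON() simplifies the array of records into a data frame;
# the nested "mdr_text" arrays become a list-column of data frames
events <- fromJSON(json)$results

# unnest() expands the list-column into one row per nested record,
# repeating the parent columns for each nested row
flat <- unnest(events, cols = mdr_text)
```

In `flat`, the parent values for report `A-1` appear once per nested text entry, which is the kind of repetition that grows quickly on the full data.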

The data is relational, as described [here](https://opendata.stackexchange.com/a/2187). Flattening it into a single table is inefficient and duplicates a lot of values. To avoid that duplication, I have tried to store each nested field in its own table:

1. `adverse_events`

2. `adverse_events.mdr_text`

3. `adverse_events.product_problems`

4. `adverse_events.source_type`

5. `adverse_events.device`

6. `adverse_events.patient`

7. `adverse_events.remedial_action`

8. `adverse_events.type_of_report`

The naming convention is the main table name, followed by a dot and the name of the nested field (for example, `adverse_events.mdr_text`).
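One way to picture the split is shown below: the scalar columns stay in the main table, and each nested field moves to a child table that carries the same key. This is a minimal sketch on invented data, not the repository's actual code; the `id` values and the R object names are placeholders.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

# Hypothetical parsed records with one nested list-column
events <- tibble(
  id = c("e1", "e2"),  # made-up keys; the real id format differs
  event_type = c("Malfunction", "Injury"),
  mdr_text = list(
    tibble(text = c("Device failed.", "Returned for analysis.")),
    tibble(text = "Patient injured.")
  )
)

# Main table: scalar columns only, one row per event
adverse_events <- select(events, -mdr_text)

# Child table: the key plus the expanded nested rows
adverse_events_mdr_text <- events %>%
  select(id, mdr_text) %>%
  unnest(cols = mdr_text)
```

Here `event_type` is stored once per event rather than once per text entry, and the child table can be linked back to its parent through `id`.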

## Hardware

1. I transformed the data on:

    * 2013 15" MacBook Pro, 8 GB Memory, 8 Core CPU.

2. I wrote the data to:

    * Postgres database hosted on a DigitalOcean droplet.

    * 2 GB Memory, 2 vCPUs, 60 GB Disk, Ubuntu 18.04.3 (LTS) x64.
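
The write step can be sketched with `DBI::dbWriteTable()`, appending one chunk at a time. The example below is an assumption-laden illustration: it uses an in-memory SQLite database (via the `RSQLite` package) as a stand-in for the Postgres server, and the chunks are invented rows rather than real openFDA data.

```r
library(DBI)

# Assumption: in-memory SQLite stands in for the Postgres server
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical chunks, as if produced from two separate JSON files
chunks <- list(
  data.frame(id = c("e1", "e1"),
             text = c("Device failed.", "Returned for analysis.")),
  data.frame(id = "e2", text = "Patient injured.")
)

# append = TRUE creates the table on the first write and adds rows afterwards
for (chunk in chunks) {
  dbWriteTable(con, "adverse_events.mdr_text", chunk, append = TRUE)
}

# The table name contains a dot, so it is quoted as a single identifier
n_rows <- dbGetQuery(
  con, 'SELECT COUNT(*) AS n FROM "adverse_events.mdr_text"'
)$n

dbDisconnect(con)
```

Appending per file keeps memory use bounded on a small machine, at the cost of many round trips to the database.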

    

For context, this was the result of my first run:

```
~ openFDA database refresh completed in [17.4161201466454 hours]
```

## Examples

If you have successfully run everything, you should have 8 tables with millions of observations that you can explore.

```{r}
library(dplyr, warn.conflicts = FALSE)
library(DBI)

# credentials
dw <- config::get("datawarehouse")

# connect to db
con <- DBI::dbConnect(
  odbc::odbc(),
  Driver = dw$driver,
  Server = dw$server,
  Database = dw$database,
  UID = dw$uid,
  PWD = dw$pwd,
  Port = dw$port
)

# list all available tables
dbListTables(con)

# query the mdr text
tbl(con, "adverse_events.mdr_text") %>% 
  select(text) %>% 
  head(1)

# query the device information
tbl(con, "adverse_events.device") %>% 
  filter(manufacturer_d_name == "ETHICON, INC.") %>% 
  glimpse()

# disconnect
dbDisconnect(con)
```

Note that there is an `id` column in every table, for example:

* `2014q4-0002-0003-3683`

* `---`

I created this column so that the tables can be joined. An equivalent identifier may already exist in the source data, but I have not yet confirmed whether that is the case.
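
A join on `id` might look like the sketch below. It runs on small local tibbles with invented rows so that it works without a database connection; the second `id` value and the text entries are made up for illustration. The same `dplyr` verbs should also apply to lazy tables created with `tbl(con, ...)`.

```r
library(dplyr, warn.conflicts = FALSE)

# Made-up rows standing in for two of the database tables
device <- tibble(
  id = c("2014q4-0002-0003-3683", "2014q4-0002-0003-3684"),
  manufacturer_d_name = c("ETHICON, INC.", "OTHER MFR")
)
mdr_text <- tibble(
  id = c("2014q4-0002-0003-3683", "2014q4-0002-0003-3683"),
  text = c("First narrative.", "Second narrative.")
)

# Filter one table, then pull in the matching rows from the other by id
joined <- device %>%
  filter(manufacturer_d_name == "ETHICON, INC.") %>%
  inner_join(mdr_text, by = "id")
```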
