https://github.com/colearendt/tidyjson

Tidy your JSON data in R with tidyjson
https://github.com/colearendt/tidyjson

Last synced: 8 months ago
JSON representation

Tidy your JSON data in R with tidyjson

Host: GitHub
URL: https://github.com/colearendt/tidyjson
Owner: colearendt
License: other
Created: 2016-08-26T13:32:27.000Z (over 9 years ago)
Default Branch: master
Last Pushed: 2023-01-12T16:37:20.000Z (almost 3 years ago)
Last Synced: 2024-10-11T18:23:18.223Z (about 1 year ago)
Language: R
Size: 3.87 MB
Stars: 182
Watchers: 10
Forks: 14
Open Issues: 31
Metadata Files:
- Readme: README.Rmd
- License: LICENSE

Awesome Lists containing this project

jimsghstars - colearendt/tidyjson - Tidy your JSON data in R with tidyjson (R)

README

          ---

title: 'tidyjson'

output: github_document

---

```{r, echo = FALSE}

knitr::opts_chunk$set(

  collapse = TRUE,

  comment = "#>",

  fig.path = "README-"

)

```

[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/tidyjson)](https://cran.r-project.org/package=tidyjson)

[![Build Status](https://travis-ci.org/colearendt/tidyjson.svg?branch=master)](https://travis-ci.org/colearendt/tidyjson)

[![Coverage Status](https://codecov.io/github/colearendt/tidyjson/coverage.svg?branch=master)](https://codecov.io/github/colearendt/tidyjson?branch=master)

[![CRAN Activity](http://cranlogs.r-pkg.org/badges/tidyjson)](https://cran.r-project.org/package=tidyjson/index.html)

[![CRAN History](http://cranlogs.r-pkg.org/badges/grand-total/tidyjson)](https://cran.r-project.org/package=tidyjson/index.html)

![tidyjson graphs](https://cloud.githubusercontent.com/assets/2284427/18217882/1b3b2db4-7114-11e6-8ba3-07938f1db9af.png)

tidyjson provides tools for turning complex [json](http://www.json.org/) into [tidy](https://cran.r-project.org/package=tidyr/vignettes/tidy-data.html)

data.

## Installation

Get the released version from CRAN:

```R

install.packages("tidyjson")

```

or the development version from github:

```R

devtools::install_github("colearendt/tidyjson")

```

## Examples

The following example takes a character vector of 

`r library(tidyjson);length(worldbank)` 

documents in the `worldbank` dataset and spreads out all objects.  

Every JSON object key gets its own column with types inferred, so long 

as the key does not represent an array.  When `recursive=TRUE` (the default behavior), 

`spread_all` does this recursively for nested objects and creates column names 

using the `sep` parameter (i.e. `{"a":{"b":1}}` with `sep='.'` would 

generate a single column: `a.b`).

```{r, message=FALSE}

library(dplyr)

library(tidyjson)

worldbank %>% spread_all

```

Some objects in `worldbank` are arrays, which are not handled by `spread_all`.  This example shows how

to quickly summarize the top level structure of a JSON collection

```{r}

worldbank %>% gather_object %>% json_types %>% count(name, type)

```

In order to capture the data in the `majorsector_percent` array, we can use `enter_object` 

to enter into that object, `gather_array` to stack the array and `spread_all`

to capture the object items under the array.

```{r}

worldbank %>%

  enter_object(majorsector_percent) %>%

  gather_array %>%

  spread_all %>%

  select(-document.id, -array.index)

```

## API

### Spreading objects into columns

* `spread_all()` for spreading all object values into new columns, with nested

objects having concatenated names

* `spread_values()` for specifying a subset of object values to spread into new

columns using the `jstring()`, `jinteger()`, `jdouble()` and `jlogical()`

functions.  It is possible to specify multiple parameters to extract data from

nested objects (i.e. `jstring('a','b')`).

### Object navigation

* `enter_object()` for entering into an object by name, discarding all other

JSON (and rows without the corresponding object name) and allowing further 

operations on the object value

* `gather_object()` for stacking all object name-value pairs by name, expanding 

the rows of the `tbl_json` object accordingly

### Array navigation

* `gather_array()` for stacking all array values by index, expanding the

rows of the `tbl_json` object accordingly

### JSON inspection

* `json_types()` for identifying JSON data types

* `json_length()` for computing the length of JSON data (can be larger than

`1` for objects and arrays)

* `json_complexity()` for computing the length of the unnested JSON, i.e.,

how many terminal leaves there are in a complex JSON structure

* `is_json` family of functions for testing the type of JSON data

### JSON summarization

* `json_structure()` for creating a single fixed column data.frame that 

recursively structures arbitrary JSON data

* `json_schema()` for representing the schema of complex JSON, unioned across

disparate JSON documents, and collapsing arrays to their most complex type

representation

### Creating tbl_json objects

* `as.tbl_json()` for converting a string or character vector into a `tbl_json`

object, or for converting a `data.frame` with a JSON column using the

`json.column` argument

* `tbl_json()` for combining a `data.frame` and associated `list` derived

from JSON data into a `tbl_json` object

* `read_json()` for reading JSON data from a file

### Converting tbl_json objects

* `as.character.tbl_json` for converting the JSON attribute of a `tbl_json` 

object back into a JSON character string

### Included JSON data

* `commits`: commit data for the dplyr repo from github API

* `issues`: issue data for the dplyr repo from github API

* `worldbank`: world bank funded projects from jsonstudio

* `companies`: startup company data from jsonstudio

## Philosophy

The goal is to turn complex JSON data, which is often represented as nested

lists, into tidy data frames that can be more easily manipulated.

* Work on a single JSON document, or on a collection of related documents

* Create pipelines with `%>%`, producing code that can be read from left to 

right

* Guarantee the structure of the data produced, even if the input JSON

structure changes (with the exception of `spread_all`)

* Work with arbitrarily nested arrays or objects

* Handle 'ragged' arrays and / or objects (varying lengths by document)

* Allow for extraction of data in values or object names

* Ensure edge cases are handled correctly (especially empty data)

* Integrate seamlessly with `dplyr`, allowing `tbl_json` objects to pipe in and

out of `dplyr` verbs where reasonable

## Related Work

Tidyjson depends upon

* [magrritr](https://github.com/tidyverse/magrittr) for the `%>%` pipe operator

* [jsonlite](https://github.com/jeroen/jsonlite) for converting JSON strings into nested lists

* [purrr](https://github.com/tidyverse/purrr) for list operators

* [tidyr](https://github.com/tidyverse/tidyr) for unnesting and spreading

Further, there are other R packages that can be used to better understand

JSON data

* [listviewer](https://github.com/timelyportfolio/listviewer) for viewing JSON data interactively

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/colearendt/tidyjson

Awesome Lists containing this project

README