Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/colearendt/tidyjson
Tidy your JSON data in R with tidyjson
- Host: GitHub
- URL: https://github.com/colearendt/tidyjson
- Owner: colearendt
- License: other
- Created: 2016-08-26T13:32:27.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2023-01-12T16:37:20.000Z (almost 2 years ago)
- Last Synced: 2024-10-11T18:23:18.223Z (30 days ago)
- Language: R
- Size: 3.87 MB
- Stars: 182
- Watchers: 10
- Forks: 14
- Open Issues: 31
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
Awesome Lists containing this project
- jimsghstars - colearendt/tidyjson - Tidy your JSON data in R with tidyjson (R)
README
---
title: 'tidyjson'
output: github_document
---

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
```

[![CRAN_Status_Badge](http://www.r-pkg.org/badges/version/tidyjson)](https://cran.r-project.org/package=tidyjson)
[![Build Status](https://travis-ci.org/colearendt/tidyjson.svg?branch=master)](https://travis-ci.org/colearendt/tidyjson)
[![Coverage Status](https://codecov.io/github/colearendt/tidyjson/coverage.svg?branch=master)](https://codecov.io/github/colearendt/tidyjson?branch=master)
[![CRAN Activity](http://cranlogs.r-pkg.org/badges/tidyjson)](https://cran.r-project.org/package=tidyjson/index.html)
[![CRAN History](http://cranlogs.r-pkg.org/badges/grand-total/tidyjson)](https://cran.r-project.org/package=tidyjson/index.html)

![tidyjson graphs](https://cloud.githubusercontent.com/assets/2284427/18217882/1b3b2db4-7114-11e6-8ba3-07938f1db9af.png)

tidyjson provides tools for turning complex [json](http://www.json.org/) into [tidy](https://cran.r-project.org/package=tidyr/vignettes/tidy-data.html)
data.

## Installation
Get the released version from CRAN:

```R
install.packages("tidyjson")
```

or the development version from GitHub:

```R
devtools::install_github("colearendt/tidyjson")
```

## Examples
The following example takes a character vector of
`r library(tidyjson);length(worldbank)`
documents in the `worldbank` dataset and spreads out all objects.
Every JSON object key gets its own column, with types inferred, as long
as the key does not hold an array. When `recursive = TRUE` (the default),
`spread_all` does this recursively for nested objects and builds column names
using the `sep` parameter (e.g. `{"a":{"b":1}}` with `sep = '.'` would
generate a single column: `a.b`).

```{r, message=FALSE}
library(dplyr)
library(tidyjson)

worldbank %>% spread_all
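
# A minimal sketch of the naming rule described above, using a made-up
# document rather than `worldbank`: with `sep = "."`, the nested key `b`
# under `a` is expected to become a single column named `a.b`.
'{"a": {"b": 1}}' %>% spread_all(sep = ".")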
```

Some keys in `worldbank` hold arrays, which `spread_all` does not handle. This example shows how
to quickly summarize the top-level structure of a JSON collection:

```{r}
worldbank %>% gather_object %>% json_types %>% count(name, type)
```

To capture the data in the `majorsector_percent` array, we can use `enter_object`
to enter that object, `gather_array` to stack the array, and `spread_all`
to capture the object items under the array.

```{r}
worldbank %>%
  enter_object(majorsector_percent) %>%
  gather_array %>%
  spread_all %>%
  select(-document.id, -array.index)
```

## API
### Spreading objects into columns

* `spread_all()` for spreading all object values into new columns, with nested
  objects having concatenated names
* `spread_values()` for specifying a subset of object values to spread into new
  columns using the `jstring()`, `jinteger()`, `jdouble()` and `jlogical()`
  functions. It is possible to specify multiple parameters to extract data from
  nested objects (e.g. `jstring('a','b')`).

### Object navigation

* `enter_object()` for entering into an object by name, discarding all other
  JSON (and rows without the corresponding object name) and allowing further
  operations on the object value
* `gather_object()` for stacking all object name-value pairs by name, expanding
  the rows of the `tbl_json` object accordingly

### Array navigation

* `gather_array()` for stacking all array values by index, expanding the
  rows of the `tbl_json` object accordingly

### JSON inspection

* `json_types()` for identifying JSON data types
* `json_length()` for computing the length of JSON data (can be larger than
  `1` for objects and arrays)
* `json_complexity()` for computing the length of the unnested JSON, i.e.,
  how many terminal leaves there are in a complex JSON structure
* `is_json` family of functions for testing the type of JSON data
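
As a rough sketch of how the spreading, navigation, and inspection verbs listed above fit together, the following uses two made-up documents. The `name`, `address`, and `tags` keys are invented for illustration and are not part of the bundled data, and the sketch assumes the default `array.index` and `type` column names.

```r
library(dplyr)
library(tidyjson)

# Hypothetical input: two small JSON documents invented for this sketch
docs <- c(
  '{"name": "alice", "address": {"city": "Oslo"}, "tags": ["a", "b"]}',
  '{"name": "bob", "address": {"city": "Lima"}, "tags": ["c"]}'
)

# spread_values(): extract chosen values; several names form a nested path
people <- docs %>%
  spread_values(
    name = jstring("name"),
    city = jstring("address", "city")
  )

# enter_object() + gather_array(): one row per element of the `tags` array;
# json_types() then labels the JSON type of each element
tag_rows <- docs %>%
  spread_values(name = jstring("name")) %>%
  enter_object("tags") %>%
  gather_array %>%
  json_types

people
tag_rows
```

Because the columns are named explicitly in `spread_values()`, the shape of `people` does not depend on which other keys happen to be present in the input.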
### JSON summarization

* `json_structure()` for creating a single fixed column data.frame that
  recursively structures arbitrary JSON data
* `json_schema()` for representing the schema of complex JSON, unioned across
  disparate JSON documents, and collapsing arrays to their most complex type
  representation

### Creating tbl_json objects

* `as.tbl_json()` for converting a string or character vector into a `tbl_json`
  object, or for converting a `data.frame` with a JSON column using the
  `json.column` argument
* `tbl_json()` for combining a `data.frame` and associated `list` derived
  from JSON data into a `tbl_json` object
* `read_json()` for reading JSON data from a file
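
The constructors above could be exercised roughly as follows. This is a sketch under assumptions: the `id` and `payload` column names and the tiny documents are made up for illustration, and `read_json()` is left out because it needs a file on disk.

```r
library(tidyjson)

# From a character vector: each element becomes one document (row)
from_chr <- as.tbl_json(c('{"a": 1}', '{"a": 2}'))

# From a data.frame with JSON text in a column, named via `json.column`
# (the `id` and `payload` columns are hypothetical)
df <- data.frame(
  id = c("x", "y"),
  payload = c('{"a": 1}', '{"a": 2}'),
  stringsAsFactors = FALSE
)
from_df <- as.tbl_json(df, json.column = "payload")

# From a data.frame plus an already-parsed list, one list element per row
from_list <- tbl_json(
  data.frame(id = c("x", "y"), stringsAsFactors = FALSE),
  list(list(a = 1), list(a = 2))
)

is.tbl_json(from_chr)
is.tbl_json(from_df)
is.tbl_json(from_list)
```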
### Converting tbl_json objects

* `as.character.tbl_json` for converting the JSON attribute of a `tbl_json`
  object back into a JSON character string

### Included JSON data
* `commits`: commit data for the dplyr repo from github API
* `issues`: issue data for the dplyr repo from github API
* `worldbank`: world bank funded projects from jsonstudio
* `companies`: startup company data from jsonstudio
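
Since `worldbank` is a character vector of documents, a small round trip through `tbl_json` and back to text might look like the sketch below. Taking only the first two documents is an arbitrary choice to keep the output short.

```r
library(tidyjson)

# Parse the first two World Bank documents into a tbl_json
wb <- as.tbl_json(worldbank[1:2])

# Convert the JSON attribute back to text, one string per row
wb_text <- as.character(wb)

is.character(wb_text)
length(wb_text)
```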
## Philosophy
The goal is to turn complex JSON data, which is often represented as nested
lists, into tidy data frames that can be more easily manipulated.

* Work on a single JSON document, or on a collection of related documents
* Create pipelines with `%>%`, producing code that can be read from left to
  right
* Guarantee the structure of the data produced, even if the input JSON
  structure changes (with the exception of `spread_all`)
* Work with arbitrarily nested arrays or objects
* Handle 'ragged' arrays and / or objects (varying lengths by document)
* Allow for extraction of data in values or object names
* Ensure edge cases are handled correctly (especially empty data)
* Integrate seamlessly with `dplyr`, allowing `tbl_json` objects to pipe in and
  out of `dplyr` verbs where reasonable

## Related Work
Tidyjson depends upon:

* [magrittr](https://github.com/tidyverse/magrittr) for the `%>%` pipe operator
* [jsonlite](https://github.com/jeroen/jsonlite) for converting JSON strings into nested lists
* [purrr](https://github.com/tidyverse/purrr) for list operators
* [tidyr](https://github.com/tidyverse/tidyr) for unnesting and spreading

Further, there are other R packages that can be used to better understand
JSON data:

* [listviewer](https://github.com/timelyportfolio/listviewer) for viewing JSON data interactively