https://pbeshai.github.io/tidy/
Tidy up your data with JavaScript, inspired by dplyr and the tidyverse
- Host: GitHub
- URL: https://pbeshai.github.io/tidy/
- Owner: pbeshai
- License: mit
- Created: 2021-02-02T16:22:39.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2024-05-17T23:51:10.000Z (6 months ago)
- Last Synced: 2024-10-29T15:49:05.724Z (12 days ago)
- Topics: data, dplyr, tidyverse, wrangling
- Language: TypeScript
- Homepage: https://pbeshai.github.io/tidy
- Size: 1.3 MB
- Stars: 738
- Watchers: 15
- Forks: 21
- Open Issues: 9
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# tidy.js
[![CircleCI](https://img.shields.io/circleci/build/gh/pbeshai/tidy)](https://app.circleci.com/pipelines/github/pbeshai/tidy)
[![npm](https://img.shields.io/npm/v/@tidyjs/tidy)](https://www.npmjs.com/package/@tidyjs/tidy)

**Tidy up your data with JavaScript!** Inspired by [dplyr](https://dplyr.tidyverse.org/) and the [tidyverse](https://www.tidyverse.org/), tidy.js attempts to bring the ergonomics of data manipulation from R to JavaScript (and TypeScript). The primary goals of the project are:
* **Readable code**. Tidy.js prioritizes making your data transformations readable, so future you and your teammates can get up and running quickly.
* **Standard transformation verbs**. Tidy.js is built using battle-tested verbs from the R community that can handle any data wrangling need.
* **Work with plain JS objects**. No wrapper classes needed — all tidy.js needs is an array of plain old-fashioned JS objects to get started. Simple in, simple out.
Secondarily, this project aims to provide acceptable types for the functions provided.
#### Quick Links
* [GitHub repo](https://github.com/pbeshai/tidy)
* [Project homepage](https://pbeshai.github.io/tidy)
* [API reference documentation](https://pbeshai.github.io/tidy/docs/api/tidy)
* [Playground](https://pbeshai.github.io/tidy/playground)
* [Observable Intro](https://observablehq.com/@pbeshai/tidy-js-intro-demo)
* [Observable Examples Collection](https://observablehq.com/collection/@pbeshai/tidy-js)
* [GitHub Discussions for Q&A](https://github.com/pbeshai/tidy/discussions)
* [CodeSandbox showing basic HTML usage (UMD)](https://codesandbox.io/s/tidyjs-umd-example-n1g4r?file=/index.html)

#### Related work
Be sure to check out a very similar project, [Arquero](https://github.com/uwdata/arquero), from [UW Data](https://idl.cs.washington.edu/).
## Getting started
To start using tidy, your best bet is to install from npm:
```shell
npm install @tidyjs/tidy
# or
yarn add @tidyjs/tidy
```

Then import the functions you need:
```js
import { tidy, mutate, arrange, desc } from '@tidyjs/tidy'
```

**Note:** If you're just trying tidy in a browser, you can use the UMD version hosted on jsDelivr ([CodeSandbox example](https://codesandbox.io/s/tidyjs-umd-example-n1g4r?file=/index.html)):
```html
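<!-- Assumes the tidy.js UMD bundle was already loaded by an earlier <script src> tag
     (URL not shown here; see the CodeSandbox example), which exposes the global `Tidy`. -->
<script>
  const { tidy, mutate, arrange, desc } = Tidy;
  // ...
</script>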
```

And use them on an array of objects:
```js
const data = [
  { a: 1, b: 10 },
  { a: 3, b: 12 },
  { a: 2, b: 10 }
]

const results = tidy(
  data,
  mutate({ ab: d => d.a * d.b }),
  arrange(desc('ab'))
)
```

The output is:
```js
[
  { a: 3, b: 12, ab: 36 },
  { a: 2, b: 10, ab: 20 },
  { a: 1, b: 10, ab: 10 }
]
```

All tidy.js code is wrapped in a **tidy flow** via the `tidy()` function. The first argument is the array of data, followed by the transformation verbs to run on the data. The actual functions passed to `tidy()` can be anything so long as they fit the form:
```
(items: object[]) => object[]
```

For example, the following is valid:
```js
tidy(
  data,
  items => items.filter((d, i) => i % 2 === 0),
  arrange(desc('value'))
)
```

All tidy verbs fit this style, with the exception of exports from groupBy, discussed below.
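
Because a verb is just a function from an array to an array, you can write your own. The sketch below is plain JavaScript with no dependency on tidy.js. Both names are hypothetical: `everyNth` is an illustrative custom verb, and `runFlow` is a stand-in that only demonstrates applying functions in order. It is not how `tidy()` is actually implemented.

```js
// Hypothetical custom verb factory: keeps every nth item, starting with the first.
// It returns a function of the form (items) => items, the shape a tidy flow expects.
const everyNth = (n) => (items) => items.filter((_, i) => i % n === 0);

// Hypothetical stand-in for a flow runner (NOT tidy's real implementation):
// feeds the output of each function into the next one.
const runFlow = (items, ...fns) => fns.reduce((acc, fn) => fn(acc), items);

const sample = [{ value: 1 }, { value: 2 }, { value: 3 }, { value: 4 }];

const flowResult = runFlow(
  sample,
  everyNth(2), // keeps the items at index 0 and 2
  (items) => items.map((d) => ({ ...d, doubled: d.value * 2 }))
);

console.log(flowResult);
// [ { value: 1, doubled: 2 }, { value: 3, doubled: 6 } ]
```

Assuming the function form described above, a verb such as `everyNth(2)` could be passed to `tidy()` alongside built-in verbs like `arrange()`.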
### Grouping data with groupBy
Besides manipulating flat lists of data, tidy provides facilities for wrangling grouped data via the `groupBy()` function.
```js
import { tidy, summarize, sum, groupBy } from '@tidyjs/tidy'

const data = [
  { key: 'group1', value: 10 },
  { key: 'group2', value: 9 },
  { key: 'group1', value: 7 }
]

tidy(
  data,
  groupBy('key', [
    summarize({ total: sum('value') })
  ])
)
```
The output is:
```js
[
  { "key": "group1", "total": 17 },
  { "key": "group2", "total": 9 },
]
```

The `groupBy()` function works similarly to `tidy()` in that it takes a flow of functions as its second argument (wrapped in an array). Things get really fun when you use groupBy's *third* argument for exporting the grouped data into different shapes.
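
As a rough mental model of the second argument, the plain-JavaScript sketch below partitions items by a key, runs the same flow on each group, and flattens the results. `groupAndRun` and `totalByKey` are hypothetical helpers written for illustration only; they are not part of tidy.js and do not reflect its internals.

```js
// Hypothetical helper (not part of tidy.js): partition items by a key,
// run the same flow of functions on each group, then flatten the results.
function groupAndRun(items, key, fns) {
  // A Map preserves insertion order, so groups appear in first-seen order.
  const groups = new Map();
  for (const item of items) {
    const k = item[key];
    if (!groups.has(k)) groups.set(k, []);
    groups.get(k).push(item);
  }

  const out = [];
  for (const groupItems of groups.values()) {
    // Each function receives the previous function's output for this group.
    const result = fns.reduce((acc, fn) => fn(acc), groupItems);
    out.push(...result);
  }
  return out;
}

// Hypothetical summarizing step: collapses one group into a single row.
const totalByKey = (groupItems) => [{
  key: groupItems[0].key,
  total: groupItems.reduce((sum, d) => sum + d.value, 0),
}];

const rows = [
  { key: 'group1', value: 10 },
  { key: 'group2', value: 9 },
  { key: 'group1', value: 7 },
];

const grouped = groupAndRun(rows, 'key', [totalByKey]);
console.log(grouped);
// [ { key: 'group1', total: 17 }, { key: 'group2', total: 9 } ]
```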
For example, exporting data as a nested object, we can use `groupBy.object()` as the third argument to `groupBy()`.
```js
const data = [
  { g: 'a', h: 'x', value: 5 },
  { g: 'a', h: 'y', value: 15 },
  { g: 'b', h: 'x', value: 10 },
  { g: 'b', h: 'x', value: 20 },
  { g: 'b', h: 'y', value: 30 },
]

tidy(
  data,
  groupBy(
    ['g', 'h'],
    [
      mutate({ key: d => `${d.g}${d.h}` })
    ],
    groupBy.object() // <-- specify the export
  )
);
```
The output is:
```js
{
  "a": {
    "x": [{"g": "a", "h": "x", "value": 5, "key": "ax"}],
    "y": [{"g": "a", "h": "y", "value": 15, "key": "ay"}]
  },
  "b": {
    "x": [
      {"g": "b", "h": "x", "value": 10, "key": "bx"},
      {"g": "b", "h": "x", "value": 20, "key": "bx"}
    ],
    "y": [{"g": "b", "h": "y", "value": 30, "key": "by"}]
  }
}
```

Or alternatively as `{ key, values }` entries-objects via `groupBy.entriesObject()`:
```js
tidy(data,
  groupBy(
    ['g', 'h'],
    [
      mutate({ key: d => `${d.g}${d.h}` })
    ],
    groupBy.entriesObject() // <-- specify the export
  )
);
```

The output is:
```js
[
  {
    "key": "a",
    "values": [
      {"key": "x", "values": [{"g": "a", "h": "x", "value": 5, "key": "ax"}]},
      {"key": "y", "values": [{"g": "a", "h": "y", "value": 15, "key": "ay"}]}
    ]
  },
  {
    "key": "b",
    "values": [
      {
        "key": "x",
        "values": [
          {"g": "b", "h": "x", "value": 10, "key": "bx"},
          {"g": "b", "h": "x", "value": 20, "key": "bx"}
        ]
      },
      {"key": "y", "values": [{"g": "b", "h": "y", "value": 30, "key": "by"}]}
    ]
  }
]
```

It's common to be left with a single leaf in a groupBy set, especially after running `summarize()`. To prevent your exported data from having its values wrapped in an array, you can pass the `single` option to the export.
```js
tidy(data,
  groupBy(['g', 'h'], [
    summarize({ total: sum('value') })
  ], groupBy.object({ single: true }))
);
```

The output is:
```js
{
  "a": {
    "x": {"total": 5, "g": "a", "h": "x"},
    "y": {"total": 15, "g": "a", "h": "y"}
  },
  "b": {
    "x": {"total": 30, "g": "b", "h": "x"},
    "y": {"total": 30, "g": "b", "h": "y"}
  }
}
```

Visit the [API reference docs](https://pbeshai.github.io/tidy/docs/api/tidy) to learn more about how each function works and all the options they take. Be sure to check out the `levels` export, which lets you mix and match different export types based on the depth of the data. For quick reference, the available groupBy exports are:
* groupBy.entries()
* groupBy.entriesObject()
* groupBy.grouped()
* groupBy.levels()
* groupBy.object()
* groupBy.keys()
* groupBy.map()
* groupBy.values()

---
## Developing
clone the repo:
```
git clone git@github.com:pbeshai/tidy.git
```

install dependencies:
```
yarn
```

initialize lerna:
```
lerna bootstrap
```

build tidy:
```
yarn run build
```

test all of tidy:
```
yarn run test
```

test:watch a single package:
```
yarn workspace @tidyjs/tidy test:watch
```

### Conventional commits
This library uses [conventional commits](https://www.conventionalcommits.org/), following the Angular convention. Prefixes are:
- **build**: Changes that affect the build system or external dependencies (example scopes: yarn, npm)
- **ci**: Changes to our CI configuration files and scripts (e.g. CircleCI)
- **chore**
- **docs**: Documentation only changes
- **feat**: A new feature
- **fix**: A bug fix
- **perf**: A code change that improves performance
- **refactor**: A code change that neither fixes a bug nor adds a feature
- **revert**
- **style**: Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc)
- **test**: Adding missing tests or correcting existing tests

### Docs website
start the local site:
```
yarn start:web
```

build the site:
```
yarn build:web
```

deploy the site via github-pages:
```
USE_SSH=true GIT_USER=pbeshai yarn workspace @tidyjs/tidy-website deploy
```

Ideally we can automate this via github actions one day!
---
#### Shout out to Netflix
I want to give a big shout out to [Netflix](https://research.netflix.com/), my current employer, for giving me the opportunity to work on this project and to open source it. It's a great place to work and if you enjoy tinkering with data-related things, I'd strongly recommend checking out [our analytics department](https://research.netflix.com/research-area/analytics).
– [Peter Beshai](https://peterbeshai.com/)