Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/TidierOrg/TidierData.jl
Tidier data transformations in Julia, modeled after the dplyr/tidyr R packages.
- Host: GitHub
- URL: https://github.com/TidierOrg/TidierData.jl
- Owner: TidierOrg
- License: mit
- Created: 2023-07-28T18:28:01.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-04-27T03:30:07.000Z (7 months ago)
- Last Synced: 2024-04-27T04:24:48.647Z (7 months ago)
- Language: Julia
- Size: 4.67 MB
- Stars: 64
- Watchers: 5
- Forks: 6
- Open Issues: 20
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-dataframes - TidierData.jl - 100% Julia implementation of the dplyr and tidyr R packages. (Libraries)
README
# TidierData.jl
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://github.com/TidierOrg/TidierData.jl/blob/main/LICENSE)
[![Docs: Latest](https://img.shields.io/badge/Docs-Latest-blue.svg)](https://tidierorg.github.io/TidierData.jl/latest)
[![Build Status](https://github.com/TidierOrg/TidierData.jl/actions/workflows/CI.yml/badge.svg?branch=main)](https://github.com/TidierOrg/TidierData.jl/actions/workflows/CI.yml?query=branch%3Amain)
[![Downloads](https://img.shields.io/badge/dynamic/json?url=http%3A%2F%2Fjuliapkgstats.com%2Fapi%2Fv1%2Fmonthly_downloads%2FTidierData&query=total_requests&suffix=%2Fmonth&label=Downloads)](http://juliapkgstats.com/pkg/TidierData)

## What is TidierData.jl?

TidierData.jl is a 100% Julia implementation of the dplyr and tidyr R packages. Powered by the DataFrames.jl package and Julia’s extensive meta-programming capabilities, TidierData.jl is an R user’s love letter to data analysis in Julia.

`TidierData.jl` has two goals, which differentiate it from other data analysis meta-packages in Julia:

1. **Stick as closely to dplyr and tidyr syntax as possible:** Whereas other meta-packages introduce Julia-centric idioms for working with DataFrames, this package’s goal is to reimplement dplyr and tidyr in Julia. This means that `TidierData.jl` uses *tidy expressions* as opposed to idiomatic Julia expressions. An example of a tidy expression is `a = mean(b)`.
2. **Make broadcasting mostly invisible:** Broadcasting trips up many R users switching to Julia because R users are used to most functions being vectorized. `TidierData.jl` currently uses a lookup table to decide which functions *not* to vectorize; all other functions are automatically vectorized. See the documentation page on "Autovectorization" to learn how this works and how to override the defaults.

## Installation

For the stable version:
```
] add TidierData
```

The `]` character starts the Julia [package manager](https://docs.julialang.org/en/v1/stdlib/Pkg/). Press the backspace key to return to the Julia prompt.

or
```julia
using Pkg
Pkg.add("TidierData")
```

For the newest version:
```
] add TidierData#main
```

or
```julia
using Pkg
Pkg.add(url="https://github.com/TidierOrg/TidierData.jl")
```

## What functions does TidierData.jl support?

To support R-style programming, TidierData.jl is implemented using macros.
TidierData.jl currently supports the following top-level macros:
- `@glimpse()` and `@head()`
- `@select()` and `@distinct()`
- `@rename()` and `@rename_with()`
- `@mutate()` and `@transmute()`
- `@summarize()` and `@summarise()`
- `@filter()`
- `@slice()`, `@slice_sample()`, `@slice_min()`, `@slice_max()`, `@slice_head()`, and `@slice_tail()`
- `@group_by()` and `@ungroup()`
- `@arrange()`
- `@relocate()`
- `@pull()`
- `@count()` and `@tally()`
- `@left_join()`, `@right_join()`, `@inner_join()`, `@full_join()`, `@anti_join()`, and `@semi_join()`
- `@bind_rows()` and `@bind_cols()`
- `@pivot_wider()` and `@pivot_longer()`
- `@separate()`, `@separate_rows()`, and `@unite()`
- `@drop_missing()` and `@fill_missing()`
- `@unnest_longer()`, `@unnest_wider()`, and `@nest()`
- `@clean_names()` (as in R's `janitor::clean_names()` function)
- `@summary()` (as in R's `summary()` function)

TidierData.jl also supports the following helper functions:

- `across()`
- `where()`
- `desc()`
- `if_else()` and `case_when()`
- `n()` and `row_number()`
- `ntile()`
- `lag()` and `lead()`
- `everything()`, `starts_with()`, `ends_with()`, `matches()`, and `contains()`
- `as_float()`, `as_integer()`, and `as_string()`
- `is_number()`, `is_float()`, `is_integer()`, and `is_string()`
- `missing_if()` and `replace_missing()`

See the documentation [Home](https://tidierorg.github.io/TidierData.jl/latest/) page for a guide on how to get started, or the [Reference](https://tidierorg.github.io/TidierData.jl/latest/reference/) page for a detailed guide to each of the macros and functions.

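As a rough illustration of how a few of these macros and helper functions combine, here is a minimal sketch. The data frame `df` and its columns `group`, `x`, and `y` are made up for this example and are not part of the package. The sketch assumes that `mean()` is among the functions the lookup table leaves unvectorized, and that `x + y` is vectorized automatically, as the section on broadcasting describes.

```julia
using TidierData
using DataFrames   # explicit import of the DataFrame constructor, in case it is not re-exported
using Statistics   # provides mean()

# Hypothetical toy data, invented for illustration only
df = DataFrame(
    group = ["a", "a", "b", "b"],
    x = [1.0, 2.0, 3.0, 4.0],
    y = [10.0, 20.0, 30.0, 40.0],
)

# Row-wise arithmetic: `x + y` is written without dots,
# relying on automatic vectorization
with_total = @chain df begin
    @mutate(total = x + y)
end

# Grouped summary: `mean()` and `n()` reduce each group to a single value,
# and `desc()` sorts the result in descending order
by_group = @chain df begin
    @mutate(total = x + y)
    @group_by(group)
    @summarize(avg_total = mean(total), rows = n())
    @arrange(desc(avg_total))
end
```

Here `total = x + y` and `avg_total = mean(total)` are both tidy expressions: the first yields one value per row, while the second collapses each group to one row. With this toy data, group `b` should sort first because its mean total is larger.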
## Example
Let's select the first five movies in our dataset whose budget is at or above the mean budget. Unlike in R, where we pass an `na.rm = TRUE` argument to remove missing values, in Julia we wrap the variable in `skipmissing()` so that missing values are dropped before `mean()` is calculated.

```julia
using TidierData
using RDatasets

movies = dataset("ggplot2", "movies");

@chain movies begin
@mutate(Budget = Budget / 1_000_000)
@filter(Budget >= mean(skipmissing(Budget)))
@select(Title, Budget)
@slice(1:5)
end
```

```
5×2 DataFrame
 Row │ Title                       Budget
     │ String                      Float64?
─────┼──────────────────────────────────────
   1 │ 'Til There Was You              23.0
   2 │ 10 Things I Hate About You      16.0
   3 │ 102 Dalmatians                  85.0
   4 │ 13 Going On 30                  37.0
   5 │ 13th Warrior, The               85.0
```

## What’s new

See [NEWS.md](https://github.com/TidierOrg/TidierData.jl/blob/main/NEWS.md) for the latest updates.
## What's missing
Is there a tidyverse feature missing that you would like to see in TidierData.jl? Please file a GitHub issue. Because TidierData.jl primarily wraps DataFrames.jl, our decision to integrate a new feature will be guided by how well-supported it is within DataFrames.jl and how likely other users are to benefit from it.