https://github.com/juliaaplavin/datapipes.jl
The most convenient piping syntax for generic data manipulation in Julia.
- Host: GitHub
- URL: https://github.com/juliaaplavin/datapipes.jl
- Owner: JuliaAPlavin
- License: mit
- Created: 2024-02-18T17:56:15.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2025-03-18T16:58:40.000Z (about 1 year ago)
- Last Synced: 2025-04-17T07:16:49.921Z (about 1 year ago)
- Topics: data-manipulation, macro
- Language: Julia
- Homepage:
- Size: 1.1 MB
- Stars: 10
- Watchers: 1
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# DataPipes.jl
Function piping focused on making general data processing boilerplate-free.
`DataPipes.jl` is extensively tested, with full coverage and more lines of tests than of actual code.
_Questions other than direct bug reports are best asked in the [Julia Discourse thread](https://discourse.julialang.org/t/ann-datapipes-jl/60734)._
# Design
There are multiple implementations of the piping concept in Julia: [1](https://github.com/c42f/Underscores.jl), [2](https://github.com/jkrumbiegel/Chain.jl), [3](https://github.com/FNj/Hose.jl), [4](https://github.com/oxinabox/Pipe.jl), and maybe more. The design of `DataPipes` focuses on common data processing and analysis tasks. What sets `DataPipes` apart from the other packages is that it satisfies all of the following points:
✅ Removes nearly all boilerplate from common data processing functions:
```julia
@p tbl |> filter(_.a > 5) |> map(_.b + _.c)
```
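For readers new to the `_` placeholder, here is a rough vanilla-Julia equivalent of the pipeline above, written without `DataPipes`. It assumes `tbl` is a vector of `NamedTuple` rows (the sample data is hypothetical) and only illustrates what the pipeline computes; it is not the macro's literal expansion.

```julia
# Hypothetical sample table: a vector of NamedTuple rows
tbl = [(a = 3, b = 1, c = 2), (a = 7, b = 10, c = 20), (a = 9, b = 4, c = 5)]

# Each `_`-expression becomes an anonymous function passed as the first argument,
# and each step receives the previous step's result as its data argument
result = map(x -> x.b + x.c, filter(x -> x.a > 5, tbl))

println(result)  # [30, 9]
```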
✅ Can be inserted as a step in a vanilla Julia pipeline without modifying the latter:
```julia
tbl |> sum # before
tbl |> @f(map(_ ^ 2) |> filter(_ > 5)) |> sum # after
```
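Without `DataPipes`, the same inserted step has to be spelled out as an explicit anonymous function. A sketch, assuming a hypothetical numeric `tbl`:

```julia
tbl = [1, 2, 3]  # hypothetical input

# Vanilla counterpart of `tbl |> @f(map(_ ^ 2) |> filter(_ > 5)) |> sum`:
# the middle step is a one-argument function that squares, then filters
total = tbl |> (xs -> filter(y -> y > 5, map(x -> x ^ 2, xs))) |> sum

println(total)  # 9
```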
✅ Can define a function that transforms the data instead of applying it immediately:
```julia
func = @f map(_ ^ 2) |> filter(_ > 5) |> sum # define func
func(tbl) # apply it
```
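The `@f` form yields an ordinary one-argument function. In plain Julia, a roughly equivalent function can be built by composing partially applied `map`, `filter`, and `sum` with `∘` and `Base.Fix1`. This is one vanilla alternative shown for comparison, not how `@f` is implemented.

```julia
# Composition applies right to left: square every element, keep those > 5, then sum
func = sum ∘ Base.Fix1(filter, >(5)) ∘ Base.Fix1(map, x -> x ^ 2)

println(func([1, 2, 3]))  # 9  (only 9 survives the filter)
println(func(1:4))        # 25 (9 + 16)
```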
✅ Makes it easy to export the result of an intermediate pipeline step:
```julia
@p let
    tbl
    @export tbl_filt = filter(_.a > 5)  # export a single intermediate result
    map(_.b + _.c)
end

@p begin  # use begin instead of let to make all intermediate results available afterwards
    tbl
    tbl_filt = filter(_.a > 5)
    map(_.b + _.c)
end
# tbl_filt is available here
```
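For comparison, keeping one intermediate result alongside the final one in vanilla Julia means returning both from the `let` block explicitly. A sketch with hypothetical data, mirroring the filter-then-map pipeline above:

```julia
tbl = [(a = 3, b = 1, c = 2), (a = 7, b = 10, c = 20), (a = 9, b = 4, c = 5)]

# Return the intermediate value together with the final one,
# so both are visible outside the `let` scope
tbl_filt, result = let
    filtered = filter(x -> x.a > 5, tbl)
    filtered, map(x -> x.b + x.c, filtered)
end

println(length(tbl_filt))  # 2
println(result)            # [30, 9]
```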
✅ Provides no-boilerplate nesting:
```julia
@p let
    "a=1 b=2 c=3"
    split
    map() do __  # `__` turns the inner function into a pipeline
        split(__, '=')
        Symbol(__[1]) => parse(Int, __[2])
    end
    NamedTuple
end  # == (a = 1, b = 2, c = 3)
```
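For comparison, the nested pipeline above corresponds roughly to the following plain Julia, where the inner `__` pipeline becomes an ordinary `do` block with a named argument (`kv` is an arbitrary name chosen for this sketch):

```julia
s = "a=1 b=2 c=3"

# Split into "key=value" tokens, then turn each token into a Symbol => Int pair
kvs = map(split(s)) do kv
    parts = split(kv, '=')
    Symbol(parts[1]) => parse(Int, parts[2])
end

nt = NamedTuple(kvs)
println(nt)  # (a = 1, b = 2, c = 3)
```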
As demonstrated, `DataPipes` keeps changes to regular Julia syntax minimal and stays fully composable, both with other tools _(vanilla pipelines)_ and with itself _(nested pipes)_.
# Examples
These design decisions make `DataPipes` convenient both for working with flat tabular data and for processing nested structures. An example of the former:
```julia
@p begin
    tbl
    filter(!any(ismissing, _))
    filter(_.id > 6)
    groupview(_.group)
    map(sum(_.age))
end
```
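`groupview` is not part of Base (it comes from a grouping package such as SplitApplyCombine.jl). As a rough sketch of what this pipeline computes, here is a Base-only version that uses a `Dict` in place of the grouped view; the rows are hypothetical:

```julia
tbl = [
    (id = 7, group = "x", age = 30),
    (id = 8, group = "y", age = 40),
    (id = 9, group = "x", age = 12),
    (id = 3, group = "x", age = 50),        # removed by the id filter
    (id = 10, group = "y", age = missing),  # removed by the missing-value filter
]

rows = filter(r -> !any(ismissing, r), tbl)  # drop rows with any missing field
rows = filter(r -> r.id > 6, rows)           # keep rows with id > 6

# Group by `group` and sum `age` within each group
age_sums = Dict{String,Int}()
for r in rows
    age_sums[r.group] = get(age_sums, r.group, 0) + r.age
end

println(age_sums["x"], " ", age_sums["y"])  # 42 40
```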
_(adapted from the Chain.jl README; all DataFrames-specific operations replaced with general functions)_
See [the Pluto notebook](https://aplavin.github.io/DataPipes.jl/examples/notebook.html) for more examples and a more extensive description of the `DataPipes` syntax.