https://github.com/xiaodaigh/databench.jl

A package to benchmark data manipulation in Julia vs R data.table

# DataBench - a Julia vs R data manipulation benchmark suite
A comparison of data manipulation performance in Julia and R's data.table, using synthetic data and the [GE Flight Quest data](https://www.kaggle.com/c/flight/data).

# Setup instructions
```julia
# Pkg.add("DataBench")
```

1. In `settings.csv`, set `data_path` to a directory that you can write to.
2. Download the 7z file (https://www.kaggle.com/c/flight/download/InitialTrainingSet_rev1.7z).
3. Extract it into the folder `data_path/InitialTrainingSet_rev1`.
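
Before running the benchmarks, it may help to confirm that the folder layout from the steps above is in place. The following is a minimal sketch only: `check_setup` is a hypothetical helper and not part of DataBench, and it takes the data path as an argument rather than reading `settings.csv`, whose exact format is not described here.

```julia
# Hypothetical helper (not part of DataBench): verifies the layout described
# in the setup steps. Uses only functions from Julia's Base library.
function check_setup(data_path::AbstractString)
    isdir(data_path) || error("data_path does not exist: $data_path")

    # Confirm the directory is writable by creating and removing a temporary file.
    probe, io = mktemp(data_path)
    close(io)
    rm(probe)

    # The extracted archive is expected in this subfolder (step 3).
    extracted = joinpath(data_path, "InitialTrainingSet_rev1")
    isdir(extracted) || error("expected extracted data in: $extracted")
    return extracted
end
```

Calling `check_setup` with the same directory you put in `data_path` returns the path of the extracted folder, or raises an error naming the missing piece.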

# Synthetic benchmarks
Adapted from data.table's [official benchmarks](https://github.com/Rdatatable/data.table/wiki/Benchmarks-:-Grouping#code-to-reproduce-the-timings-above-)
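
To give a feel for what a grouping benchmark of this kind measures, here is a small, self-contained sketch. It is not the code used by this package: the helper names `make_synthetic` and `group_sum`, the row count, and the use of a plain `Dict` instead of a data frame library are all assumptions made for illustration. The column names `id1` and `v1` loosely follow the shape of the data.table grouping benchmark (a low-cardinality string key and a small integer value).

```julia
using Random

# Hypothetical helper: build synthetic data as two parallel vectors,
# a string group key with k distinct values and an integer value column.
function make_synthetic(n::Integer, k::Integer; rng = Random.default_rng())
    labels = ["id" * lpad(i, 3, '0') for i in 1:k]
    id1 = rand(rng, labels, n)   # group key sampled with replacement
    v1 = rand(rng, 1:5, n)       # integer values to aggregate
    return id1, v1
end

# Hypothetical helper: sum the values within each group using a plain Dict.
function group_sum(ks::AbstractVector, vals::AbstractVector{<:Integer})
    sums = Dict{eltype(ks), Int}()
    for (key, val) in zip(ks, vals)
        sums[key] = get(sums, key, 0) + val
    end
    return sums
end

id1, v1 = make_synthetic(1_000_000, 100; rng = MersenneTwister(1))
sums = group_sum(id1, v1)                  # first call also triggers compilation
elapsed = @elapsed group_sum(id1, v1)      # time a second, already-compiled call
println("grouped sum over ", length(id1), " rows took ", elapsed, " s")
```

The first call is made before timing so that Julia's compilation time is not counted in the measurement, which matters when comparing against timings from another language.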

# "Real-life" benchmarks
Uses the [GE Flight Quest data](https://www.kaggle.com/c/flight/data), which was the largest tabular dataset on Kaggle at the time of writing.

# Companion post
[Speed of data manipulations in Julia vs R](https://www.codementor.io/zhuojiadai/speed-of-data-manipulation-in-julia-vs-r-cd7praapv)

# Similar repos
https://github.com/szilard/benchm-databases