https://github.com/xiaodaigh/data_manipulation_benchmarks

A set of data manipulation benchmarking code for Julia and R

# Julia vs R data manipulation benchmark suite
A comparison of Julia's and R's data manipulation speed using synthetic data and the [GE Flight Quest data](https://www.kaggle.com/c/flight/data).

# Setup instructions
1. Set `data_path` in `settings.csv` to a directory you can write to.
2. Download the 7z file (https://www.kaggle.com/c/flight/download/InitialTrainingSet_rev1.7z).
3. Extract it into the folder `<data_path>/InitialTrainingSet_rev1`.
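The setup steps above can be sanity-checked with a small script before running the benchmarks. This is a minimal sketch, not part of the repo: it assumes (hypothetically) that `settings.csv` has a header row with a `data_path` column and the value on the first data row, and the helper names `read_data_path`, `flight_data_dir` and `is_writable` are illustrative only.

```python
import csv
import io
import os
from pathlib import Path


def read_data_path(settings_file):
    """Return data_path from a settings CSV.

    Assumption: a header row containing a `data_path` column, with the
    value on the first data row. The repo's real layout may differ.
    """
    reader = csv.DictReader(settings_file)
    row = next(reader)
    return Path(row["data_path"])


def flight_data_dir(data_path):
    # Step 3: the 7z archive is extracted into this sub-folder
    return Path(data_path) / "InitialTrainingSet_rev1"


def is_writable(path):
    # Step 1 requires a directory the current user can write to
    return Path(path).is_dir() and os.access(path, os.W_OK)


if __name__ == "__main__":
    # In practice, open the real settings.csv instead of this in-memory stand-in
    settings = io.StringIO("data_path\n/tmp/flightquest\n")
    data_path = read_data_path(settings)
    target = flight_data_dir(data_path)
    print("data_path writable:", is_writable(data_path))
    print(target, "found" if target.is_dir() else "missing")
```

Checking writability up front mirrors step 1; the folder check catches an archive extracted one level too deep or shallow.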

# Synthetic benchmarks
Adapted from data.table's [official grouping benchmarks](https://github.com/Rdatatable/data.table/wiki/Benchmarks-:-Grouping#code-to-reproduce-the-timings-above-).
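The synthetic benchmarks time group-by aggregations over a randomly generated table. As a language-neutral illustration of what one such query measures, here is a minimal pure-Python sketch of "sum `v1` grouped by `id1`" (in data.table, `DT[, sum(v1), by = id1]`). The column names, key format and value range are assumptions loosely modelled on the data.table benchmark; this is not the repo's Julia or R code.

```python
import random
import time
from collections import defaultdict


def make_data(n_rows, n_groups, seed=1):
    """Two synthetic columns: a string group key and a small integer value.

    Illustrative only: keys look like "id001" and values are drawn from 1..5.
    """
    rng = random.Random(seed)
    keys = [f"id{k:03d}" for k in range(1, n_groups + 1)]
    id1 = [rng.choice(keys) for _ in range(n_rows)]
    v1 = [rng.randint(1, 5) for _ in range(n_rows)]
    return id1, v1


def sum_by(keys, values):
    # Hash-based grouping: one pass, accumulating a running sum per key
    totals = defaultdict(int)
    for k, v in zip(keys, values):
        totals[k] += v
    return dict(totals)


def time_it(fn, *args, repeats=3):
    # Keep the best of several runs to reduce timing noise
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    id1, v1 = make_data(n_rows=200_000, n_groups=100)
    print(f"sum(v1) by id1: {time_it(sum_by, id1, v1):.3f} s")
```

Generating the data once and timing only the aggregation keeps data creation out of the measurement, which is the usual way such grouping benchmarks are reported.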

# "Real-life" benchmarks
Uses the [GE Flight Quest data](https://www.kaggle.com/c/flight/data), which was the largest tabular dataset on Kaggle at the time of writing.

# Companion post
[Speed of data manipulations in Julia vs R](https://www.codementor.io/zhuojiadai/speed-of-data-manipulation-in-julia-vs-r-cd7praapv)