https://github.com/behrica/csvsplit
Splits and partitions arbitrarily large CSV files
- Host: GitHub
- URL: https://github.com/behrica/csvsplit
- Owner: behrica
- Created: 2020-10-27T15:30:03.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2020-10-27T21:05:05.000Z (over 4 years ago)
- Last Synced: 2024-11-15T23:35:49.449Z (3 months ago)
- Language: R
- Size: 2.93 KB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# csvsplit
A command-line tool to split and partition arbitrarily large CSV files. It takes as input an arbitrarily large CSV file and a list of columns by which to partition the data.
It then creates nested directories for every combination of values in the partitioning columns, each holding CSV files with the matching subset of the data. This layout is well suited as input for Spark.
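
To make the nested layout concrete, here is a minimal Python sketch of how one row could map to its partition directory. It is an illustration, not the tool's R code. The `col=value` naming is an assumption borrowed from the Hive-style convention that Spark's partition discovery recognizes; the README does not state which naming csvsplit uses. The function name `partition_dir` and the sample columns are hypothetical.

```python
from pathlib import PurePosixPath


def partition_dir(root, row, columns):
    """Return the nested directory for one row: one level per partitioning column.

    Assumes Hive-style `column=value` directory names (not confirmed by the README).
    """
    path = PurePosixPath(root)
    for col in columns:
        path = path / f"{col}={row[col]}"
    return path


# Hypothetical row partitioned by two columns.
row = {"year": "2020", "country": "DE", "amount": "12.5"}
print(partition_dir("out", row, ["year", "country"]))  # out/year=2020/country=DE
```

With two partitioning columns, every distinct (`year`, `country`) pair ends up in its own leaf directory.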
It streams the file for both input and output, so it works with arbitrarily large files.
Internally it uses the `readr::read_csv_chunked` function from the R package readr, which is robust and fast.
It also has a progress bar, which is useful for larger files.
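
The streaming idea can be sketched in Python using only the standard library. This is a stand-in for illustration, not the tool's implementation, which relies on `readr::read_csv_chunked` in R. It reads one row at a time instead of a chunk, so memory use stays constant regardless of input size. The function name `split_csv`, the output file name `part.csv`, and the `col=value` directory naming are all assumptions.

```python
import csv
import os


def split_csv(src, out_dir, columns):
    """Stream `src` row by row and append each row to its partition's CSV file.

    Returns the number of data rows written. Directory naming (`col=value`)
    and the file name `part.csv` are illustrative assumptions.
    """
    n_rows = 0
    with open(src, newline="") as f:
        reader = csv.DictReader(f)
        for row in reader:
            # One directory level per partitioning column.
            parts = [f"{c}={row[c]}" for c in columns]
            target_dir = os.path.join(out_dir, *parts)
            os.makedirs(target_dir, exist_ok=True)
            target = os.path.join(target_dir, "part.csv")
            # Write the header only when the partition file is first created.
            is_new = not os.path.exists(target)
            with open(target, "a", newline="") as out:
                writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
                if is_new:
                    writer.writeheader()
                writer.writerow(row)
            n_rows += 1
    return n_rows
```

Calling `split_csv("input.csv", "out", ["year", "country"])` on a hypothetical `input.csv` would produce files such as `out/year=2020/country=DE/part.csv`. Reopening the target file for every row keeps the sketch short but is slow; processing the input in chunks and writing each chunk's groups in one go, as a chunked reader allows, amortizes that cost. Because the sketch appends, it should be run against an empty output directory.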