Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/santisq/pscsvutil
tiny module to split and merge large Csv files
https://github.com/santisq/pscsvutil
csv merge powershell split streamreader streamwriter
Last synced: 1 day ago
JSON representation
tiny module to split and merge large Csv files
- Host: GitHub
- URL: https://github.com/santisq/pscsvutil
- Owner: santisq
- License: mit
- Created: 2022-10-16T23:35:06.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-05-21T17:55:34.000Z (over 1 year ago)
- Last Synced: 2023-05-21T18:24:42.698Z (over 1 year ago)
- Topics: csv, merge, powershell, split, streamreader, streamwriter
- Language: PowerShell
- Homepage: https://www.powershellgallery.com/packages/PSCsvUtil
- Size: 11.7 KB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
PSCsvUtil
PSCsvUtil is a tiny PowerShell Module composed of two functions to efficiently __split and merge__ big Csv files using [`StreamReader`](https://learn.microsoft.com/en-us/dotnet/api/system.io.streamreader.-ctor?view=net-7.0) and [`StreamWriter`](https://learn.microsoft.com/en-us/dotnet/api/system.io.streamwriter?view=net-7.0) .NET Classes.
## Installation
```powershell
Install-Module PSCsvUtil -Scope CurrentUser
```## Compatibility
Tested and compatible with Windows PowerShell 5.1 and [PowerShell Core 7+](https://github.com/PowerShell/PowerShell).
## Examples
### Split a Csv into 5 chunks
This example demonstrates how to split a Csv using the `-Chunks` parameter from [`Split-Csv`](/public/Split-Csv.ps1).
```powershell
PS ..\PS-CsvUtil> gpstree .\tests\Mode Hierarchy Size
---- --------- ----
d---- tests 1.78 GB
-a--- └── bigcsv.csv 1.78 GBPS ..\PS-CsvUtil> Split-Csv .\tests\bigcsv.csv -DestinationFolder .\tests\by5chunks -PassThru -Chunks 5
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 10/16/2022 8:12 PM 382955825 bigcsv - Part 0.csv
-a--- 10/16/2022 8:12 PM 382955849 bigcsv - Part 1.csv
-a--- 10/16/2022 8:13 PM 382955817 bigcsv - Part 2.csv
-a--- 10/16/2022 8:13 PM 382955855 bigcsv - Part 3.csv
-a--- 10/16/2022 8:14 PM 382954454 bigcsv - Part 4.csvPS ..\PS-CsvUtil> gpstree .\tests\
Mode Hierarchy Size
---- --------- ----
d---- tests 1.78 GB
-a--- ├── bigcsv.csv 1.78 GB
d---- └── by5chunks 1.78 GB
-a--- ├── bigcsv - Part 0.csv 365.22 MB
-a--- ├── bigcsv - Part 1.csv 365.22 MB
-a--- ├── bigcsv - Part 2.csv 365.22 MB
-a--- ├── bigcsv - Part 3.csv 365.22 MB
-a--- └── bigcsv - Part 4.csv 365.21 MBPS ..\PS-CsvUtil> Import-Csv '.\tests\by5chunks\bigcsv - Part 3.csv' | Select-Object -First 3 | FT -Auto
Column0 Column1 Column2 Column3 Column4 Column5
------- ------- ------- ------- ------- -------
2051598092 444770706 545010177 699773973 933288705 2135983609
649944517 1548086840 1011873614 1927411868 1902850446 240759930
182136426 1642338384 983474548 38656729 167556228 569658758
```### Split a Csv into chunks of 250Mb
This example demonstrates how to split a Csv using the `-Size` parameter from [`Split-Csv`](/public/Split-Csv.ps1).
```powershell
PS ..\PS-CsvUtil> Split-Csv .\tests\bigcsv.csv -DestinationFolder .\tests\by50mbchunks -Size 50mb
PS ..\PS-CsvUtil> gpstree .\tests\by50mbchunks\Mode Hierarchy Size
---- --------- ----
d---- by50mbchunks 1.78 GB
-a--- ├── bigcsv - Part 00.csv 50 MB
-a--- ├── bigcsv - Part 01.csv 50 MB
-a--- ├── bigcsv - Part 02.csv 50 MB
-a--- ├── bigcsv - Part 03.csv 50 MB
....
....
....
-a--- ├── bigcsv - Part 33.csv 50 MB
-a--- ├── bigcsv - Part 34.csv 50 MB
-a--- ├── bigcsv - Part 35.csv 50 MB
-a--- └── bigcsv - Part 36.csv 26.08 MBPS ..\PS-CsvUtil> Import-Csv '.\tests\by50mbchunks\bigcsv - Part 28.csv' | Select-Object -First 3 | FT -Auto
Column0 Column1 Column2 Column3 Column4 Column5
------- ------- ------- ------- ------- -------
1241863732 224211646 1035291147 1555398003 1777828960 416276892
1148375056 433051937 1367055122 556502602 38341193 113835498
2056137503 825665841 1921526568 2102926379 1093669865 2030084321
```### Merging a split Csv
This example demonstrates how to merge the Csv files generated with the examples above using [`Merge-Csv`](/public/Merge-Csv.ps1).
```powershell
PS ..\PS-CsvUtil> Get-ChildItem .\tests\by50mbchunks\ -Filter *Part*.csv | Merge-Csv -DestinationPath mergedCsv50mbchunks.csv -PassThruDirectory: ..\pwsh\PS-CsvUtil
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a--- 10/16/2022 8:32 PM 1914777556 mergedCsv50mbchunks.csvPS ..\PS-CsvUtil> (Get-FileHash .\mergedCsv50mbchunks.csv -Algorithm MD5).Hash
0452ACEC1664A93137C41D131FA4C3A9PS ..\PS-CsvUtil> (Get-FileHash .\tests\bigcsv.csv -Algorithm MD5).Hash
0452ACEC1664A93137C41D131FA4C3A9
```