https://github.com/mattpopovich/dataframeioperformancetesting
Tests the speed and file size of reading and writing DataFrames to/from disk with different file and compression types
csv csv-format feather file-io hdf5 hdf5-format orc orc-format pandas pandas-dataframe pandas-python parquet parquet-files parquet-format pickle pickle-file python python3
- Host: GitHub
- URL: https://github.com/mattpopovich/dataframeioperformancetesting
- Owner: mattpopovich
- Created: 2025-06-10T20:03:09.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-07-24T21:46:45.000Z (7 months ago)
- Last Synced: 2025-07-25T00:55:29.233Z (7 months ago)
- Topics: csv, csv-format, feather, file-io, hdf5, hdf5-format, orc, orc-format, pandas, pandas-dataframe, pandas-python, parquet, parquet-files, parquet-format, pickle, pickle-file, python, python3
- Language: Python
- Homepage:
- Size: 175 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# DataFrame I/O Performance Testing
[Open in Dev Containers](https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/mattpopovich/DataFrameIOPerformanceTesting)
This repo was created to test the speed of reading and writing DataFrames to/from disk using different file and compression types.
## Target Audience
You have data. You read it, store it, and use it with Python/Pandas `DataFrame`s. You want to know the "best" way to do this.
## How This Repo Works
This repo loads data into a Pandas `DataFrame`, then writes the `DataFrame` to disk and reads it back using each of the file formats and compression options that Python/Pandas provides. Finally, it prints a summary of the write time, the read time, and the output file size for each option.
You may need to modify the code so that your input file is read in correctly. My sample file, `data.csv`, is time series data with the first column specifying the date and time.
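To make the idea concrete, here is a minimal sketch of a write/read round trip being timed for a handful of formats. It is not the code in `analyze_dataframe_io.py`: the helper names (`time_round_trip`, `run_benchmark`), the `FORMATS` table, and the small synthetic time series standing in for `data.csv` are all assumptions made for illustration. Only formats that need no extra packages are used here.

```python
import os
import tempfile
import time

import pandas as pd


def time_round_trip(df, path, writer, reader):
    """Write df to path, read it back, and report timings and file size."""
    start = time.perf_counter()
    writer(df, path)
    write_s = time.perf_counter() - start

    start = time.perf_counter()
    df_read = reader(path)
    read_s = time.perf_counter() - start

    return {
        "write_s": write_s,
        "read_s": read_s,
        "size_bytes": os.path.getsize(path),
        "rows_read": len(df_read),
    }


def read_time_series_csv(path):
    # First column holds the date and time, so use it as a parsed index.
    return pd.read_csv(path, index_col=0, parse_dates=True)


# Hypothetical format table: extension -> (writer, reader).
# pandas infers the compression from the file extension by default.
FORMATS = {
    "csv": (lambda df, p: df.to_csv(p), read_time_series_csv),
    "csv.gz": (lambda df, p: df.to_csv(p), read_time_series_csv),
    "pkl": (lambda df, p: df.to_pickle(p), pd.read_pickle),
    "pkl.xz": (lambda df, p: df.to_pickle(p), pd.read_pickle),
}


def run_benchmark(df, out_dir):
    """Run every format in FORMATS once and collect the measurements."""
    results = {}
    for name, (writer, reader) in FORMATS.items():
        path = os.path.join(out_dir, f"frame.{name}")
        results[name] = time_round_trip(df, path, writer, reader)
    return results


# Synthetic time series used in place of a real input file (assumption).
index = pd.date_range("2025-01-01", periods=1000, freq="min")
df = pd.DataFrame(
    {"a": range(1000), "b": [x * 0.5 for x in range(1000)]}, index=index
)

with tempfile.TemporaryDirectory() as out_dir:
    results = run_benchmark(df, out_dir)

for name, r in results.items():
    print(f"{name}: write {r['write_s']:.4f}s, read {r['read_s']:.4f}s, "
          f"{r['size_bytes']} bytes")
```

A single run per format is noisy; repeating each measurement and taking the median would likely give steadier numbers.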
## How to Use
Run the following file to perform analysis: `python3 analyze_dataframe_io.py`. A `Dockerfile` is provided to manage the repo's requirements.
The following arguments are supported:
- `-f` or `--file`
  - Specify the file used for input
- `-k` or `--keep`
  - Do not delete the generated files used for analysis in `/outputs`
- `-v` or `--verbose`
  - For every compression type that supports it, test different levels of compression
  - Show some additional columns
  - For every compression type that supports it (except for `csv`), use different compression engines
  - 179 tests: 1.25 min on a 7.4MB `.csv` file on an M1 Mac
- `-vv` or `--very-verbose`
  - Use all available parsing engines for `csv` files
  - 331 tests: 3 min on a 7.4MB `.csv` file on an M1 Mac
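For readers who want to wire similar flags into their own script, the options above could be declared with `argparse` roughly as follows. This is a sketch, not the repo's actual parser: the `build_parser` name, the help strings, and the `data.csv` default are assumptions.

```python
import argparse


def build_parser():
    """Declare the four options listed above (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Benchmark DataFrame I/O across file formats."
    )
    # The default input file name is an assumption for this sketch.
    parser.add_argument("-f", "--file", default="data.csv",
                        help="file used for input")
    parser.add_argument("-k", "--keep", action="store_true",
                        help="keep the generated files after analysis")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="test compression levels and engines")
    # argparse matches the exact option string "-vv" before trying to
    # interpret it as two stacked "-v" flags.
    parser.add_argument("-vv", "--very-verbose", action="store_true",
                        help="also test every csv parsing engine")
    return parser


args = build_parser().parse_args(["-f", "my_data.csv", "-k", "-vv"])
print(args.file, args.keep, args.verbose, args.very_verbose)
```

Whether `-vv` should also switch on everything `-v` does is a design choice left out of this sketch.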
Example output of `analyze_dataframe_io.py` with a 7.4MB `.csv` file (21 tests: 5s on an M1 Mac):
```
┏━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┓
┃                 ┃ Write  ┃ Read   ┃        ┃         ┃            ┃            ┃          ┃
┃                 ┃ time   ┃ time   ┃        ┃ Output  ┃            ┃            ┃ Score    ┃
┃                 ┃ to     ┃ from   ┃ Total  ┃ File    ┃            ┃ Output     ┃ (lower   ┃
┃                 ┃ file   ┃ file   ┃ I/O    ┃ Size (% ┃ Total I/O  ┃ File Size  ┃ is       ┃
┃ Format          ┃ (s)    ┃ (s)    ┃ (s)    ┃ Orig.)  ┃ Normalized ┃ Normalized ┃ better)  ┃
┡━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━┩
│ pkl.zst         │ 0.0104 │ 0.0056 │ 0.016  │ 6.26    │ 1.6066     │ 1.6949     │ 3.3015   │
│ parquet.zstd    │ 0.0198 │ 0.0054 │ 0.0252 │ 11.7599 │ 2.5312     │ 3.1841     │ 5.7152   │
│ parquet.snappy  │ 0.0198 │ 0.0046 │ 0.0244 │ 12.3294 │ 2.4514     │ 3.3383     │ 5.7897   │
│ parquet.lz4     │ 0.02   │ 0.005  │ 0.025  │ 12.3539 │ 2.5157     │ 3.3449     │ 5.8606   │
│ feather         │ 0.0127 │ 0.0048 │ 0.0174 │ 15.8187 │ 1.7507     │ 4.283      │ 6.0337   │
│ pkl.zip         │ 0.0586 │ 0.014  │ 0.0725 │ 6.2169  │ 7.287      │ 1.6833     │ 8.9703   │
│ parquet         │ 0.0364 │ 0.0103 │ 0.0467 │ 15.8355 │ 4.6879     │ 4.2875     │ 8.9754   │
│ parquet.gzip    │ 0.0664 │ 0.0068 │ 0.0732 │ 8.9559  │ 7.3534     │ 2.4248     │ 9.7782   │
│ parquet.brotli  │ 0.0744 │ 0.0066 │ 0.081  │ 8.4436  │ 8.1363     │ 2.2862     │ 10.4225  │
│ pkl.gzip        │ 0.0056 │ 0.0044 │ 0.01   │ 50.4742 │ 1          │ 13.6662    │ 14.6662  │
│ pkl             │ 0.0064 │ 0.0042 │ 0.0106 │ 50.4742 │ 1.0665     │ 13.6662    │ 14.7326  │
│ orc             │ 0.0389 │ 0.0147 │ 0.0536 │ 49.0572 │ 5.3825     │ 13.2825    │ 18.665   │
│ h5              │ 0.0352 │ 0.0125 │ 0.0477 │ 74.4187 │ 4.7886     │ 20.1493    │ 24.9378  │
│ csv.zst         │ 0.2704 │ 0.0599 │ 0.3302 │ 6.89    │ 33.18      │ 1.8655     │ 35.0456  │
│ csv.zip         │ 0.3121 │ 0.0674 │ 0.3794 │ 6.5601  │ 38.121     │ 1.7762     │ 39.8972  │
│ pkl.xz          │ 0.4767 │ 0.0302 │ 0.5069 │ 4.892   │ 50.9298    │ 1.3245     │ 52.2543  │
│ csv.gz          │ 0.4503 │ 0.0662 │ 0.5166 │ 5.9552  │ 51.9012    │ 1.6124     │ 53.5136  │
│ csv             │ 0.2972 │ 0.0573 │ 0.3546 │ 69.7714 │ 35.6222    │ 18.891     │ 54.5131  │
│ pkl.bz2         │ 0.5928 │ 0.0498 │ 0.6426 │ 7.5608  │ 64.5651    │ 2.0471     │ 66.6122  │
│ csv.bz2         │ 0.557  │ 0.1083 │ 0.6653 │ 5.0619  │ 66.8476    │ 1.3705     │ 68.2181  │
│ csv.xz          │ 2.1117 │ 0.0861 │ 2.1978 │ 3.6934  │ 220.8097   │ 1          │ 221.8097 │
└─────────────────┴────────┴────────┴────────┴─────────┴────────────┴────────────┴──────────┘
```
Output with `-v` or `--verbose` for the 7.4MB `.csv` file:
```
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ ┃ ┃ Write ┃ Read ┃ ┃ ┃ ┃ ┃ ┃ ┃
┃ ┃ DataFrame ┃ time ┃ time ┃ ┃ Output ┃ ┃ ┃ ┃ Score ┃
┃ ┃ Memory ┃ to ┃ from ┃ Total ┃ File ┃ ┃ ┃ Output ┃ (lower ┃
┃ ┃ Difference ┃ file ┃ file ┃ I/O ┃ Size (% ┃ Equivalent ┃ Total I/O ┃ File Size ┃ is ┃
┃ Format ┃ (B) ┃ (s) ┃ (s) ┃ (s) ┃ Orig.) ┃ DataFrames ┃ Normalized ┃ Normalized ┃ better) ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━┩
│ pkl.zst C=0 │ 0 │ 0.0105 │ 0.0054 │ 0.0159 │ 6.26 │ True │ 1.4008 │ 1.6949 │ 3.0957 │
│ pkl.zst │ 0 │ 0.0101 │ 0.0059 │ 0.016 │ 6.26 │ True │ 1.4077 │ 1.6949 │ 3.1026 │
│ pkl.zst C=3 │ 0 │ 0.0104 │ 0.0057 │ 0.0161 │ 6.26 │ True │ 1.4154 │ 1.6949 │ 3.1104 │
│ pkl.zst C=4 │ 0 │ 0.0111 │ 0.0061 │ 0.0172 │ 6.2634 │ True │ 1.5131 │ 1.6959 │ 3.209 │
│ pkl.zst C=1 │ 0 │ 0.0087 │ 0.0058 │ 0.0145 │ 7.2059 │ True │ 1.2722 │ 1.951 │ 3.2233 │
│ pkl.zst C=2 │ 0 │ 0.009 │ 0.0063 │ 0.0154 │ 7.3869 │ True │ 1.351 │ 2 │ 3.3511 │
│ pkl.zst C=-2 │ 0 │ 0.0078 │ 0.0053 │ 0.0131 │ 8.4334 │ True │ 1.1511 │ 2.2834 │ 3.4345 │
│ pkl.zst C=-1 │ 0 │ 0.008 │ 0.006 │ 0.014 │ 8.4571 │ True │ 1.2309 │ 2.2898 │ 3.5207 │
│ pkl.zst C=5 │ 0 │ 0.0184 │ 0.0053 │ 0.0236 │ 6.1122 │ True │ 2.0776 │ 1.6549 │ 3.7325 │
│ pkl.zst C=6 │ 0 │ 0.0214 │ 0.0051 │ 0.0265 │ 6.0101 │ True │ 2.3249 │ 1.6273 │ 3.9522 │
│ pkl.zst C=-4 │ 0 │ 0.0076 │ 0.0057 │ 0.0132 │ 10.4564 │ True │ 1.1625 │ 2.8311 │ 3.9936 │
│ pkl.zst C=-3 │ 0 │ 0.0081 │ 0.0056 │ 0.0137 │ 10.47 │ True │ 1.202 │ 2.8348 │ 4.0368 │
│ pkl.zst C=7 │ 0 │ 0.0234 │ 0.005 │ 0.0284 │ 5.9834 │ True │ 2.4957 │ 1.62 │ 4.1157 │
│ parquet.zstd WfastP Rpy→ │ 0 │ 0.025 │ 0.005 │ 0.03 │ 6.2202 │ True │ 2.6401 │ 1.6841 │ 4.3242 │
│ pkl.zst C=-5 │ 0 │ 0.008 │ 0.0061 │ 0.0141 │ 12.0002 │ True │ 1.2389 │ 3.2491 │ 4.488 │
│ pkl.zst C=9 │ 0 │ 0.0276 │ 0.0054 │ 0.0331 │ 5.9113 │ True │ 2.9066 │ 1.6005 │ 4.5071 │
│ pkl.zst C=-6 │ 0 │ 0.0078 │ 0.005 │ 0.0127 │ 12.6744 │ True │ 1.1194 │ 3.4317 │ 4.5511 │
│ pkl.zst C=8 │ 0 │ 0.0289 │ 0.0052 │ 0.0341 │ 5.9119 │ True │ 2.9957 │ 1.6007 │ 4.5963 │
│ parquet.zstd WfastP RfastP │ 0 │ 0.0241 │ 0.0103 │ 0.0343 │ 6.2202 │ True │ 3.0179 │ 1.6841 │ 4.7021 │
│ pkl.gzip C=1 │ 0 │ 0.0211 │ 0.0114 │ 0.0325 │ 7.0612 │ True │ 2.8567 │ 1.9119 │ 4.7686 │
│ pkl.gzip C=2 │ 0 │ 0.0234 │ 0.012 │ 0.0355 │ 6.8926 │ True │ 3.1163 │ 1.8662 │ 4.9825 │
│ pkl.zip C=1 │ 0 │ 0.0216 │ 0.0135 │ 0.0351 │ 7.0624 │ True │ 3.0808 │ 1.9122 │ 4.993 │
│ pkl.zst C=-7 │ 0 │ 0.0076 │ 0.0062 │ 0.0138 │ 13.9512 │ True │ 1.2162 │ 3.7774 │ 4.9936 │
│ pkl.gzip C=3 │ 0 │ 0.0251 │ 0.0111 │ 0.0362 │ 6.7933 │ True │ 3.1813 │ 1.8393 │ 5.0206 │
│ parquet.lz4 WfastP Rpy→ │ 0 │ 0.0195 │ 0.005 │ 0.0245 │ 10.985 │ True │ 2.1533 │ 2.9742 │ 5.1276 │
│ pkl.zip C=2 │ 0 │ 0.0233 │ 0.0139 │ 0.0372 │ 6.8938 │ True │ 3.2713 │ 1.8665 │ 5.1378 │
│ pkl.gzip C=4 │ 0 │ 0.0276 │ 0.0111 │ 0.0387 │ 6.5232 │ True │ 3.4043 │ 1.7662 │ 5.1705 │
│ parquet.lz4 WfastP RfastP │ 0 │ 0.0184 │ 0.007 │ 0.0254 │ 10.985 │ True │ 2.2359 │ 2.9742 │ 5.2102 │
│ parquet.snappy WfastP Rpy→ │ 0 │ 0.0187 │ 0.0046 │ 0.0234 │ 11.844 │ True │ 2.0541 │ 3.2068 │ 5.2609 │
│ pkl.zip C=3 │ 0 │ 0.0262 │ 0.0129 │ 0.0391 │ 6.7945 │ True │ 3.4341 │ 1.8396 │ 5.2737 │
│ pkl.zip C=4 │ 0 │ 0.0287 │ 0.0131 │ 0.0418 │ 6.5244 │ True │ 3.6704 │ 1.7665 │ 5.4369 │
│ parquet.snappy WfastP RfastP │ 0 │ 0.0187 │ 0.0071 │ 0.0258 │ 11.844 │ True │ 2.2672 │ 3.2068 │ 5.4741 │
│ parquet.zstd Wpy→ Rpy→ │ 0 │ 0.0217 │ 0.0051 │ 0.0267 │ 11.7599 │ True │ 2.3502 │ 3.1841 │ 5.5343 │
│ parquet.lz4 Wpy→ Rpy→ │ 0 │ 0.0201 │ 0.0052 │ 0.0253 │ 12.3539 │ True │ 2.2238 │ 3.3449 │ 5.5687 │
│ parquet.snappy Wpy→ Rpy→ │ 0 │ 0.0212 │ 0.005 │ 0.0262 │ 12.3294 │ True │ 2.3061 │ 3.3383 │ 5.6444 │
│ pkl.zst C=10 │ 0 │ 0.0416 │ 0.0051 │ 0.0466 │ 5.8909 │ True │ 4.098 │ 1.595 │ 5.693 │
│ parquet.snappy Wpy→ RfastP │ 0 │ 0.0202 │ 0.0082 │ 0.0284 │ 12.3294 │ True │ 2.4929 │ 3.3383 │ 5.8311 │
│ parquet.zstd Wpy→ RfastP │ 0 │ 0.0204 │ 0.0107 │ 0.0311 │ 11.7599 │ True │ 2.7334 │ 3.1841 │ 5.9175 │
│ parquet.lz4 Wpy→ RfastP │ 0 │ 0.0213 │ 0.0083 │ 0.0295 │ 12.3539 │ True │ 2.596 │ 3.3449 │ 5.9409 │
│ pkl.zip C=5 │ 0 │ 0.0354 │ 0.0128 │ 0.0483 │ 6.3396 │ True │ 4.2414 │ 1.7165 │ 5.9578 │
│ feather │ 0 │ 0.0175 │ 0.005 │ 0.0225 │ 15.8187 │ True │ 1.9814 │ 4.283 │ 6.2644 │
│ pkl.zst C=13 │ 0 │ 0.0645 │ 0.0051 │ 0.0696 │ 5.8931 │ True │ 6.1129 │ 1.5956 │ 7.7085 │
│ pkl.zst C=11 │ 0 │ 0.0651 │ 0.0051 │ 0.0702 │ 5.8769 │ True │ 6.1654 │ 1.5912 │ 7.7566 │
│ pkl.gzip C=-1 │ 0 │ 0.0582 │ 0.0112 │ 0.0694 │ 6.2158 │ True │ 6.0946 │ 1.683 │ 7.7776 │
│ pkl.gzip C=6 │ 0 │ 0.0595 │ 0.0112 │ 0.0708 │ 6.2158 │ True │ 6.2184 │ 1.683 │ 7.9014 │
│ pkl.zst C=12 │ 0 │ 0.0689 │ 0.005 │ 0.074 │ 5.8768 │ True │ 6.5016 │ 1.5912 │ 8.0928 │
│ pkl.zip C=0 │ 0 │ 0.0592 │ 0.0138 │ 0.0729 │ 6.217 │ True │ 6.4098 │ 1.6833 │ 8.093 │
│ pkl.zip C=6 │ 0 │ 0.0604 │ 0.0128 │ 0.0732 │ 6.217 │ True │ 6.4295 │ 1.6833 │ 8.1128 │
│ parquet Wpy→ Rpy→ │ 0 │ 0.0237 │ 0.02 │ 0.0437 │ 15.8355 │ True │ 3.838 │ 4.2875 │ 8.1256 │
│ parquet.gzip WfastP Rpy→ │ 0 │ 0.0715 │ 0.0054 │ 0.077 │ 6.2091 │ True │ 6.7648 │ 1.6811 │ 8.4459 │
│ pkl.zip │ 0 │ 0.0643 │ 0.0133 │ 0.0776 │ 6.2169 │ True │ 6.8171 │ 1.6833 │ 8.5004 │
│ parquet.gzip Wpy→ Rpy→ │ 0 │ 0.0672 │ 0.0069 │ 0.0741 │ 8.9559 │ True │ 6.5152 │ 2.4248 │ 8.9401 │
│ parquet.brotli Wpy→ Rpy→ │ 0 │ 0.0721 │ 0.0069 │ 0.0789 │ 8.4436 │ True │ 6.9357 │ 2.2862 │ 9.2219 │
│ parquet.gzip WfastP RfastP │ 0 │ 0.0718 │ 0.0144 │ 0.0863 │ 6.2091 │ True │ 7.5805 │ 1.6811 │ 9.2617 │
│ pkl.gzip C=5 │ 0 │ 0.0753 │ 0.0121 │ 0.0874 │ 6.3384 │ True │ 7.6793 │ 1.7162 │ 9.3955 │
│ parquet.gzip Wpy→ RfastP │ 0 │ 0.0674 │ 0.012 │ 0.0794 │ 8.9559 │ True │ 6.981 │ 2.4248 │ 9.4058 │
│ pkl.zst C=14 │ 0 │ 0.0872 │ 0.005 │ 0.0922 │ 5.8724 │ True │ 8.1019 │ 1.59 │ 9.6919 │
│ parquet.brotli Wpy→ RfastP │ 0 │ 0.071 │ 0.0145 │ 0.0854 │ 8.4436 │ True │ 7.5092 │ 2.2862 │ 9.7954 │
│ pkl.gzip C=7 │ 0 │ 0.0839 │ 0.011 │ 0.0949 │ 6.248 │ True │ 8.3382 │ 1.6917 │ 10.0299 │
│ pkl.xz C=1 │ 0 │ 0.0683 │ 0.0311 │ 0.0994 │ 5.0277 │ True │ 8.737 │ 1.3613 │ 10.0983 │
│ parquet Wpy→ RfastP │ 0 │ 0.0188 │ 0.0484 │ 0.0672 │ 15.8355 │ True │ 5.9056 │ 4.2875 │ 10.1932 │
│ pkl.zip C=7 │ 0 │ 0.0846 │ 0.0128 │ 0.0974 │ 6.2491 │ True │ 8.5628 │ 1.692 │ 10.2548 │
│ pkl.zst C=15 │ 0 │ 0.1004 │ 0.0049 │ 0.1053 │ 5.8644 │ True │ 9.2575 │ 1.5878 │ 10.8453 │
│ pkl.xz C=2 │ 0 │ 0.0857 │ 0.0306 │ 0.1163 │ 5.0048 │ True │ 10.2244 │ 1.3551 │ 11.5795 │
│ pkl.xz C=3 │ 0 │ 0.0879 │ 0.0295 │ 0.1174 │ 4.9951 │ True │ 10.3181 │ 1.3525 │ 11.6705 │
│ pkl.gzip C=0 │ 0 │ 0.0073 │ 0.0041 │ 0.0114 │ 50.4742 │ True │ 1 │ 13.6662 │ 14.6662 │
│ pkl.gzip │ 0 │ 0.0084 │ 0.0043 │ 0.0127 │ 50.4742 │ True │ 1.1144 │ 13.6662 │ 14.7805 │
│ pkl │ 0 │ 0.0151 │ 0.0045 │ 0.0196 │ 50.4742 │ True │ 1.7214 │ 13.6662 │ 15.3876 │
│ pkl.zst C=16 │ 0 │ 0.1611 │ 0.005 │ 0.166 │ 5.7376 │ True │ 14.5922 │ 1.5535 │ 16.1456 │
│ orc │ 0 │ 0.0373 │ 0.0104 │ 0.0477 │ 49.0572 │ True │ 4.188 │ 13.2825 │ 17.4705 │
│ pkl.zst C=17 │ 0 │ 0.1871 │ 0.0051 │ 0.1922 │ 5.5962 │ True │ 16.8877 │ 1.5152 │ 18.4029 │
│ parquet WfastP RfastP │ 0 │ 0.0187 │ 0.0081 │ 0.0269 │ 60.0927 │ True │ 2.3609 │ 16.2704 │ 18.6314 │
│ pkl.gzip C=8 │ 0 │ 0.193 │ 0.011 │ 0.204 │ 6.2712 │ True │ 17.9306 │ 1.698 │ 19.6286 │
│ pkl.zip C=8 │ 0 │ 0.1941 │ 0.0131 │ 0.2071 │ 6.2724 │ True │ 18.2026 │ 1.6983 │ 19.9008 │
│ pkl.xz C=4 │ 0 │ 0.191 │ 0.0312 │ 0.2222 │ 5.0341 │ True │ 19.5231 │ 1.363 │ 20.8861 │
│ parquet WfastP Rpy→ │ 0 │ 0.0501 │ 0.0059 │ 0.0559 │ 60.0927 │ True │ 4.9168 │ 16.2704 │ 21.1872 │
│ pkl.zst C=18 │ 0 │ 0.2287 │ 0.005 │ 0.2338 │ 5.5585 │ True │ 20.5446 │ 1.505 │ 22.0496 │
│ pkl.xz C=5 │ 0 │ 0.2707 │ 0.0299 │ 0.3007 │ 4.9493 │ True │ 26.4217 │ 1.34 │ 27.7618 │
│ h5 │ 709932 │ 0.0947 │ 0.0131 │ 0.1078 │ 74.4187 │ True │ 9.4763 │ 20.1493 │ 29.6256 │
│ csv.zst C=2 │ 0 │ 0.2685 │ 0.0604 │ 0.3289 │ 6.279 │ True │ 28.9056 │ 1.7001 │ 30.6057 │
│ csv.zst C=3 │ 0 │ 0.2677 │ 0.0602 │ 0.3278 │ 6.89 │ True │ 28.8111 │ 1.8655 │ 30.6766 │
│ csv.zst C=-1 │ 0 │ 0.2659 │ 0.0596 │ 0.3254 │ 7.7361 │ True │ 28.6004 │ 2.0946 │ 30.695 │
│ csv.zst │ 0 │ 0.2678 │ 0.0604 │ 0.3282 │ 6.89 │ True │ 28.8429 │ 1.8655 │ 30.7084 │
│ csv.zst C=-3 │ 0 │ 0.2642 │ 0.0586 │ 0.3228 │ 9.1393 │ True │ 28.3699 │ 2.4745 │ 30.8445 │
│ csv.zst C=0 │ 0 │ 0.2681 │ 0.0621 │ 0.3301 │ 6.89 │ True │ 29.0117 │ 1.8655 │ 30.8772 │
│ csv.zst C=-2 │ 0 │ 0.2638 │ 0.0625 │ 0.3263 │ 8.1944 │ True │ 28.6789 │ 2.2187 │ 30.8976 │
│ csv.zst C=4 │ 0 │ 0.2677 │ 0.0647 │ 0.3323 │ 6.8912 │ True │ 29.2042 │ 1.8658 │ 31.07 │
│ csv.zst C=-4 │ 0 │ 0.2663 │ 0.0601 │ 0.3264 │ 10.6631 │ True │ 28.684 │ 2.8871 │ 31.5711 │
│ csv.zst C=-5 │ 0 │ 0.2623 │ 0.0594 │ 0.3217 │ 12.2078 │ True │ 28.2695 │ 3.3053 │ 31.5748 │
│ csv.zst C=5 │ 0 │ 0.28 │ 0.0607 │ 0.3407 │ 6.5467 │ True │ 29.9403 │ 1.7726 │ 31.7129 │
│ csv.zst C=-6 │ 0 │ 0.265 │ 0.0592 │ 0.3242 │ 13.9798 │ True │ 28.4902 │ 3.7851 │ 32.2753 │
│ csv.zst C=6 │ 0 │ 0.2883 │ 0.0605 │ 0.3488 │ 6.0172 │ True │ 30.649 │ 1.6292 │ 32.2782 │
│ csv.zst C=-7 │ 0 │ 0.2653 │ 0.058 │ 0.3233 │ 15.4334 │ True │ 28.4136 │ 4.1787 │ 32.5922 │
│ csv.gzip C=1 │ 0 │ 0.2786 │ 0.0675 │ 0.3461 │ 8.2605 │ True │ 30.4188 │ 2.2366 │ 32.6554 │
│ csv.zst C=7 │ 0 │ 0.295 │ 0.0593 │ 0.3542 │ 5.9885 │ True │ 31.1294 │ 1.6214 │ 32.7508 │
│ csv.zip C=2 │ 0 │ 0.2811 │ 0.0707 │ 0.3517 │ 8.1852 │ True │ 30.9105 │ 2.2162 │ 33.1267 │
│ csv.gzip C=3 │ 0 │ 0.2859 │ 0.0672 │ 0.3531 │ 8.1482 │ True │ 31.0302 │ 2.2062 │ 33.2364 │
│ csv.zst C=9 │ 0 │ 0.3018 │ 0.0607 │ 0.3625 │ 5.2551 │ True │ 31.8577 │ 1.4228 │ 33.2805 │
│ csv.zst C=1 │ 0 │ 0.2641 │ 0.0969 │ 0.361 │ 6.2331 │ True │ 31.7278 │ 1.6876 │ 33.4155 │
│ csv.zip C=1 │ 0 │ 0.2842 │ 0.0713 │ 0.3556 │ 8.2616 │ True │ 31.2463 │ 2.2369 │ 33.4832 │
│ csv.zip C=3 │ 0 │ 0.2873 │ 0.0701 │ 0.3574 │ 8.1492 │ True │ 31.4077 │ 2.2064 │ 33.6141 │
│ csv.zst C=8 │ 0 │ 0.3064 │ 0.0601 │ 0.3666 │ 5.2407 │ True │ 32.2156 │ 1.4189 │ 33.6346 │
│ csv.gzip C=2 │ 0 │ 0.2907 │ 0.0674 │ 0.3581 │ 8.1841 │ True │ 31.473 │ 2.2159 │ 33.6889 │
│ csv.zip C=5 │ 0 │ 0.2986 │ 0.0696 │ 0.3682 │ 6.59 │ True │ 32.3596 │ 1.7843 │ 34.1438 │
│ csv.zip C=4 │ 0 │ 0.298 │ 0.0703 │ 0.3683 │ 6.9332 │ True │ 32.3696 │ 1.8772 │ 34.2468 │
│ csv.gzip C=4 │ 0 │ 0.2901 │ 0.0797 │ 0.3698 │ 6.9321 │ True │ 32.5009 │ 1.8769 │ 34.3778 │
│ csv.zip C=6 │ 0 │ 0.3081 │ 0.0687 │ 0.3769 │ 6.5602 │ True │ 33.1181 │ 1.7762 │ 34.8943 │
│ csv.gzip C=-1 │ 0 │ 0.3094 │ 0.0676 │ 0.377 │ 6.5591 │ True │ 33.1277 │ 1.7759 │ 34.9036 │
│ csv.zip │ 0 │ 0.3087 │ 0.0689 │ 0.3776 │ 6.5601 │ True │ 33.1819 │ 1.7762 │ 34.9581 │
│ csv.gzip C=5 │ 0 │ 0.3101 │ 0.0681 │ 0.3782 │ 6.5889 │ True │ 33.2352 │ 1.784 │ 35.0192 │
│ csv.gzip C=6 │ 0 │ 0.3123 │ 0.0672 │ 0.3795 │ 6.5591 │ True │ 33.352 │ 1.7759 │ 35.1279 │
│ csv.gzip C=7 │ 0 │ 0.3214 │ 0.0661 │ 0.3876 │ 6.176 │ True │ 34.0578 │ 1.6722 │ 35.73 │
│ csv.zip C=0 │ 0 │ 0.3178 │ 0.0697 │ 0.3875 │ 6.5602 │ True │ 34.0559 │ 1.7762 │ 35.8321 │
│ csv.zst C=10 │ 0 │ 0.3334 │ 0.0593 │ 0.3927 │ 5.2271 │ True │ 34.5106 │ 1.4153 │ 35.9259 │
│ csv.zip C=7 │ 0 │ 0.3242 │ 0.0693 │ 0.3935 │ 6.177 │ True │ 34.5817 │ 1.6725 │ 36.2541 │
│ pkl.gzip C=9 │ 0 │ 0.3842 │ 0.011 │ 0.3952 │ 6.2702 │ True │ 34.7297 │ 1.6977 │ 36.4274 │
│ pkl.zip C=9 │ 0 │ 0.386 │ 0.0135 │ 0.3995 │ 6.2714 │ True │ 35.1054 │ 1.698 │ 36.8035 │
│ csv.zst C=11 │ 0 │ 0.3659 │ 0.0609 │ 0.4268 │ 5.2142 │ True │ 37.5065 │ 1.4118 │ 38.9182 │
│ pkl.xz │ 0 │ 0.3968 │ 0.031 │ 0.4278 │ 4.892 │ True │ 37.594 │ 1.3245 │ 38.9185 │
│ csv.zst C=13 │ 0 │ 0.3701 │ 0.0588 │ 0.4289 │ 5.2012 │ True │ 37.6922 │ 1.4083 │ 39.1005 │
│ csv.xz C=1 │ 0 │ 0.349 │ 0.0842 │ 0.4332 │ 5.2652 │ True │ 38.0704 │ 1.4256 │ 39.4959 │
│ pkl.xz C=6 │ 0 │ 0.4044 │ 0.0311 │ 0.4356 │ 4.892 │ True │ 38.2783 │ 1.3245 │ 39.6028 │
│ pkl.xz C=7 │ 0 │ 0.405 │ 0.0309 │ 0.436 │ 4.892 │ True │ 38.3115 │ 1.3245 │ 39.636 │
│ csv.zst C=12 │ 0 │ 0.3762 │ 0.0591 │ 0.4354 │ 5.2146 │ True │ 38.2607 │ 1.4119 │ 39.6725 │
│ pkl.xz C=0 │ 0 │ 0.4093 │ 0.0305 │ 0.4398 │ 4.892 │ True │ 38.6468 │ 1.3245 │ 39.9713 │
│ pkl.xz C=9 │ 0 │ 0.423 │ 0.0309 │ 0.4539 │ 4.892 │ True │ 39.8885 │ 1.3245 │ 41.213 │
│ pkl.zst C=19 │ 0 │ 0.4495 │ 0.0051 │ 0.4546 │ 5.5052 │ True │ 39.9457 │ 1.4906 │ 41.4362 │
│ pkl.xz C=8 │ 0 │ 0.4282 │ 0.0303 │ 0.4585 │ 4.892 │ True │ 40.2898 │ 1.3245 │ 41.6144 │
│ pkl.bz2 C=1 │ 0 │ 0.4151 │ 0.0476 │ 0.4627 │ 7.305 │ True │ 40.6649 │ 1.9779 │ 42.6428 │
│ pkl.bz2 C=2 │ 0 │ 0.4157 │ 0.0489 │ 0.4646 │ 7.3712 │ True │ 40.826 │ 1.9958 │ 42.8218 │
│ pkl.zst C=20 │ 0 │ 0.4659 │ 0.005 │ 0.4709 │ 5.5052 │ True │ 41.3832 │ 1.4906 │ 42.8738 │
│ pkl.bz2 C=3 │ 0 │ 0.4242 │ 0.0503 │ 0.4745 │ 7.4137 │ True │ 41.6971 │ 2.0073 │ 43.7044 │
│ csv.gzip C=8 │ 0 │ 0.413 │ 0.0666 │ 0.4796 │ 5.9554 │ True │ 42.1441 │ 1.6125 │ 43.7566 │
│ csv.zst C=14 │ 0 │ 0.4246 │ 0.0586 │ 0.4832 │ 5.1934 │ True │ 42.4615 │ 1.4061 │ 43.8676 │
│ csv.zip C=8 │ 0 │ 0.4171 │ 0.0683 │ 0.4854 │ 5.9565 │ True │ 42.6548 │ 1.6127 │ 44.2676 │
│ pkl.bz2 C=5 │ 0 │ 0.4329 │ 0.0489 │ 0.4818 │ 7.5201 │ True │ 42.3433 │ 2.0361 │ 44.3794 │
│ pkl.bz2 C=4 │ 0 │ 0.4394 │ 0.0512 │ 0.4906 │ 7.433 │ True │ 43.1119 │ 2.0125 │ 45.1244 │
│ csv.xz C=2 │ 0 │ 0.4142 │ 0.0833 │ 0.4975 │ 5.2378 │ True │ 43.7245 │ 1.4182 │ 45.1426 │
│ csv.gzip C=9 │ 0 │ 0.4475 │ 0.0663 │ 0.5138 │ 5.9553 │ True │ 45.1539 │ 1.6124 │ 46.7663 │
│ pkl.bz2 C=6 │ 0 │ 0.4584 │ 0.0511 │ 0.5096 │ 7.5481 │ True │ 44.7814 │ 2.0437 │ 46.8251 │
│ csv.zip C=9 │ 0 │ 0.4518 │ 0.0687 │ 0.5205 │ 5.9564 │ True │ 45.7397 │ 1.6127 │ 47.3524 │
│ csv.zst C=15 │ 0 │ 0.4842 │ 0.0595 │ 0.5437 │ 5.1913 │ True │ 47.7833 │ 1.4056 │ 49.1889 │
│ csv.gzip │ 0 │ 0.2959 │ 0.059 │ 0.3548 │ 69.7714 │ True │ 31.1839 │ 18.891 │ 50.0749 │
│ csv │ 0 │ 0.2969 │ 0.0585 │ 0.3553 │ 69.7714 │ True │ 31.2247 │ 18.891 │ 50.1156 │
│ pkl.zst C=21 │ 0 │ 0.5581 │ 0.0052 │ 0.5633 │ 5.5021 │ True │ 49.5019 │ 1.4897 │ 50.9916 │
│ pkl.bz2 C=8 │ 0 │ 0.5098 │ 0.053 │ 0.5628 │ 7.5395 │ True │ 49.4557 │ 2.0414 │ 51.497 │
│ pkl.bz2 C=7 │ 0 │ 0.5289 │ 0.0517 │ 0.5807 │ 7.5493 │ True │ 51.0294 │ 2.044 │ 53.0734 │
│ csv.xz C=4 │ 0 │ 0.5012 │ 0.0866 │ 0.5878 │ 5.7738 │ True │ 51.6527 │ 1.5633 │ 53.216 │
│ csv.gzip C=0 │ 0 │ 0.3323 │ 0.0585 │ 0.3908 │ 69.7714 │ True │ 34.3406 │ 18.891 │ 53.2316 │
│ pkl.bz2 C=9 │ 0 │ 0.5306 │ 0.0524 │ 0.583 │ 7.5608 │ True │ 51.2302 │ 2.0471 │ 53.2773 │
│ pkl.bz2 │ 0 │ 0.5341 │ 0.0516 │ 0.5858 │ 7.5608 │ True │ 51.4789 │ 2.0471 │ 53.526 │
│ csv.xz C=3 │ 0 │ 0.5131 │ 0.0826 │ 0.5956 │ 5.237 │ True │ 52.3436 │ 1.4179 │ 53.7615 │
│ csv.bz2 C=2 │ 0 │ 0.5173 │ 0.107 │ 0.6243 │ 4.9737 │ True │ 54.8619 │ 1.3467 │ 56.2086 │
│ csv.bz2 C=3 │ 0 │ 0.5202 │ 0.1086 │ 0.6288 │ 4.9857 │ True │ 55.2586 │ 1.3499 │ 56.6085 │
│ csv.bz2 C=1 │ 0 │ 0.5295 │ 0.1034 │ 0.633 │ 4.9468 │ True │ 55.6252 │ 1.3394 │ 56.9645 │
│ csv.bz2 C=4 │ 0 │ 0.5273 │ 0.1086 │ 0.6359 │ 4.99 │ True │ 55.8844 │ 1.3511 │ 57.2355 │
│ csv.bz2 C=7 │ 0 │ 0.5244 │ 0.1114 │ 0.6358 │ 5.0479 │ True │ 55.8746 │ 1.3667 │ 57.2414 │
│ csv.bz2 C=9 │ 0 │ 0.5289 │ 0.1104 │ 0.6393 │ 5.0619 │ True │ 56.1792 │ 1.3705 │ 57.5497 │
│ csv.bz2 C=6 │ 0 │ 0.5356 │ 0.108 │ 0.6436 │ 5.0176 │ True │ 56.5571 │ 1.3585 │ 57.9156 │
│ csv.bz2 C=5 │ 0 │ 0.5346 │ 0.1114 │ 0.646 │ 5.0139 │ True │ 56.7694 │ 1.3575 │ 58.127 │
│ csv.bz2 C=8 │ 0 │ 0.5388 │ 0.1122 │ 0.651 │ 5.0523 │ True │ 57.2078 │ 1.3679 │ 58.5757 │
│ csv.bz2 │ 0 │ 0.5404 │ 0.1128 │ 0.6532 │ 5.0619 │ True │ 57.4055 │ 1.3705 │ 58.7761 │
│ pkl.zst C=22 │ 0 │ 0.6904 │ 0.0054 │ 0.6958 │ 5.5004 │ True │ 61.1427 │ 1.4893 │ 62.632 │
│ csv.zst C=16 │ 0 │ 0.6467 │ 0.0937 │ 0.7404 │ 5.1839 │ True │ 65.0641 │ 1.4036 │ 66.4677 │
│ csv.xz C=5 │ 0 │ 0.7549 │ 0.0841 │ 0.839 │ 4.5769 │ True │ 73.7312 │ 1.2392 │ 74.9704 │
│ csv.zst C=17 │ 0 │ 0.83 │ 0.0605 │ 0.8905 │ 4.2472 │ True │ 78.2559 │ 1.1499 │ 79.4058 │
│ csv.zst C=18 │ 0 │ 1.3661 │ 0.0581 │ 1.4242 │ 4.2314 │ True │ 125.16 │ 1.1457 │ 126.3056 │
│ csv.zst C=19 │ 0 │ 1.625 │ 0.059 │ 1.6841 │ 4.2243 │ True │ 147.9961 │ 1.1438 │ 149.1398 │
│ csv.zst C=20 │ 0 │ 1.6369 │ 0.0605 │ 1.6974 │ 4.2243 │ True │ 149.17 │ 1.1438 │ 150.3138 │
│ csv.zst C=21 │ 0 │ 1.682 │ 0.0616 │ 1.7436 │ 4.2247 │ True │ 153.2249 │ 1.1438 │ 154.3687 │
│ csv.xz │ 0 │ 1.873 │ 0.0844 │ 1.9574 │ 3.6934 │ True │ 172.0111 │ 1 │ 173.0111 │
│ csv.xz C=6 │ 0 │ 1.8789 │ 0.0841 │ 1.963 │ 3.6934 │ True │ 172.5087 │ 1 │ 173.5087 │
│ csv.xz C=7 │ 0 │ 1.8913 │ 0.0841 │ 1.9754 │ 3.6934 │ True │ 173.6 │ 1 │ 174.6 │
│ csv.xz C=8 │ 0 │ 1.8992 │ 0.0848 │ 1.9841 │ 3.6934 │ True │ 174.3589 │ 1 │ 175.3589 │
│ csv.xz C=9 │ 0 │ 1.9168 │ 0.0846 │ 2.0015 │ 3.6934 │ True │ 175.8877 │ 1 │ 176.8877 │
│ csv.xz C=0 │ 0 │ 1.9288 │ 0.0844 │ 2.0133 │ 3.6934 │ True │ 176.9252 │ 1 │ 177.9252 │
│ csv.zst C=22 │ 0 │ 2.5222 │ 0.059 │ 2.5813 │ 4.225 │ True │ 226.8393 │ 1.1439 │ 227.9833 │
│ parquet.brotli WfastP Rpy→ │ 0 │ 3.4595 │ 0.0065 │ 3.466 │ 4.833 │ True │ 304.5937 │ 1.3085 │ 305.9022 │
│ parquet.brotli WfastP RfastP │ 0 │ 3.4619 │ 0.0142 │ 3.4761 │ 4.833 │ True │ 305.4772 │ 1.3085 │ 306.7857 │
└──────────────────────────────┴────────────┴────────┴────────┴────────┴─────────┴────────────┴────────────┴────────────┴──────────┘
```
### Table Output Explanation
- Format = information about how this row was created
  - `C=` = the compression level used
  - `Wpy→`, `WfastP` = the write engine used for `.parquet` files (`pyarrow` or `fastparquet`)
  - `Rpy→`, `RfastP` = the read engine used for `.parquet` files (`pyarrow` or `fastparquet`)
  - `Rc`, `Rpy`, `Rpy→` = the read parser engine used for `.csv` files (`C`, `python`, or `pyarrow`)
  - If nothing is specified, then the default settings were used.
- Total I/O Normalized = how many times slower this format was than the fastest format at writing to file + reading from file
- Output File Size Normalized = how many times larger this format's file was than the smallest format's file
- Score (lower is better) = I'm defining "score" as `Total I/O Normalized` + `Output File Size Normalized`.
  - I am attempting to find the format that has the lowest file size **and** the fastest read and write times.
  - You should adjust the score based on your personal use case.
    - Ex. if all you care about is minimizing file size, use `csv.xz`. If you want the fastest read + write times, use `pkl.gzip`.
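The scoring described above can be sketched in a few lines. The `score_formats` function and its `io_weight` / `size_weight` parameters are hypothetical (the weights are one possible way to adjust the score for your use case), and the three input rows are copied from the rounded values in the default table, so the results differ slightly from the table's own normalized columns.

```python
def score_formats(results, io_weight=1.0, size_weight=1.0):
    """Normalize I/O time and file size against the best format, then sum."""
    fastest = min(r["total_io_s"] for r in results.values())
    smallest = min(r["size_pct"] for r in results.values())

    scored = {}
    for name, r in results.items():
        io_norm = r["total_io_s"] / fastest    # 1.0 for the fastest format
        size_norm = r["size_pct"] / smallest   # 1.0 for the smallest file
        scored[name] = {
            "io_norm": io_norm,
            "size_norm": size_norm,
            "score": io_weight * io_norm + size_weight * size_norm,
        }
    return scored


# Rounded "Total I/O (s)" and "Output File Size (% Orig.)" values
# taken from three rows of the default table.
results = {
    "pkl.zst": {"total_io_s": 0.016, "size_pct": 6.26},
    "pkl.gzip": {"total_io_s": 0.01, "size_pct": 50.4742},
    "csv.xz": {"total_io_s": 2.1978, "size_pct": 3.6934},
}

scored = score_formats(results)
best = min(scored, key=lambda name: scored[name]["score"])
print(best, round(scored[best]["score"], 4))

# Caring only about file size: set the I/O weight to zero.
size_only = score_formats(results, io_weight=0.0)
print(min(size_only, key=lambda name: size_only[name]["score"]))
```

With equal weights the sketch picks `pkl.zst`, and with `io_weight=0.0` it picks `csv.xz`, matching the guidance in the list above.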
#### Verbose columns:
- DataFrame Memory Difference (B) = in bytes, the `DataFrame` size when read from file minus the `DataFrame` size before it was written to file
  - Should be 0; investigate if it is not
- Equivalent DataFrames = sanity check to ensure the `DataFrame`s are equal via `DataFrame.equals()`
  - Should be `True`; investigate if it is not
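As a rough illustration of these two checks, the sketch below round-trips a small `DataFrame` through a pickle file and compares it with the original. The `check_round_trip` helper is hypothetical, and using `memory_usage(deep=True)` to measure the `DataFrame` size is an assumption about how such a number could be obtained, not a statement of how the repo computes it.

```python
import os
import tempfile

import pandas as pd


def check_round_trip(df_before, df_after):
    """Compare a DataFrame with the copy that was read back from disk."""
    before = int(df_before.memory_usage(deep=True).sum())
    after = int(df_after.memory_usage(deep=True).sum())
    return {
        "memory_difference_bytes": after - before,
        "equivalent": bool(df_before.equals(df_after)),
    }


df = pd.DataFrame({"a": [1, 2, 3], "b": [0.1, 0.2, 0.3]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "frame.pkl")
    df.to_pickle(path)
    report = check_round_trip(df, pd.read_pickle(path))

print(report)
```

A nonzero difference, like the one in the `h5` row of the verbose table, is the kind of result worth looking into.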