{"id":34775568,"url":"https://github.com/runsascoded/parquet-diff-test","last_synced_at":"2026-05-24T17:35:16.832Z","repository":{"id":214962510,"uuid":"737381585","full_name":"runsascoded/parquet-diff-test","owner":"runsascoded","description":"Demonstrate differences in Parquet files generated by pyarrow on macOS vs. {Ubuntu, Windows}.","archived":false,"fork":false,"pushed_at":"2023-12-31T17:10:16.000Z","size":43,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-04-18T11:29:24.699Z","etag":null,"topics":["arrow","parquet","pyarrow"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/runsascoded.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-12-30T20:34:17.000Z","updated_at":"2024-01-01T12:53:49.000Z","dependencies_parsed_at":"2024-01-01T15:07:10.722Z","dependency_job_id":null,"html_url":"https://github.com/runsascoded/parquet-diff-test","commit_stats":null,"previous_names":["runsascoded/parquet-diff-test"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/runsascoded/parquet-diff-test","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/runsascoded%2Fparquet-diff-test","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/runsascoded%2Fparquet-diff-test/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/runsascoded%2Fparquet-diff-test/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/runsascoded%2Fparquet-diff
-test/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/runsascoded","download_url":"https://codeload.github.com/runsascoded/parquet-diff-test/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/runsascoded%2Fparquet-diff-test/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28024469,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-12-25T02:00:05.988Z","response_time":58,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arrow","parquet","pyarrow"],"created_at":"2025-12-25T08:16:11.555Z","updated_at":"2025-12-25T08:16:12.070Z","avatar_url":"https://github.com/runsascoded.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# parquet-diff-test\nDemonstrate differences in Parquet files generated by [pyarrow] on macOS vs. 
{Ubuntu, Windows} (see [arrow#39399](https://github.com/apache/arrow/issues/39399)).\n\n## CLI\n\nFor each {engine, compression codec}:\n- **Engine:** [pyarrow], [fastparquet]\n- **Compression:** snappy, gzip, brotli, lz4, zstd\n\n[`parquet-diff-test`] writes a simple Parquet file:\n\n```python\ndf = pd.DataFrame([{ 'a': 111 }])\nempty_df = df.iloc[:0]  # subset the dataset to have 0 rows\nout_dir = f'out/{engine}/{compression}'\nparquet_path = f'{out_dir}/empty.parquet'\nempty_df.to_parquet(parquet_path, engine=engine, compression=compression)\n```\n\nIn the same directory, it also writes:\n- `metadata.json`, which includes:\n  - the `pyarrow.ParquetFile.metadata` dictionary\n  - file size\n  - file sha256 hash\n- `xxd.txt`: ASCII representation of every byte in `empty.parquet`\n\n## Results\n\nThe [test.yml](.github/workflows/test.yml) workflow runs `parquet-diff-test` on Ubuntu, macOS, and Windows, and pushes the results of each to a branch.\n\nHere are the [`macos`] and [`windows`] branches compared to [`ubuntu`]:\n- [`ubuntu..macos`]\n- [`ubuntu..windows`]\n\n### Summary\n- ✅ In all cases, Parquet files generated by [`fastparquet`] are identical across OSes.\n- 🤔 In many cases, those generated by `pyarrow` are different from each other.\n\n#### pyarrow\n\n|        | Ubuntu | Windows | macOS |\n|-------:|-------:|--------:|------:|\n| brotli |      ✅ |     ✅ |    ❌ |\n|   gzip |     ⚠️ |     ⚠️ |    ❌ |\n|    lz4 |      ✅ |     ✅ |    ❌ |\n| snappy |      ✅ |     ✅ |    ❌ |\n|   zstd |      ✅ |     ✅ |    ❌ |\n\n#### fastparquet\n\n|        | Ubuntu | Windows | macOS |\n|-------:|-------:|--------:|------:|\n| brotli |      ✅ |       ✅ |     ✅ |\n|   gzip |       ✅ |        ✅ |     ✅ |\n|    lz4 |      ✅ |       ✅ |     ✅ |\n| snappy |      ✅ |       ✅ |     ✅ |\n|   zstd |      ✅ |       ✅ |     ✅ |\n\n### Full diffs\n\n#### [`ubuntu..macos`]\n- All [`fastparquet`] parquets are identical.\n- All [`pyarrow`] parquets differ.\n\nFor example, [here's the 
diff][ubuntu..macos xxd] for {`pyarrow`, `snappy`}:\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003ccode\u003egit diff ubuntu..macos -- out/pyarrow/snappy/xxd.txt\u003c/code\u003e\n\u003c/summary\u003e\n\n```diff\n 00000280: 7741 4141 4145 4141 6741 4367 4141 414e  wAAAAEAAgACgAAAN\n 00000290: 7742 4141 4145 4141 4141 4151 4141 4141  wBAAAEAAAAAQAAAA\n 000002a0: 7741 4141 4149 4141 7741 4241 4149 4141  wAAAAIAAwABAAIAA\n-000002b0: 6741 4141 4149 4141 4141 4541 4141 4141  gAAAAIAAAAEAAAAA\n-000002c0: 5941 4141 4277 5957 356b 5958 4d41 414b  YAAABwYW5kYXMAAK\n-000002d0: 5942 4141 4237 496d 6c75 5a47 5634 5832  YBAAB7ImluZGV4X2\n-000002e0: 4e76 6248 5674 626e 4d69 4f69 4262 6579  NvbHVtbnMiOiBbey\n-000002f0: 4a72 6157 356b 496a 6f67 496e 4a68 626d  JraW5kIjogInJhbm\n-00000300: 646c 4969 7767 496d 3568 6257 5569 4f69  dlIiwgIm5hbWUiOi\n-00000310: 4275 6457 7873 4c43 4169 6333 5268 636e  BudWxsLCAic3Rhcn\n-00000320: 5169 4f69 4177 4c43 4169 6333 5276 6343  QiOiAwLCAic3RvcC\n-00000330: 4936 4944 4173 4943 4a7a 6447 5677 496a  I6IDAsICJzdGVwIj\n-00000340: 6f67 4d58 3164 4c43 4169 5932 3973 6457  ogMX1dLCAiY29sdW\n-00000350: 3175 5832 6c75 5a47 5634 5a58 4d69 4f69  1uX2luZGV4ZXMiOi\n-00000360: 4262 6579 4a75 5957 316c 496a 6f67 626e  BbeyJuYW1lIjogbn\n-00000370: 5673 6243 7767 496d 5a70 5a57 786b 5832  VsbCwgImZpZWxkX2\n-00000380: 3568 6257 5569 4f69 4275 6457 7873 4c43  5hbWUiOiBudWxsLC\n-00000390: 4169 6347 4675 5a47 467a 5833 5235 6347  AicGFuZGFzX3R5cG\n-000003a0: 5569 4f69 4169 6457 3570 5932 396b 5a53  UiOiAidW5pY29kZS\n-000003b0: 4973 4943 4a75 6457 3177 6556 3930 6558  IsICJudW1weV90eX\n-000003c0: 426c 496a 6f67 496d 3969 616d 566a 6443  BlIjogIm9iamVjdC\n-000003d0: 4973 4943 4a74 5a58 5268 5a47 4630 5953  IsICJtZXRhZGF0YS\n-000003e0: 4936 4948 7369 5a57 356a 6232 5270 626d  I6IHsiZW5jb2Rpbm\n-000003f0: 6369 4f69 4169 5656 5247 4c54 6769 6658  ciOiAiVVRGLTgifX\n-00000400: 3164 4c43 4169 5932 3973 6457 3175 6379  1dLCAiY29sdW1ucy\n-00000410: 4936 4946 7437 496d 
3568 6257 5569 4f69  I6IFt7Im5hbWUiOi\n-00000420: 4169 5953 4973 4943 4a6d 6157 5673 5a46  AiYSIsICJmaWVsZF\n-00000430: 3975 5957 316c 496a 6f67 496d 4569 4c43  9uYW1lIjogImEiLC\n-00000440: 4169 6347 4675 5a47 467a 5833 5235 6347  AicGFuZGFzX3R5cG\n-00000450: 5569 4f69 4169 6157 3530 4e6a 5169 4c43  UiOiAiaW50NjQiLC\n-00000460: 4169 626e 5674 6348 6c66 6448 6c77 5a53  AibnVtcHlfdHlwZS\n-00000470: 4936 4943 4a70 626e 5132 4e43 4973 4943  I6ICJpbnQ2NCIsIC\n-00000480: 4a74 5a58 5268 5a47 4630 5953 4936 4947  JtZXRhZGF0YSI6IG\n-00000490: 3531 6247 7839 5853 7767 496d 4e79 5a57  51bGx9XSwgImNyZW\n-000004a0: 4630 6233 4969 4f69 4237 496d 7870 596e  F0b3IiOiB7ImxpYn\n-000004b0: 4a68 636e 6b69 4f69 4169 6348 6c68 636e  JhcnkiOiAicHlhcn\n-000004c0: 4a76 6479 4973 4943 4a32 5a58 4a7a 6157  JvdyIsICJ2ZXJzaW\n-000004d0: 3975 496a 6f67 496a 4530 4c6a 4175 4d69  9uIjogIjE0LjAuMi\n-000004e0: 4a39 4c43 4169 6347 4675 5a47 467a 5833  J9LCAicGFuZGFzX3\n-000004f0: 5a6c 636e 4e70 6232 3469 4f69 4169 4d69  ZlcnNpb24iOiAiMi\n-00000500: 3478 4c6a 5169 6651 4141 4151 4141 4142  4xLjQifQAAAQAAAB\n+000002b0: 6741 4141 4330 4151 4141 4241 4141 414b  gAAAC0AQAABAAAAK\n+000002c0: 5942 4141 4237 496d 6c75 5a47 5634 5832  YBAAB7ImluZGV4X2\n+000002d0: 4e76 6248 5674 626e 4d69 4f69 4262 6579  NvbHVtbnMiOiBbey\n+000002e0: 4a72 6157 356b 496a 6f67 496e 4a68 626d  JraW5kIjogInJhbm\n+000002f0: 646c 4969 7767 496d 3568 6257 5569 4f69  dlIiwgIm5hbWUiOi\n+00000300: 4275 6457 7873 4c43 4169 6333 5268 636e  BudWxsLCAic3Rhcn\n+00000310: 5169 4f69 4177 4c43 4169 6333 5276 6343  QiOiAwLCAic3RvcC\n+00000320: 4936 4944 4173 4943 4a7a 6447 5677 496a  I6IDAsICJzdGVwIj\n+00000330: 6f67 4d58 3164 4c43 4169 5932 3973 6457  ogMX1dLCAiY29sdW\n+00000340: 3175 5832 6c75 5a47 5634 5a58 4d69 4f69  1uX2luZGV4ZXMiOi\n+00000350: 4262 6579 4a75 5957 316c 496a 6f67 626e  BbeyJuYW1lIjogbn\n+00000360: 5673 6243 7767 496d 5a70 5a57 786b 5832  VsbCwgImZpZWxkX2\n+00000370: 3568 6257 5569 4f69 4275 6457 7873 4c43  
5hbWUiOiBudWxsLC\n+00000380: 4169 6347 4675 5a47 467a 5833 5235 6347  AicGFuZGFzX3R5cG\n+00000390: 5569 4f69 4169 6457 3570 5932 396b 5a53  UiOiAidW5pY29kZS\n+000003a0: 4973 4943 4a75 6457 3177 6556 3930 6558  IsICJudW1weV90eX\n+000003b0: 426c 496a 6f67 496d 3969 616d 566a 6443  BlIjogIm9iamVjdC\n+000003c0: 4973 4943 4a74 5a58 5268 5a47 4630 5953  IsICJtZXRhZGF0YS\n+000003d0: 4936 4948 7369 5a57 356a 6232 5270 626d  I6IHsiZW5jb2Rpbm\n+000003e0: 6369 4f69 4169 5656 5247 4c54 6769 6658  ciOiAiVVRGLTgifX\n+000003f0: 3164 4c43 4169 5932 3973 6457 3175 6379  1dLCAiY29sdW1ucy\n+00000400: 4936 4946 7437 496d 3568 6257 5569 4f69  I6IFt7Im5hbWUiOi\n+00000410: 4169 5953 4973 4943 4a6d 6157 5673 5a46  AiYSIsICJmaWVsZF\n+00000420: 3975 5957 316c 496a 6f67 496d 4569 4c43  9uYW1lIjogImEiLC\n+00000430: 4169 6347 4675 5a47 467a 5833 5235 6347  AicGFuZGFzX3R5cG\n+00000440: 5569 4f69 4169 6157 3530 4e6a 5169 4c43  UiOiAiaW50NjQiLC\n+00000450: 4169 626e 5674 6348 6c66 6448 6c77 5a53  AibnVtcHlfdHlwZS\n+00000460: 4936 4943 4a70 626e 5132 4e43 4973 4943  I6ICJpbnQ2NCIsIC\n+00000470: 4a74 5a58 5268 5a47 4630 5953 4936 4947  JtZXRhZGF0YSI6IG\n+00000480: 3531 6247 7839 5853 7767 496d 4e79 5a57  51bGx9XSwgImNyZW\n+00000490: 4630 6233 4969 4f69 4237 496d 7870 596e  F0b3IiOiB7ImxpYn\n+000004a0: 4a68 636e 6b69 4f69 4169 6348 6c68 636e  JhcnkiOiAicHlhcn\n+000004b0: 4a76 6479 4973 4943 4a32 5a58 4a7a 6157  JvdyIsICJ2ZXJzaW\n+000004c0: 3975 496a 6f67 496a 4530 4c6a 4175 4d69  9uIjogIjE0LjAuMi\n+000004d0: 4a39 4c43 4169 6347 4675 5a47 467a 5833  J9LCAicGFuZGFzX3\n+000004e0: 5a6c 636e 4e70 6232 3469 4f69 4169 4d69  ZlcnNpb24iOiAiMi\n+000004f0: 3478 4c6a 5169 6651 4141 4267 4141 4148  4xLjQifQAABgAAAH\n+00000500: 4268 626d 5268 6377 4141 4151 4141 4142  BhbmRhcwAAAQAAAB\n 00000510: 5141 4141 4151 4142 5141 4341 4147 4141  QAAAAQABQACAAGAA\n 00000520: 6341 4441 4141 4142 4141 4541 4141 4141  cADAAAABAAEAAAAA\n 00000530: 4141 4151 4951 4141 4141 4841 4141 4141  
AAAQIQAAAAHAAAAA\n```\n\u003c/details\u003e\n\nThe `pyarrow` metadata is the same for both; I can't tell what explains the difference.\n\n#### [`ubuntu..windows`]\n- All `fastparquet` parquets are identical.\n- `pyarrow` parquets are identical except under the `gzip` codec, where a single byte differs: the OS field of the gzip header (`03` on Ubuntu, `0a` on Windows).\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003ccode\u003egit diff ubuntu..windows -- out/pyarrow/gzip/xxd.txt\u003c/code\u003e\u003c/summary\u003e\n\n```diff\n 00000000: 5041 5231 1504 1500 1528 4c15 0015 0012  PAR1.....(L.....\n-00000010: 0000 1f8b 0800 0000 0000 0003 0300 0000  ................\n+00000010: 0000 1f8b 0800 0000 0000 000a 0300 0000  ................\n 00000020: 0000 0000 0000 264c 1c15 0419 2500 0619  ......\u0026L....%...\n 00000030: 1801 6115 0416 0016 1c16 4426 0026 0829  ..a.......D\u0026.\u0026.)\n 00000040: 1c15 0415 0015 0200 0000 1504 192c 3500  .............,5.\n```\n\u003c/details\u003e\n\n## Discussion\nThe discrepancy between macOS and Ubuntu has made some tests inconvenient; it would be nice to understand why it occurs. 
\n\n### Docker\nInterestingly, I see the same macOS diffs when running [`run.sh`] in an `ubuntu` Docker image on a macOS host machine.\n\n[`parquet-diff-test`]: parquet_diff_test/cli.py\n[`fastparquet`]: https://pypi.org/project/fastparquet/\n[fastparquet]: https://pypi.org/project/fastparquet/\n[pyarrow]: https://pypi.org/project/pyarrow/\n[`pyarrow`]: https://pypi.org/project/pyarrow/\n[`macos`]: https://github.com/runsascoded/parquet-diff-test/tree/macos\n[`windows`]: https://github.com/runsascoded/parquet-diff-test/tree/windows\n[`ubuntu`]: https://github.com/runsascoded/parquet-diff-test/tree/ubuntu\n[`ubuntu..macos`]: https://github.com/runsascoded/parquet-diff-test/compare/ubuntu..macos\n[`ubuntu..windows`]: https://github.com/runsascoded/parquet-diff-test/compare/ubuntu..windows\n[ubuntu..macos xxd]: https://github.com/runsascoded/parquet-diff-test/compare/ubuntu..macos#diff-1aff51203a0bbf705859a61d542f15bfa553b121b30fea500f03024a8ae44258\n[`run.sh`]: run.sh\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frunsascoded%2Fparquet-diff-test","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frunsascoded%2Fparquet-diff-test","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frunsascoded%2Fparquet-diff-test/lists"}