{"id":25982532,"url":"https://github.com/aastopher/io_bench","last_synced_at":"2026-01-24T04:36:56.177Z","repository":{"id":251565657,"uuid":"837413102","full_name":"aastopher/io_bench","owner":"aastopher","description":"IO Bench is a library designed to benchmark the performance of standard flat file formats and partitioning schemes.","archived":false,"fork":false,"pushed_at":"2024-08-21T01:48:58.000Z","size":877,"stargazers_count":2,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-10T04:06:29.072Z","etag":null,"topics":["arrow","avro","benchmark","benchmarking","feather","parquet","performance","polars","utils"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aastopher.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-02T23:28:21.000Z","updated_at":"2025-02-01T19:02:20.000Z","dependencies_parsed_at":"2024-08-13T07:41:36.595Z","dependency_job_id":"308957c0-af5b-46b2-b7d8-4442802596b6","html_url":"https://github.com/aastopher/io_bench","commit_stats":null,"previous_names":["aastopher/io_bench"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aastopher%2Fio_bench","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aastopher%2Fio_bench/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aastopher%2Fio_bench/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/aastopher%2Fio_bench/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aastopher","download_url":"https://codeload.github.com/aastopher/io_bench/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242000402,"owners_count":20055666,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arrow","avro","benchmark","benchmarking","feather","parquet","performance","polars","utils"],"created_at":"2025-03-05T09:32:40.619Z","updated_at":"2026-01-24T04:36:51.158Z","avatar_url":"https://github.com/aastopher.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![PyPI version](https://badge.fury.io/py/io_bench.svg)](https://badge.fury.io/py/io_bench)\n[![Documentation Status](https://img.shields.io/badge/docs-online-brightgreen)](https://aastopher.github.io/io_bench/)\n[![codecov](https://codecov.io/gh/aastopher/io_bench/graph/badge.svg?token=79V7VRZWV0)](https://codecov.io/gh/aastopher/io_bench)\n[![DeepSource](https://app.deepsource.com/gh/aastopher/io_bench.svg/?label=active+issues\u0026show_trend=true\u0026token=3NT8mR1AQRLW9zDNKWQ8vgFl)](https://app.deepsource.com/gh/aastopher/io_bench/)\n\n# IOBench Quick Start Guide\n\n## Generating Sample Data\nTo generate sample data, initialize the `IOBench` object with the path to the source CSV file and call the `generate_sample` method:\n\n```python\nfrom io_bench import IOBench\n\nbench = IOBench(source_file='./data/source_100K.csv', runs=20, parsers=['avro', 'parquet_polars', 'parquet_arrow', 
'parquet_fast', 'feather', 'feather_arrow'])\nbench.generate_sample(records=100000) # default value\n```\n**NOTE:** `source_file` behavior is contextual. If you provide the desired name for a sample file and then call `generate_sample`, the file is created. Otherwise, a valid path to an existing file must be provided.\n\n## Converting Data to Partitioned Formats\nConvert the generated CSV data to partitioned formats (Avro, Parquet, Feather). If `rows` is not defined, the data is automatically partitioned into default chunks.\n\n```python\nbench.partition(rows={'avro': 500000, 'parquet': 3000000, 'feather': 1600000})\n```\n\n## Running Benchmarks\n**NOTE:** Partitioning is stateful per bench object. If `partition` is not called manually, it is called automatically on the first run only, assuming a valid source file exists.\n\n### Without Column Selection\nRun benchmarks without column selection:\n\n```python\nbenchmarks_no_select = bench.run(suffix='_no_select')\n```\n\n### With Column Selection\nRun benchmarks with column selection:\n\n```python\ncolumns = ['Region', 'Country', 'Total Cost']\nbenchmarks_column_select = bench.run(columns=columns, suffix='_column_select')\n```\n\n## Generating Reports\nCombine results and generate the final report:\n\n```python\nall_benchmarks = benchmarks_no_select + benchmarks_column_select\nbench.report(all_benchmarks, report_dir='./result')\n```\n\n## Full Example\n\nHere is a full example of using `IOBench`:\n\n```python\nfrom io_bench import IOBench\n\ndef main() -\u003e None:\n    # Initialize the IOBench object with runs and parsers\n    bench = IOBench(source_file='./data/source_100K.csv', runs=20, parsers=['avro', 'parquet_polars'])\n\n    # Generate sample data - (optional)\n    bench.generate_sample()\n\n    # Convert the source file to partitioned formats - (optional)\n    bench.partition(rows={'avro': 500000, 'parquet': 3000000, 'feather': 1600000})\n\n    # Run benchmarks without column selection\n    benchmarks_no_select = 
bench.run(suffix='_no_select')\n\n    # Run benchmarks with column selection\n    columns = ['Region', 'Country', 'Total Cost']\n    benchmarks_column_select = bench.run(columns=columns, suffix='_column_select')\n\n    # Combine results and generate the final report\n    all_benchmarks = benchmarks_no_select + benchmarks_column_select\n    bench.report(all_benchmarks, report_dir='./result')\n\nif __name__ == \"__main__\":\n    main()\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faastopher%2Fio_bench","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faastopher%2Fio_bench","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faastopher%2Fio_bench/lists"}