{"id":13737393,"url":"https://github.com/ktrueda/parquet-tools","last_synced_at":"2025-10-21T19:13:39.899Z","repository":{"id":37938306,"uuid":"260630546","full_name":"ktrueda/parquet-tools","owner":"ktrueda","description":"easy install parquet-tools","archived":false,"fork":false,"pushed_at":"2024-07-09T19:15:45.000Z","size":535,"stargazers_count":176,"open_issues_count":20,"forks_count":24,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-04-21T17:45:20.848Z","etag":null,"topics":["cli","parquet","parquet-tools"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ktrueda.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-05-02T06:46:50.000Z","updated_at":"2025-04-16T13:08:42.000Z","dependencies_parsed_at":"2024-05-16T14:31:12.857Z","dependency_job_id":"fcc4667f-7e9a-46c5-80fb-8fe5c9131112","html_url":"https://github.com/ktrueda/parquet-tools","commit_stats":{"total_commits":51,"total_committers":13,"mean_commits":3.923076923076923,"dds":"0.33333333333333337","last_synced_commit":"4239905a5d5013197d6964bfab6b57d6ec41caca"},"previous_names":[],"tags_count":20,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ktrueda%2Fparquet-tools","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ktrueda%2Fparquet-tools/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ktrueda%2Fparquet-tools/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ktrueda%2Fparquet-tools/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ktrueda","download_url":"https://codeload.github.com/ktrueda/parquet-tools/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253077669,"owners_count":21850362,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cli","parquet","parquet-tools"],"created_at":"2024-08-03T03:01:46.216Z","updated_at":"2025-10-21T19:13:39.894Z","avatar_url":"https://github.com/ktrueda.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# parquet-tools\n\n![Run Unittest](https://github.com/ktrueda/parquet-tools/workflows/Run%20Unittest/badge.svg)\n![Run CLI test](https://github.com/ktrueda/parquet-tools/workflows/Run%20CLI%20test/badge.svg)\n\nThis is a pip-installable alternative to [parquet-tools](https://github.com/apache/parquet-mr).\nIt is a CLI tool built on [Apache Arrow](https://github.com/apache/arrow).\nYou can show the content and schema of Parquet files stored on local disk or on Amazon S3.\nIt is not compatible with the original parquet-tools.\n\n## Features\n\n- Read Parquet data (local file or file on S3)\n- Read Parquet metadata/schema (local file or file on S3)\n\n## Installation\n\n```bash\n$ pip install parquet-tools\n```\n\n## Usage\n\n```bash\n$ parquet-tools --help\nusage: parquet-tools [-h] {show,csv,inspect} ...\n\nparquet CLI tools\n\npositional arguments:\n  {show,csv,inspect}\n    show              Show human readble format. see `show -h`\n    csv               Cat csv style. see `csv -h`\n    inspect           Inspect parquet file. see `inspect -h`\n\noptional arguments:\n  -h, --help          show this help message and exit\n```\n\n## Usage Examples\n\n#### Show local parquet file\n\n```bash\n$ parquet-tools show test.parquet\n+-------+-------+---------+\n|   one | two   | three   |\n|-------+-------+---------|\n|  -1   | foo   | True    |\n| nan   | bar   | False   |\n|   2.5 | baz   | True    |\n+-------+-------+---------+\n```\n\n#### Show parquet file on S3\n\n```bash\n$ parquet-tools show s3://bucket-name/prefix/*\n+-------+-------+---------+\n|   one | two   | three   |\n|-------+-------+---------|\n|  -1   | foo   | True    |\n| nan   | bar   | False   |\n|   2.5 | baz   | True    |\n+-------+-------+---------+\n```\n\n\n#### Inspect parquet file schema\n\n```bash\n$ parquet-tools inspect /path/to/parquet\n```\n\n\u003cdetails\u003e\n\n\u003csummary\u003eInspect output\u003c/summary\u003e\n\n```\n############ file meta data ############\ncreated_by: parquet-cpp version 1.5.1-SNAPSHOT\nnum_columns: 3\nnum_rows: 3\nnum_row_groups: 1\nformat_version: 1.0\nserialized_size: 2226\n\n\n############ Columns ############\none\ntwo\nthree\n\n############ Column(one) ############\nname: one\npath: one\nmax_definition_level: 1\nmax_repetition_level: 0\nphysical_type: DOUBLE\nlogical_type: None\nconverted_type (legacy): NONE\n\n############ Column(two) ############\nname: two\npath: two\nmax_definition_level: 1\nmax_repetition_level: 0\nphysical_type: BYTE_ARRAY\nlogical_type: String\nconverted_type (legacy): UTF8\n\n############ Column(three) ############\nname: three\npath: three\nmax_definition_level: 1\nmax_repetition_level: 0\nphysical_type: BOOLEAN\nlogical_type: None\nconverted_type (legacy): NONE\n```\n\u003c/details\u003e\n\n#### Output parquet as CSV and query it with [csvq](https://github.com/mithrandie/csvq)\n\n```bash\n$ parquet-tools csv s3://bucket-name/test.parquet |csvq \"select one, three where three\"\n+-------+-------+\n|  one  | three |\n+-------+-------+\n| -1.0  | True  |\n| 2.5   | True  |\n+-------+-------+\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fktrueda%2Fparquet-tools","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fktrueda%2Fparquet-tools","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fktrueda%2Fparquet-tools/lists"}