{"id":13906285,"url":"https://github.com/manojkarthick/pqrs","last_synced_at":"2025-05-16T14:06:39.028Z","repository":{"id":40327873,"uuid":"335521448","full_name":"manojkarthick/pqrs","owner":"manojkarthick","description":"Command line tool for inspecting Parquet files","archived":false,"fork":false,"pushed_at":"2024-08-19T09:22:29.000Z","size":193,"stargazers_count":326,"open_issues_count":17,"forks_count":31,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-04-19T16:12:21.367Z","etag":null,"topics":["arrow","parquet","rust"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/manojkarthick.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE-APACHE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-02-03T05:49:31.000Z","updated_at":"2025-04-19T08:16:22.000Z","dependencies_parsed_at":"2024-04-02T18:28:28.482Z","dependency_job_id":"8eac763e-4d3b-4774-a732-03ad50a471cb","html_url":"https://github.com/manojkarthick/pqrs","commit_stats":{"total_commits":70,"total_committers":9,"mean_commits":7.777777777777778,"dds":"0.24285714285714288","last_synced_commit":"1ce057ddefab2f1b12df086caa478c9ced6383e2"},"previous_names":[],"tags_count":9,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/manojkarthick%2Fpqrs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/manojkarthick%2Fpqrs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/manojkarthick%2Fpqrs/releases","manifests_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/repositories/manojkarthick%2Fpqrs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/manojkarthick","download_url":"https://codeload.github.com/manojkarthick/pqrs/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254544146,"owners_count":22088807,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arrow","parquet","rust"],"created_at":"2024-08-06T23:01:32.735Z","updated_at":"2025-05-16T14:06:38.977Z","avatar_url":"https://github.com/manojkarthick.png","language":"Rust","funding_links":[],"categories":["Rust"],"sub_categories":[],"readme":"# pqrs ![build](https://github.com/manojkarthick/pqrs/workflows/build/badge.svg)\n\n* `pqrs` is a command line tool for inspecting [Parquet](https://parquet.apache.org/) files\n* This is a replacement for the [parquet-tools](https://github.com/apache/parquet-mr/tree/master/parquet-tools-deprecated) utility written in Rust\n* Built using the Rust implementation of [Parquet](https://github.com/apache/arrow-rs/tree/master/parquet) and [Arrow](https://github.com/apache/arrow-rs/tree/master/arrow)\n* `pqrs` roughly means \"parquet-tools in rust\"\n\n## Installation\n\n### Recommended Method\n\nYou can download release binaries [here](https://github.com/manojkarthick/pqrs/releases)\n\n### Alternative methods\n\n#### Using Homebrew\n\nFor macOS users, `pqrs` is available as a homebrew tap.\n\n```shell\nbrew install manojkarthick/tap/pqrs\n```\n\nNOTE: For users upgrading from v0.2 or prior, note that the location of the `pqrs` 
homebrew tap has been updated.\nTo update to v0.2.1+, please uninstall using `brew uninstall pqrs` and use the above command to re-install.\n\n#### Using cargo\n\n`pqrs` is also available for installation from [crates.io](https://crates.io/crates/pqrs) using `cargo`, the Rust package manager.\n\n```shell\ncargo install pqrs\n```\n\n#### Building and running from source\n\nMake sure you have `rustc` and `cargo` installed on your machine.\n\n```shell\ngit clone https://github.com/manojkarthick/pqrs.git\ncd pqrs\ncargo build --release\n./target/release/pqrs\n```\n\n## Running\n\nThe snippet below shows the available subcommands:\n\n```shell\n❯ pqrs --help\npqrs 0.2.1\nManoj Karthick\nApache Parquet command-line utility\n\nUSAGE:\n    pqrs [FLAGS] [SUBCOMMAND]\n\nFLAGS:\n    -d, --debug      Show debug output\n    -h, --help       Prints help information\n    -V, --version    Prints version information\n\nSUBCOMMANDS:\n    cat         Prints the contents of Parquet file(s)\n    head        Prints the first n records of the Parquet file\n    help        Prints this message or the help of the given subcommand(s)\n    merge       Merge file(s) into another parquet file\n    rowcount    Prints the count of rows in Parquet file(s)\n    sample      Prints a random sample of records from the Parquet file\n    schema      Prints the schema of Parquet file(s)\n    size        Prints the size of Parquet file(s)\n```\n\n### Subcommand: cat\n\nPrints the contents of the given files and folders. Recursively traverses and prints all the files if the input is a directory.\nSupports JSON-like, JSON, or CSV output. 
Use `--json` for JSON output, `--csv` for CSV output with column names in the first row, and `--csv-data-only` for CSV output without the column names row.\n\n```shell\n❯ pqrs cat data/cities.parquet\n{continent: \"Europe\", country: {name: \"France\", city: [\"Paris\", \"Nice\", \"Marseilles\", \"Cannes\"]}}\n{continent: \"Europe\", country: {name: \"Greece\", city: [\"Athens\", \"Piraeus\", \"Hania\", \"Heraklion\", \"Rethymnon\", \"Fira\"]}}\n{continent: \"North America\", country: {name: \"Canada\", city: [\"Toronto\", \"Vancouver\", \"St. John's\", \"Saint John\", \"Montreal\", \"Halifax\", \"Winnipeg\", \"Calgary\", \"Saskatoon\", \"Ottawa\", \"Yellowknife\"]}}\n```\n\n```shell\n❯ pqrs cat data/cities.parquet --json\n{\"continent\":\"Europe\",\"country\":{\"name\":\"France\",\"city\":[\"Paris\",\"Nice\",\"Marseilles\",\"Cannes\"]}}\n{\"continent\":\"Europe\",\"country\":{\"name\":\"Greece\",\"city\":[\"Athens\",\"Piraeus\",\"Hania\",\"Heraklion\",\"Rethymnon\",\"Fira\"]}}\n{\"continent\":\"North America\",\"country\":{\"name\":\"Canada\",\"city\":[\"Toronto\",\"Vancouver\",\"St. John's\",\"Saint John\",\"Montreal\",\"Halifax\",\"Winnipeg\",\"Calgary\",\"Saskatoon\",\"Ottawa\",\"Yellowknife\"]}}\n```\n\n```shell\n❯ pqrs cat data/simple.parquet --csv\nfoo,bar\n1,2\n10,20\n```\n\n```shell\n❯ pqrs cat data/simple.parquet --csv --no-header\n1,2\n10,20\n```\n\nNOTE: CSV format is not supported for files that contain Struct or Byte fields.\n\n### Subcommand: head\n\nPrints the first N records of the parquet file. 
Use the `--records` flag to set the number of records.\n\n```shell\n❯ pqrs head data/cities.parquet --json --records 2\n{\"continent\":\"Europe\",\"country\":{\"name\":\"France\",\"city\":[\"Paris\",\"Nice\",\"Marseilles\",\"Cannes\"]}}\n{\"continent\":\"Europe\",\"country\":{\"name\":\"Greece\",\"city\":[\"Athens\",\"Piraeus\",\"Hania\",\"Heraklion\",\"Rethymnon\",\"Fira\"]}}\n```\n\n### Subcommand: merge\n\nMerge two Parquet files by placing row groups (or blocks) from the two files one after the other.\n\nDisclaimer: This does not combine the files to have optimized row groups. Do not use it in production!\n\n```shell\n❯ pqrs merge --input data/pems-1.snappy.parquet data/pems-2.snappy.parquet --output data/pems-merged.snappy.parquet\n\n❯ ls -al data\ntotal 408\ndrwxr-xr-x   6 manojkarthick  staff     192 Feb 14 08:53 .\ndrwxr-xr-x  20 manojkarthick  staff     640 Feb 14 08:52 ..\n-rw-r--r--   1 manojkarthick  staff     866 Feb  8 19:50 cities.parquet\n-rw-r--r--   1 manojkarthick  staff   16468 Feb  8 19:50 pems-1.snappy.parquet\n-rw-r--r--   1 manojkarthick  staff   17342 Feb  8 19:50 pems-2.snappy.parquet\n-rw-r--r--   1 manojkarthick  staff  160950 Feb 14 08:53 pems-merged.snappy.parquet\n```\n\n### Subcommand: rowcount\n\nPrints the number of rows in each of the given Parquet files.\n\n```shell\n❯ pqrs rowcount data/pems-1.snappy.parquet data/pems-2.snappy.parquet\nFile Name: data/pems-1.snappy.parquet: 2693 rows\nFile Name: data/pems-2.snappy.parquet: 2880 rows\n```\n\n### Subcommand: sample\n\nPrints a random sample of records from the given parquet file.\n\n```shell\n❯ pqrs sample data/pems-1.snappy.parquet --records 3\n{timeperiod: \"01/17/2016 07:01:27\", flow1: 0, occupancy1: 0E0, speed1: 0E0, flow2: 0, occupancy2: 0E0, speed2: 0E0, flow3: 0, occupancy3: 0E0, speed3: 0E0, flow4: null, occupancy4: null, speed4: null, flow5: null, occupancy5: null, speed5: null, flow6: null, occupancy6: null, speed6: null, flow7: null, occupancy7: null, speed7: null, flow8: null, 
occupancy8: null, speed8: null}\n{timeperiod: \"01/17/2016 07:47:27\", flow1: 0, occupancy1: 0E0, speed1: 0E0, flow2: 0, occupancy2: 0E0, speed2: 0E0, flow3: 0, occupancy3: 0E0, speed3: 0E0, flow4: null, occupancy4: null, speed4: null, flow5: null, occupancy5: null, speed5: null, flow6: null, occupancy6: null, speed6: null, flow7: null, occupancy7: null, speed7: null, flow8: null, occupancy8: null, speed8: null}\n{timeperiod: \"01/17/2016 09:44:27\", flow1: 0, occupancy1: 0E0, speed1: 0E0, flow2: 0, occupancy2: 0E0, speed2: 0E0, flow3: 0, occupancy3: 0E0, speed3: 0E0, flow4: null, occupancy4: null, speed4: null, flow5: null, occupancy5: null, speed5: null, flow6: null, occupancy6: null, speed6: null, flow7: null, occupancy7: null, speed7: null, flow8: null, occupancy8: null, speed8: null}\n```\n\n### Subcommand: schema\n\nPrint the schema from the given parquet file. Use the `--detailed` flag to get more detailed stats.\n\n```shell\n❯ pqrs schema data/cities.parquet\nMetadata for file: data/cities.parquet\n\nversion: 1\nnum of rows: 3\ncreated by: parquet-mr version 1.5.0-cdh5.7.0 (build ${buildNumber})\nmessage hive_schema {\n  OPTIONAL BYTE_ARRAY continent (UTF8);\n  OPTIONAL group country {\n    OPTIONAL BYTE_ARRAY name (UTF8);\n    OPTIONAL group city (LIST) {\n      REPEATED group bag {\n        OPTIONAL BYTE_ARRAY array_element (UTF8);\n      }\n    }\n  }\n}\n```\n\n```shell\n❯ pqrs schema data/cities.parquet --detailed\n\nnum of row groups: 1\nrow groups:\n\nrow group 0:\n--------------------------------------------------------------------------------\ntotal byte size: 466\nnum of rows: 3\n\nnum of columns: 3\ncolumns:\n\ncolumn 0:\n--------------------------------------------------------------------------------\ncolumn type: BYTE_ARRAY\ncolumn path: \"continent\"\nencodings: BIT_PACKED PLAIN_DICTIONARY RLE\nfile path: N/A\nfile offset: 4\nnum of values: 3\ntotal compressed size (in bytes): 93\ntotal uncompressed size (in bytes): 93\ndata page offset: 
4\nindex page offset: N/A\ndictionary page offset: N/A\nstatistics: {min: [69, 117, 114, 111, 112, 101], max: [78, 111, 114, 116, 104, 32, 65, 109, 101, 114, 105, 99, 97], distinct_count: N/A, null_count: 0, min_max_deprecated: true}\n\n\u003c....output clipped\u003e\n\n```\n\n```shell\n❯ pqrs schema --json data/cities.parquet\n{\"version\":1,\"num_rows\":3,\"created_by\":\"parquet-mr version 1.5.0-cdh5.7.0 (build ${buildNumber})\",\"metadata\":null,\"columns\":[{\"optional\":\"true\",\"physical_type\":\"BYTE_ARRAY\",\"name\":\"continent\",\"path\":\"continent\",\"converted_type\":\"UTF8\"},{\"name\":\"name\",\"converted_type\":\"UTF8\",\"path\":\"country.name\",\"physical_type\":\"BYTE_ARRAY\",\"optional\":\"true\"},{\"optional\":\"true\",\"name\":\"array_element\",\"physical_type\":\"BYTE_ARRAY\",\"path\":\"country.city.bag.array_element\",\"converted_type\":\"UTF8\"}],\"message\":\"message hive_schema {\\n  OPTIONAL BYTE_ARRAY continent (UTF8);\\n  OPTIONAL group country {\\n    OPTIONAL BYTE_ARRAY name (UTF8);\\n    OPTIONAL group city (LIST) {\\n      REPEATED group bag {\\n        OPTIONAL BYTE_ARRAY array_element (UTF8);\\n      }\\n    }\\n  }\\n}\\n\"}\n\n```\n\n### Subcommand: size\n\nPrints the compressed or uncompressed size of the Parquet file. Shows the uncompressed size by default; pass `--compressed` to show the compressed size instead.\n\n```shell\n❯ pqrs size data/pems-1.snappy.parquet --pretty\nSize in Bytes:\n\nFile Name: data/pems-1.snappy.parquet\nUncompressed Size: 61 KiB\n```\n\n```shell\n❯ pqrs size data/pems-1.snappy.parquet --pretty --compressed\nSize in Bytes:\n\nFile Name: data/pems-1.snappy.parquet\nCompressed Size: 12 KiB\n```\n\n### TODO\n\n* [ ] Test on Windows\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmanojkarthick%2Fpqrs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmanojkarthick%2Fpqrs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmanojkarthick%2Fpqrs/lists"}