{"id":50787439,"url":"https://github.com/OvertureMaps/osm-pbf-parquet","last_synced_at":"2026-06-26T14:00:45.435Z","repository":{"id":266892860,"uuid":"899598230","full_name":"OvertureMaps/osm-pbf-parquet","owner":"OvertureMaps","description":"Transcode OSM PBF file to parquet files with hive-style partitioning by type","archived":false,"fork":false,"pushed_at":"2026-06-11T13:00:13.000Z","size":1817,"stargazers_count":30,"open_issues_count":2,"forks_count":2,"subscribers_count":4,"default_branch":"main","last_synced_at":"2026-06-11T15:01:02.823Z","etag":null,"topics":["osm-pbf","overture","overture-maps","parquet","pbf","pbf-to-parquet"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OvertureMaps.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE-OF-CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-12-06T15:45:50.000Z","updated_at":"2026-06-11T13:00:13.000Z","dependencies_parsed_at":"2026-06-11T15:01:13.544Z","dependency_job_id":null,"html_url":"https://github.com/OvertureMaps/osm-pbf-parquet","commit_stats":null,"previous_names":["overturemaps/osm-pbf-parquet"],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/OvertureMaps/osm-pbf-parquet","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OvertureMaps%2Fosm-pbf-parquet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OvertureMaps%2Fosm-pbf-parquet/tags","releases_url
":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OvertureMaps%2Fosm-pbf-parquet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OvertureMaps%2Fosm-pbf-parquet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OvertureMaps","download_url":"https://codeload.github.com/OvertureMaps/osm-pbf-parquet/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OvertureMaps%2Fosm-pbf-parquet/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34819597,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-26T02:00:06.560Z","response_time":106,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["osm-pbf","overture","overture-maps","parquet","pbf","pbf-to-parquet"],"created_at":"2026-06-12T09:00:23.111Z","updated_at":"2026-06-26T14:00:45.405Z","avatar_url":"https://github.com/OvertureMaps.png","language":"Rust","funding_links":[],"categories":["Rust","Tools and Libraries"],"sub_categories":["Official"],"readme":"# osm-pbf-parquet\nTranscode OSM PBF file to parquet files with hive-style partitioning by type\n\n## Getting started\n\n### Download\nDownload latest version from [releases](https://github.com/OvertureMaps/osm-pbf-parquet/releases)\n\n### Usage\nExample for x86_64 linux system with pre-compiled binary:\n```\ncurl 
-L \"https://github.com/OvertureMaps/osm-pbf-parquet/releases/latest/download/osm-pbf-parquet-x86_64-unknown-linux-gnu.tar.gz\" -o \"osm-pbf-parquet.tar.gz\"\ntar -xzf osm-pbf-parquet.tar.gz\nchmod +x osm-pbf-parquet\n./osm-pbf-parquet --input your.osm.pbf --output ./parquet\n```\n\nOR compile and run locally:\n```\ngit clone https://github.com/OvertureMaps/osm-pbf-parquet.git\ncd osm-pbf-parquet\ncargo run --release -- --input your.osm.pbf --output ./parquet\n```\n\n### Supported input/output\n- Local filesystem\n- AWS S3 (auth read from environment, see [object_store docs](https://docs.rs/object_store/latest/object_store/aws/struct.AmazonS3Builder.html))\n\n### Output structure\n```\nplanet.osm.pbf\nparquet/\n  type=node/\n    node_0000.zstd.parquet\n    ...\n  type=relation/\n    relation_0000.zstd.parquet\n    ...\n  type=way/\n    way_0000.zstd.parquet\n    ...\n```\n[Reference Arrow/SQL schema](https://github.com/OvertureMaps/osm-pbf-parquet/blob/main/src/osm_arrow.rs)\n\n### Querying\n\n#### DuckDB\n```\nduckdb -c \"SELECT * FROM read_parquet('s3://your-s3-bucket/path/') LIMIT 10;\"\n```\n\n#### Athena/Presto/Trino\n```\nCREATE EXTERNAL TABLE IF NOT EXISTS `osm` (\n    `id` BIGINT,\n    `tags` MAP\u003cSTRING, STRING\u003e,\n    `lat` DOUBLE,\n    `lon` DOUBLE,\n    `nds` ARRAY\u003cSTRUCT\u003cref: BIGINT\u003e\u003e,\n    `members` ARRAY\u003cSTRUCT\u003ctype: STRING, ref: BIGINT, role: STRING\u003e\u003e,\n    `changeset` BIGINT,\n    `timestamp` TIMESTAMP,\n    `uid` BIGINT,\n    `user` STRING,\n    `version` BIGINT,\n    `visible` BOOLEAN\n)\nPARTITIONED BY (\n    `type` STRING\n)\nROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'\nSTORED AS PARQUET\nLOCATION 's3://your-s3-bucket/path/';\n\nMSCK REPAIR TABLE `osm`;\n\nSELECT * FROM osm LIMIT 10;\n```\n\n## Development\n1. [Install rust](https://www.rust-lang.org/tools/install) and [just](https://github.com/casey/just)\n2. 
Clone repo `git clone https://github.com/OvertureMaps/osm-pbf-parquet.git`\n3. Make changes\n4. Run against PBF with `cargo run -- --input your.osm.pbf` ([Geofabrik regional PBF extracts here](https://download.geofabrik.de/))\n5. Run `just --list` to see available dev commands (`just test`, `just clippy`, `just ci-test`, etc.)\n\n\n## Benchmarks\nosm-pbf-parquet prioritizes transcode speed over file size, file count, or preserving ordering. Here is a comparison against similar tools on the 2024-06-24 OSM planet PBF with a target file size of 500MB:\n| | Time (wall) | Output size | File count |\n| - | - | - | - |\n| **osm-pbf-parquet** (zstd:3) | 30 minutes | 182GB | ~600 |\n| **osm-pbf-parquet** (zstd:9) | 60 minutes | 165GB | ~600 |\n| [osm-parquetizer](https://github.com/adrianulbona/osm-parquetizer) | 196 minutes | 285GB | 3 |\n| [osm2orc](https://github.com/mojodna/osm2orc) | 385 minutes | 110GB | 1 |\n\nTest system:\n```\ni5-9400 (6 CPU, 32GB memory)\nUbuntu 24.04\nOpenJDK 17\nRust 1.79.0\n```\n\n\n## License\nDistributed under the MIT License. See `LICENSE` for more information.\n\n## Acknowledgments\n* [osmpbf](https://github.com/b-r-u/osmpbf) and [osm2gzip](https://github.com/b-r-u/osm2gzip) for reading PBF data\n* [osm2orc](https://github.com/mojodna/osm2orc) for schema and processing ideas\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FOvertureMaps%2Fosm-pbf-parquet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FOvertureMaps%2Fosm-pbf-parquet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FOvertureMaps%2Fosm-pbf-parquet/lists"}