https://github.com/OvertureMaps/osm-pbf-parquet
Transcode OSM PBF file to parquet files with hive-style partitioning by type
- Host: GitHub
- URL: https://github.com/OvertureMaps/osm-pbf-parquet
- Owner: OvertureMaps
- License: mit
- Created: 2024-12-06T15:45:50.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2026-06-11T13:00:13.000Z (16 days ago)
- Last Synced: 2026-06-11T15:01:02.823Z (16 days ago)
- Topics: osm-pbf, overture, overture-maps, parquet, pbf, pbf-to-parquet
- Language: Rust
- Homepage:
- Size: 1.73 MB
- Stars: 30
- Watchers: 4
- Forks: 2
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
- Code of conduct: CODE-OF-CONDUCT.md
- Codeowners: CODEOWNERS
Awesome Lists containing this project
- Awesome-Geospatial - osm-pbf-parquet - Transcode OSM PBF file to parquet files. (Rust)
- awesome-gers - osm-pbf-parquet - to-GERS conflation pipeline. (Tools and Libraries / Official)
README
# osm-pbf-parquet
Transcode OSM PBF file to parquet files with hive-style partitioning by type
## Getting started
### Download
Download the latest version from the [releases](https://github.com/OvertureMaps/osm-pbf-parquet/releases) page.
### Usage
Example for an x86_64 Linux system using the pre-compiled binary:
```
curl -L "https://github.com/OvertureMaps/osm-pbf-parquet/releases/latest/download/osm-pbf-parquet-x86_64-unknown-linux-gnu.tar.gz" -o "osm-pbf-parquet.tar.gz"
tar -xzf osm-pbf-parquet.tar.gz
chmod +x osm-pbf-parquet
./osm-pbf-parquet --input your.osm.pbf --output ./parquet
```
Or compile and run locally:
```
git clone https://github.com/OvertureMaps/osm-pbf-parquet.git
cd osm-pbf-parquet
cargo run --release -- --input your.osm.pbf --output ./parquet
```
### Supported input/output
- Local filesystem
- AWS S3 (auth read from environment, see [object_store docs](https://docs.rs/object_store/latest/object_store/aws/struct.AmazonS3Builder.html))
### Output structure
```
planet.osm.pbf
parquet/
  type=node/
    node_0000.zstd.parquet
    ...
  type=relation/
    relation_0000.zstd.parquet
    ...
  type=way/
    way_0000.zstd.parquet
    ...
```
[Reference Arrow/SQL schema](https://github.com/OvertureMaps/osm-pbf-parquet/blob/main/src/osm_arrow.rs)
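
To check the layout after a run, the partition directories can be walked with a few lines of Python. This is a minimal sketch, not part of the tool: the `group_by_partition` helper is hypothetical, and it assumes the output was written to a local directory with the `type=<value>/` structure shown above.

```python
from collections import defaultdict
from pathlib import Path


def group_by_partition(root):
    """Map each hive partition value (e.g. 'node') to its sorted parquet file names."""
    groups = defaultdict(list)
    for path in Path(root).glob("type=*/*.parquet"):
        # The directory name carries the partition column: "type=node" -> "node".
        _, _, value = path.parent.name.partition("=")
        groups[value].append(path.name)
    return {value: sorted(names) for value, names in sorted(groups.items())}


if __name__ == "__main__":
    # "./parquet" matches the --output path used in the usage examples.
    for osm_type, files in group_by_partition("./parquet").items():
        print(f"{osm_type}: {len(files)} file(s)")
```

Because the partition column lives only in the directory name, query engines that understand hive-style partitioning expose it as a `type` column without it being stored in the files themselves.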
### Querying
#### DuckDB
```
duckdb -c "SELECT * FROM read_parquet('s3://your-s3-bucket/path/**/*.parquet', hive_partitioning = true) LIMIT 10;"
```
#### Athena/Presto/Trino
```
CREATE EXTERNAL TABLE IF NOT EXISTS `osm` (
  `id` BIGINT,
  `tags` MAP<STRING, STRING>,
  `lat` DOUBLE,
  `lon` DOUBLE,
  `nds` ARRAY<STRUCT<ref: BIGINT>>,
  `members` ARRAY<STRUCT<type: STRING, ref: BIGINT, role: STRING>>,
  `changeset` BIGINT,
  `timestamp` TIMESTAMP,
  `uid` BIGINT,
  `user` STRING,
  `version` BIGINT,
  `visible` BOOLEAN
)
PARTITIONED BY (
  `type` STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS PARQUET
LOCATION 's3://your-s3-bucket/path/';
MSCK REPAIR TABLE `osm`;
SELECT * FROM osm LIMIT 10;
```
## Development
1. [Install Rust](https://www.rust-lang.org/tools/install) and [just](https://github.com/casey/just)
2. Clone the repo: `git clone https://github.com/OvertureMaps/osm-pbf-parquet.git`
3. Make changes
4. Run against a PBF file with `cargo run -- --input your.osm.pbf` ([Geofabrik regional PBF extracts are available here](https://download.geofabrik.de/))
5. Run `just --list` to see available dev commands (`just test`, `just clippy`, `just ci-test`, etc.)
## Benchmarks
osm-pbf-parquet prioritizes transcode speed over file size, file count, or preserving element ordering. The table below compares it against similar tools on the 2024-06-24 OSM planet PBF, with a target file size of 500 MB:
| Tool | Time (wall) | Output size | File count |
| - | - | - | - |
| **osm-pbf-parquet** (zstd:3) | 30 minutes | 182GB | ~600 |
| **osm-pbf-parquet** (zstd:9) | 60 minutes | 165GB | ~600 |
| [osm-parquetizer](https://github.com/adrianulbona/osm-parquetizer) | 196 minutes | 285GB | 3 |
| [osm2orc](https://github.com/mojodna/osm2orc) | 385 minutes | 110GB | 1 |
Test system:
```
i5-9400 (6 CPU, 32GB memory)
Ubuntu 24.04
OpenJDK 17
Rust 1.79.0
```
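
To make the trade-off in the table easier to read, the figures can be expressed relative to the default `zstd:3` run. The sketch below only redoes arithmetic on the numbers reported above; the `relative_to` helper is a hypothetical name introduced for illustration.

```python
# Wall-clock minutes and output size in GB, copied from the benchmark table.
RESULTS = {
    "osm-pbf-parquet (zstd:3)": (30, 182),
    "osm-pbf-parquet (zstd:9)": (60, 165),
    "osm-parquetizer": (196, 285),
    "osm2orc": (385, 110),
}


def relative_to(baseline, results):
    """Return (time ratio, size ratio) of every tool against the baseline row."""
    base_minutes, base_gb = results[baseline]
    return {
        name: (round(minutes / base_minutes, 1), round(gb / base_gb, 2))
        for name, (minutes, gb) in results.items()
    }


if __name__ == "__main__":
    ratios = relative_to("osm-pbf-parquet (zstd:3)", RESULTS)
    for name, (time_ratio, size_ratio) in ratios.items():
        print(f"{name}: {time_ratio}x time, {size_ratio}x size")
```

On these numbers, osm2orc takes roughly 12.8 times as long as the `zstd:3` run but produces output about 0.6 times the size, which matches the stated priority of speed over file size.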
## License
Distributed under the MIT License. See `LICENSE` for more information.
## Acknowledgments
* [osmpbf](https://github.com/b-r-u/osmpbf) and [osm2gzip](https://github.com/b-r-u/osm2gzip) for reading PBF data
* [osm2orc](https://github.com/mojodna/osm2orc) for schema and processing ideas