https://github.com/OvertureMaps/osm-pbf-parquet

Transcode OSM PBF file to parquet files with hive-style partitioning by type

Host: GitHub
URL: https://github.com/OvertureMaps/osm-pbf-parquet
Owner: OvertureMaps
License: mit
Created: 2024-12-06T15:45:50.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2026-06-11T13:00:13.000Z (about 1 month ago)
Last Synced: 2026-06-11T15:01:02.823Z (about 1 month ago)
Topics: osm-pbf, overture, overture-maps, parquet, pbf, pbf-to-parquet
Language: Rust
Homepage:
Size: 1.73 MB
Stars: 30
Watchers: 4
Forks: 2
Open Issues: 2
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
- Code of conduct: CODE-OF-CONDUCT.md
- Codeowners: CODEOWNERS

Awesome Lists containing this project

Awesome-Geospatial - osm-pbf-parquet - Transcode OSM PBF file to parquet files. (Rust)
awesome-gers - osm-pbf-parquet - to-GERS conflation pipeline. (Tools and Libraries / Official)

README

# osm-pbf-parquet

Transcode OSM PBF file to parquet files with hive-style partitioning by type

## Getting started

### Download

Download the latest version from the [releases](https://github.com/OvertureMaps/osm-pbf-parquet/releases) page.

### Usage

Example for an x86_64 Linux system using the pre-compiled binary:

```
curl -L "https://github.com/OvertureMaps/osm-pbf-parquet/releases/latest/download/osm-pbf-parquet-x86_64-unknown-linux-gnu.tar.gz" -o "osm-pbf-parquet.tar.gz"
tar -xzf osm-pbf-parquet.tar.gz
chmod +x osm-pbf-parquet
./osm-pbf-parquet --input your.osm.pbf --output ./parquet
```

Or compile and run locally:

```
git clone https://github.com/OvertureMaps/osm-pbf-parquet.git
cd osm-pbf-parquet
cargo run --release -- --input your.osm.pbf --output ./parquet
```

### Supported input/output

- Local filesystem

- AWS S3 (auth read from environment, see [object_store docs](https://docs.rs/object_store/latest/object_store/aws/struct.AmazonS3Builder.html))

### Output structure

```
planet.osm.pbf
parquet/
  type=node/
    node_0000.zstd.parquet
    ...
  type=relation/
    relation_0000.zstd.parquet
    ...
  type=way/
    way_0000.zstd.parquet
    ...
```

[Reference Arrow/SQL schema](https://github.com/OvertureMaps/osm-pbf-parquet/blob/main/src/osm_arrow.rs)
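
Because the element type is encoded in the directory name rather than in the file contents, downstream code that walks the output usually needs to recover it from the path. The following is a minimal sketch of that step using only the Rust standard library. The helper name `partition_type` and the sample paths are illustrative assumptions, not part of osm-pbf-parquet itself.

```rust
use std::path::Path;

/// Return the value of the first hive-style `type=<value>` path component, if any.
/// Hypothetical helper for illustration; not an osm-pbf-parquet API.
fn partition_type(path: &Path) -> Option<String> {
    path.components()
        // Skip components that are not valid UTF-8.
        .filter_map(|component| component.as_os_str().to_str())
        // Keep the text after `type=` from the first matching component.
        .find_map(|part| part.strip_prefix("type=").map(str::to_owned))
}

fn main() {
    // Sample paths that follow the layout shown above (assumed, not read from disk).
    let files = [
        "parquet/type=node/node_0000.zstd.parquet",
        "parquet/type=relation/relation_0000.zstd.parquet",
        "parquet/type=way/way_0000.zstd.parquet",
    ];
    for file in files {
        match partition_type(Path::new(file)) {
            Some(kind) => println!("{file} -> {kind}"),
            None => println!("{file} -> (no type partition)"),
        }
    }
}
```

Query engines that understand hive-style partitioning perform the same mapping themselves and expose the value as a `type` column.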

### Querying

#### DuckDB

```
duckdb -c "SELECT * FROM read_parquet('s3://your-s3-bucket/path/') LIMIT 10;"
```

#### Athena/Presto/Trino

```
CREATE EXTERNAL TABLE IF NOT EXISTS `osm` (
    `id` BIGINT,
    `tags` MAP<STRING, STRING>,
    `lat` DOUBLE,
    `lon` DOUBLE,
    `nds` ARRAY<STRUCT<ref: BIGINT>>,
    `members` ARRAY<STRUCT<type: STRING, ref: BIGINT, role: STRING>>,
    `changeset` BIGINT,
    `timestamp` TIMESTAMP,
    `uid` BIGINT,
    `user` STRING,
    `version` BIGINT,
    `visible` BOOLEAN
)
PARTITIONED BY (
    `type` STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS PARQUET
LOCATION 's3://your-s3-bucket/path/';

MSCK REPAIR TABLE `osm`;

SELECT * FROM osm LIMIT 10;
```

## Development

1. [Install rust](https://www.rust-lang.org/tools/install) and [just](https://github.com/casey/just)

2. Clone repo `git clone https://github.com/OvertureMaps/osm-pbf-parquet.git`

3. Make changes

4. Run against PBF with `cargo run -- --input your.osm.pbf` ([Geofabrik regional PBF extracts here](https://download.geofabrik.de/))

5. Run `just --list` to see available dev commands (`just test`, `just clippy`, `just ci-test`, etc.)

## Benchmarks

osm-pbf-parquet prioritizes transcode speed over file size, file count, or preserving ordering. Here is a comparison against similar tools on the 2024-06-24 OSM planet PBF, with a target file size of 500MB:

| Tool | Time (wall) | Output size | File count |
| - | - | - | - |
| **osm-pbf-parquet** (zstd:3) | 30 minutes | 182GB | ~600 |
| **osm-pbf-parquet** (zstd:9) | 60 minutes | 165GB | ~600 |
| [osm-parquetizer](https://github.com/adrianulbona/osm-parquetizer) | 196 minutes | 285GB | 3 |
| [osm2orc](https://github.com/mojodna/osm2orc) | 385 minutes | 110GB | 1 |

Test system:

```
i5-9400 (6 CPU, 32GB memory)
Ubuntu 24.04
OpenJDK 17
Rust 1.79.0
```
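
To make the trade-off in the table easier to read, the wall times and output sizes can be expressed relative to the zstd:3 run. For example, osm-parquetizer's 196 minutes is roughly 6.5 times the 30 minutes of osm-pbf-parquet at zstd:3. The sketch below does that arithmetic with the figures copied from the table. The function name `relative_to` is an illustrative assumption, and the ratios only describe this one test system.

```rust
/// Ratio of `value` to `baseline`, e.g. 2.0 means "twice the baseline".
/// Hypothetical helper for illustration only.
fn relative_to(baseline: f64, value: f64) -> f64 {
    value / baseline
}

fn main() {
    // (tool, wall time in minutes, output size in GB), copied from the benchmark table.
    let rows = [
        ("osm-pbf-parquet (zstd:3)", 30.0, 182.0),
        ("osm-pbf-parquet (zstd:9)", 60.0, 165.0),
        ("osm-parquetizer", 196.0, 285.0),
        ("osm2orc", 385.0, 110.0),
    ];

    // Use the first row (zstd:3) as the baseline for both columns.
    let (_, base_minutes, base_gb) = rows[0];

    for (name, minutes, gb) in rows {
        println!(
            "{name}: {:.1}x wall time, {:.2}x output size",
            relative_to(base_minutes, minutes),
            relative_to(base_gb, gb),
        );
    }
}
```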

## License

Distributed under the MIT License. See `LICENSE` for more information.

## Acknowledgments

* [osmpbf](https://github.com/b-r-u/osmpbf) and [osm2gzip](https://github.com/b-r-u/osm2gzip) for reading PBF data

* [osm2orc](https://github.com/mojodna/osm2orc) for schema and processing ideas
