Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/adrianulbona/osm-parquetizer
A converter for the OSM PBFs to Parquet files
- Host: GitHub
- URL: https://github.com/adrianulbona/osm-parquetizer
- Owner: adrianulbona
- License: apache-2.0
- Created: 2016-04-03T11:05:50.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2023-09-01T14:22:05.000Z (over 1 year ago)
- Last Synced: 2024-09-29T07:03:40.274Z (3 months ago)
- Topics: apache-spark, converter, openstreetmap, parquet-files, pbf
- Language: Java
- Homepage: http://adrianulbona.github.io/2016/12/18/osm-parquetizer.html
- Size: 75.2 KB
- Stars: 90
- Watchers: 7
- Forks: 32
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## OpenStreetMap Parquetizer
[![Build Status](https://travis-ci.org/adrianulbona/hmm.svg)](https://travis-ci.org/adrianulbona/osm-parquetizer)
This project converts [OpenStreetMap](https://www.openstreetmap.org) data into [Parquet](https://parquet.apache.org/), a format that is friendly to Big Data tooling.
Currently, each [PBF](http://wiki.openstreetmap.org/wiki/PBF_Format) file is converted into three Parquet files, one for each entity type in the original PBF (nodes, ways and relations).
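As a rough illustration of the one-file-per-entity layout, the sketch below derives the three output paths for a given PBF and reports which ones are missing. It is not part of the project: the helper names `expected_outputs` and `missing_outputs` are hypothetical, and the `<input>.<entity>.parquet` pattern, written next to the input file, is an assumption taken from the sample file listing in this README.

```python
from pathlib import Path

# Entity types named in this README. The suffix pattern is an assumption
# based on the README's sample listing, not a documented contract.
ENTITY_TYPES = ("node", "way", "relation")


def expected_outputs(pbf_path):
    """Return the Parquet paths expected next to pbf_path, keyed by entity type."""
    pbf = Path(pbf_path)
    return {
        entity: pbf.with_name(f"{pbf.name}.{entity}.parquet")
        for entity in ENTITY_TYPES
    }


def missing_outputs(pbf_path):
    """List the entity types whose Parquet file does not exist on disk."""
    return [
        entity
        for entity, path in expected_outputs(pbf_path).items()
        if not path.exists()
    ]


if __name__ == "__main__":
    for entity, path in expected_outputs("romania-latest.osm.pbf").items():
        print(f"{entity}: {path}")
```

A check like `missing_outputs(...)` can be handy in a pipeline that should only continue once all three files are present.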
To get started:
```shell
git clone https://github.com/adrianulbona/osm-parquetizer.git
cd osm-parquetizer
mvn clean package
java -jar target/osm-parquetizer-1.0.1-SNAPSHOT.jar path_to_your.pbf
```
For example, by running:
```shell
java -jar target/osm-parquetizer-1.0.1-SNAPSHOT.jar romania-latest.osm.pbf
```
In a few seconds (on a decent laptop) you should get the following files:
```shell
-rw-r--r-- 1 adrianbona adrianbona 145M Apr 3 19:57 romania-latest.osm.pbf
-rw-r--r-- 1 adrianbona adrianbona 372M Apr 3 19:58 romania-latest.osm.pbf.node.parquet
-rw-r--r-- 1 adrianbona adrianbona 1.1M Apr 3 19:58 romania-latest.osm.pbf.relation.parquet
-rw-r--r-- 1 adrianbona adrianbona 123M Apr 3 19:58 romania-latest.osm.pbf.way.parquet
```
The Parquet files have the following schemas:
```text
node
 |-- id: long
 |-- version: integer
 |-- timestamp: long
 |-- changeset: long
 |-- uid: integer
 |-- user_sid: string
 |-- tags: array
 |    |-- element: struct
 |    |    |-- key: string
 |    |    |-- value: string
 |-- latitude: double
 |-- longitude: double

way
 |-- id: long
 |-- version: integer
 |-- timestamp: long
 |-- changeset: long
 |-- uid: integer
 |-- user_sid: string
 |-- tags: array
 |    |-- element: struct
 |    |    |-- key: string
 |    |    |-- value: string
 |-- nodes: array
 |    |-- element: struct
 |    |    |-- index: integer
 |    |    |-- nodeId: long

relation
 |-- id: long
 |-- version: integer
 |-- timestamp: long
 |-- changeset: long
 |-- uid: integer
 |-- user_sid: string
 |-- tags: array
 |    |-- element: struct
 |    |    |-- key: string
 |    |    |-- value: string
 |-- members: array
 |    |-- element: struct
 |    |    |-- id: long
 |    |    |-- role: string
 |    |    |-- type: string
```
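To make the nested fields easier to picture, here is a minimal sketch that works on plain Python dictionaries shaped like rows of the schemas above. It does not read Parquet itself; the rows would come from whichever Parquet reader you use. The helper names `tags_to_dict` and `way_coordinates` are hypothetical, the sample values are invented, and the sketch assumes that `index` gives the position of a node within its way.

```python
def tags_to_dict(tags):
    """Collapse the `tags` array of {key, value} structs into a plain dict."""
    return {tag["key"]: tag["value"] for tag in tags}


def way_coordinates(way, nodes_by_id):
    """Return (latitude, longitude) pairs for a way, ordered by `index`.

    Node ids that are absent from nodes_by_id are skipped.
    """
    # Assumption: `index` is the node's position along the way.
    ordered = sorted(way["nodes"], key=lambda ref: ref["index"])
    coords = []
    for ref in ordered:
        node = nodes_by_id.get(ref["nodeId"])
        if node is not None:
            coords.append((node["latitude"], node["longitude"]))
    return coords


# Hand-made sample rows; the values are invented for illustration only.
nodes = [
    {"id": 1, "tags": [], "latitude": 44.43, "longitude": 26.10},
    {
        "id": 2,
        "tags": [{"key": "highway", "value": "crossing"}],
        "latitude": 44.44,
        "longitude": 26.11,
    },
]
way = {
    "id": 10,
    "tags": [
        {"key": "highway", "value": "residential"},
        {"key": "name", "value": "Example Street"},
    ],
    "nodes": [{"index": 1, "nodeId": 2}, {"index": 0, "nodeId": 1}],
}

# Index nodes by id once so each way lookup is a dictionary access.
nodes_by_id = {node["id"]: node for node in nodes}

print(tags_to_dict(way["tags"]))
print(way_coordinates(way, nodes_by_id))
```

On a full extract the same lookup is usually expressed as a join between the way and node tables rather than an in-memory dictionary, since the node file is by far the largest of the three.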