{"id":14982396,"url":"https://github.com/adrianulbona/osm-parquetizer","last_synced_at":"2025-10-29T12:31:35.150Z","repository":{"id":86507827,"uuid":"55342414","full_name":"adrianulbona/osm-parquetizer","owner":"adrianulbona","description":"A converter for the OSM PBFs to Parquet files  ","archived":false,"fork":false,"pushed_at":"2023-09-01T14:22:05.000Z","size":77,"stargazers_count":93,"open_issues_count":5,"forks_count":32,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-02-02T01:32:03.829Z","etag":null,"topics":["apache-spark","converter","openstreetmap","parquet-files","pbf"],"latest_commit_sha":null,"homepage":"http://adrianulbona.github.io/2016/12/18/osm-parquetizer.html","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/adrianulbona.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-04-03T11:05:50.000Z","updated_at":"2025-01-29T12:55:42.000Z","dependencies_parsed_at":null,"dependency_job_id":"575e48cf-766e-4bb5-863b-721da6c19957","html_url":"https://github.com/adrianulbona/osm-parquetizer","commit_stats":{"total_commits":29,"total_committers":5,"mean_commits":5.8,"dds":0.2068965517241379,"last_synced_commit":"baf32566e8af8e2e5f843464c02894b4f345e2c5"},"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adrianulbona%2Fosm-parquetizer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adrianulbona%2Fosm-parquetizer/tags","releases_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/repositories/adrianulbona%2Fosm-parquetizer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adrianulbona%2Fosm-parquetizer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/adrianulbona","download_url":"https://codeload.github.com/adrianulbona/osm-parquetizer/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":238825740,"owners_count":19537118,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-spark","converter","openstreetmap","parquet-files","pbf"],"created_at":"2024-09-24T14:05:20.490Z","updated_at":"2025-10-29T12:31:34.856Z","avatar_url":"https://github.com/adrianulbona.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"## OpenStreetMap Parquetizer\n\n[![Build Status](https://travis-ci.org/adrianulbona/hmm.svg)](https://travis-ci.org/adrianulbona/osm-parquetizer)\n\nThe project intends to provide a way to get the [OpenStreetMap](https://www.openstreetmap.org) data available in a Big Data friendly format as [Parquet](https://parquet.apache.org/).\n\nCurrently any [PBF](http://wiki.openstreetmap.org/wiki/PBF_Format) file is converted into three parquet files, one for each type of entity from the original PBF (Nodes, Ways and Relations).\n\nIn order to get started: \n\n```shell\ngit clone https://github.com/adrianulbona/osm-parquetizer.git\ncd osm-parquetizer\nmvn clean package\njava -jar target/osm-parquetizer-1.0.1-SNAPSHOT.jar path_to_your.pbf\n```\n\nFor example, by running: 
\n\n```shell\njava -jar target/osm-parquetizer-1.0.1-SNAPSHOT.jar romania-latest.osm.pbf\n```\n\nIn a few seconds (on a decent laptop) you should get the following files:\n```shell\n-rw-r--r--  1 adrianbona  adrianbona   145M Apr  3 19:57 romania-latest.osm.pbf\n-rw-r--r--  1 adrianbona  adrianbona   372M Apr  3 19:58 romania-latest.osm.pbf.node.parquet\n-rw-r--r--  1 adrianbona  adrianbona   1.1M Apr  3 19:58 romania-latest.osm.pbf.relation.parquet\n-rw-r--r--  1 adrianbona  adrianbona   123M Apr  3 19:58 romania-latest.osm.pbf.way.parquet\n```\n\nThe parquet files have the following schemas:\n\n```protobuf\nnode\n |-- id: long\n |-- version: integer\n |-- timestamp: long\n |-- changeset: long\n |-- uid: integer\n |-- user_sid: string\n |-- tags: array\n |    |-- element: struct\n |    |    |-- key: string\n |    |    |-- value: string\n |-- latitude: double\n |-- longitude: double\n\nway\n |-- id: long\n |-- version: integer\n |-- timestamp: long\n |-- changeset: long\n |-- uid: integer\n |-- user_sid: string\n |-- tags: array\n |    |-- element: struct\n |    |    |-- key: string\n |    |    |-- value: string\n |-- nodes: array\n |    |-- element: struct\n |    |    |-- index: integer\n |    |    |-- nodeId: long\n\nrelation\n |-- id: long\n |-- version: integer\n |-- timestamp: long\n |-- changeset: long\n |-- uid: integer\n |-- user_sid: string\n |-- tags: array\n |    |-- element: struct\n |    |    |-- key: string\n |    |    |-- value: string\n |-- members: array\n |    |-- element: struct\n |    |    |-- id: long\n |    |    |-- role: string\n |    |    |-- type: string\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadrianulbona%2Fosm-parquetizer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fadrianulbona%2Fosm-parquetizer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadrianulbona%2Fosm-parquetizer/lists"}