{"id":15202894,"url":"https://github.com/geotrellis/vectorpipe","last_synced_at":"2025-10-28T23:31:25.175Z","repository":{"id":43611897,"uuid":"71492746","full_name":"geotrellis/vectorpipe","owner":"geotrellis","description":"Convert Vector data to VectorTiles with GeoTrellis.","archived":false,"fork":false,"pushed_at":"2021-12-21T18:51:30.000Z","size":14104,"stargazers_count":75,"open_issues_count":22,"forks_count":20,"subscribers_count":15,"default_branch":"master","last_synced_at":"2025-02-01T19:39:22.514Z","etag":null,"topics":["geotrellis","openstreetmap","vector-tiles"],"latest_commit_sha":null,"homepage":"https://geotrellis.github.io/vectorpipe/","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/geotrellis.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.org","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-10-20T18:35:16.000Z","updated_at":"2025-01-18T12:28:10.000Z","dependencies_parsed_at":"2022-09-10T12:22:35.227Z","dependency_job_id":null,"html_url":"https://github.com/geotrellis/vectorpipe","commit_stats":null,"previous_names":[],"tags_count":16,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/geotrellis%2Fvectorpipe","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/geotrellis%2Fvectorpipe/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/geotrellis%2Fvectorpipe/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/geotrellis%2Fvectorpipe/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/geotrellis","download_url":"https://codeload.github
.com/geotrellis/vectorpipe/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":238738028,"owners_count":19522298,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["geotrellis","openstreetmap","vector-tiles"],"created_at":"2024-09-28T04:07:19.367Z","updated_at":"2025-10-28T23:31:24.650Z","avatar_url":"https://github.com/geotrellis.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"# VectorPipe #\n\n[![CircleCI](https://circleci.com/gh/geotrellis/vectorpipe/tree/master.svg?style=svg)](https://circleci.com/gh/geotrellis/vectorpipe/tree/master)\n[![SonaType Releases](https://img.shields.io/nexus/r/com.azavea.geotrellis/vectorpipe_2.11?label=SonaType%20Nexus\u0026logo=vectorpipe\u0026server=https%3A%2F%2Foss.sonatype.org)](https://oss.sonatype.org/#nexus-search;quick~vectorpipe)\n[![Codacy Badge](https://api.codacy.com/project/badge/Grade/447170921bc94b3fb494bb2b965c2235)](https://www.codacy.com/app/fosskers/vectorpipe?utm_source=github.com\u0026amp;utm_medium=referral\u0026amp;utm_content=geotrellis/vectorpipe\u0026amp;utm_campaign=Badge_Grade)\n\nVectorPipe (VP) is a library for working with OpenStreetMap (OSM) vector\ndata and writing geometries to vector tile layers. It is powered by [GeoTrellis](http://geotrellis.io)\nand [Apache Spark](http://spark.apache.org/).\n\nOSM provides a wealth of data with broad coverage and a deep history.\nThis comes at the price of very large file sizes, which can make accessing the power\nof OSM difficult.  
VectorPipe can help by making OSM processing in Apache\nSpark possible, leveraging large computing clusters to churn through the sheer\nvolume of, say, an OSM full history file.\n\nFor applications that need to process incoming changes, VP\nalso provides streaming Spark `DataSource`s for changesets, OsmChange files,\nand augmented diffs generated by Overpass.\n\nFor ease of use, the output of VP imports is a Spark DataFrame containing\ncolumns of JTS `Geometry` objects, enabled by the user-defined types provided\nby [GeoMesa](https://github.com/locationtech/geomesa).  That package also\nprovides Spark SQL functions for manipulating those geometries.\n\nThe final important contribution is a set of functions for exporting\ngeometries to vector tiles.  These rely on the `geotrellis-vectortile`\npackage.\n\n## Getting Started ##\n\nAdd the following to your `build.sbt`:\n```scala\nlibraryDependencies += \"com.azavea.geotrellis\" %% \"vectorpipe\" % \"2.2.0\"\n```\n\n**Note:** VectorPipe releases for version 2.0.0+ are hosted on Sonatype. Earlier releases can be found on [Bintray](https://bintray.com/azavea/maven/vectorpipe). If you use sbt with an older release, you will also need to add `resolvers += Resolver.bintrayRepo(\"azavea\", \"maven\")` to your `build.sbt`.\n\n### With a REPL ###\n\nThe fastest way to get started with VectorPipe in a REPL is to invoke `spark-shell`:\n```bash\nspark-shell --packages com.azavea.geotrellis:vectorpipe_2.11:2.2.0\n```\n\nThis will download the required components and set up a REPL with VectorPipe\navailable.  At that point, you may run\n```scala\n// Make JTS types available to Spark\nimport org.locationtech.geomesa.spark.jts._\nspark.withJTS\n\nimport vectorpipe._\n```\nand begin using the package.\n\n#### A Note on Cluster Computing ####\n\nYour local machine is probably insufficient for handling very large OSM\nfiles.  
We recommend using Amazon's Elastic MapReduce (EMR) service to\nprovision substantial clusters of computing resources.  You'll want to install\nSpark, Hive, and Hadoop on your cluster, with Spark version 2.3.  An EMR\nrelease between 5.13 and 5.19 should suffice.  From there,\n`ssh` into the master node and run `spark-shell` as above for an interactive\nenvironment, or use `spark-submit` for batch jobs.  (You may also submit EMR\nSteps that run `spark-submit`.)\n\n### Importing Data ###\n\nBatch analysis can be performed in a few different ways.  Perhaps the fastest\nway is to procure an OSM PBF file from a source such as\n[Geofabrik](https://download.geofabrik.de/index.html), which supplies various\nextracts of OSM, including the full planet's worth of data.\n\nVectorPipe cannot read these OSM PBF files directly, however, so they must\nfirst be converted to a format Spark understands.  We\nsuggest using [`osm2orc`](https://github.com/mojodna/osm2orc) to convert your\nsource file to ORC, which Spark can read natively:\n```scala\n// Placeholder path: point this at the ORC file produced by osm2orc\nval df = spark.read.orc(\"/path/to/extract.orc\")\n```\nThe resulting `DataFrame` can be processed with VectorPipe.\n\nIt is also possible to read directly from a cache of\n[OsmChange](https://wiki.openstreetmap.org/wiki/OsmChange) files\nrather than converting a PBF file:\n```scala\nimport vectorpipe.sources.Source\nval df = spark.read\n              .format(Source.Changes)\n              .options(Map[String, String](\n                Source.BaseURI -\u003e \"https://download.geofabrik.de/europe/isle-of-man-updates/\",\n                Source.StartSequence -\u003e \"2080\",\n                Source.EndSequence -\u003e \"2174\",\n                Source.BatchSize -\u003e \"1\"))\n              .load\n              .persist // recommended to avoid rereading\n```\n(Note that the start and end sequence numbers will shift over time for Geofabrik.\nPlease navigate to the base URI to determine these 
values; otherwise, timeouts\nmay occur.)  The read may report errors, but it should complete.  This approach is much slower\nand more fragile than using ORC files, but it remains an option.\n\nIt is also possible to build a DataFrame from a stream of changesets in a\nsimilar manner.  Changesets carry additional metadata regarding the\nauthor of the changes, but none of the geometric information.  These tables\ncan be joined on `changeset`.\n\nIn either case, a useful first step is to convert the incoming DataFrame\ninto a more usable format.  We recommend calling\n```scala\nimport vectorpipe._ // brings the OSM object into scope\nval geoms = OSM.toGeometry(df)\n```\nwhich produces a DataFrame consisting of \"top-level\" entities, which is to say\nnodes that don't participate in a way, ways that don't participate in\nrelations, and a subset of the relations from the OSM data.  The resulting\nDataFrame represents these entities with JTS geometries in the `geom`\ncolumn.\n\nThe `toGeometry` function keeps elements that fit one of the following\ndescriptions:\n- points from tagged nodes (counting tags that really ought to be dropped, e.g. `source=*`);\n- polygons derived from ways with tags that cause them to be considered areas;\n- lines from ways lacking area tags;\n- multipolygons from multipolygon or boundary relations; and\n- multilinestrings from route relations.\n\nIt is also possible to filter the results based on information in the tags.\nFor instance, all buildings can be selected with\n```scala\nimport vectorpipe.functions.osm._\n// 'tags becomes a Column via spark.implicits._ (imported automatically in spark-shell)\nval buildings = geoms.filter(isBuilding('tags))\n```\n\nAgain, the JTS user-defined types make it easier to manipulate geometries and\ncompute with them.  
See the\n[GeoMesa Spark SQL documentation](https://www.geomesa.org/documentation/user/spark/sparksql_functions.html)\nfor a list of functions that operate on geometries.\n\n#### A Note on Geocoding ####\n\nVectorPipe provides the means to tag geometries with the country codes of the\ncountries they intersect, but it does not provide the boundaries used to\ndo the coding.  This lets the user select boundary geometries\nappropriate to the task at hand: low-resolution geometries for less fussy\napplications, high-resolution ones when precision is important.\n\nIn order for an application to make use of `vectorpipe.util.Geocode`, it must\nsupply a `countries.geojson` file in the root of its project's `resources`\ndirectory.  That GeoJSON file must contain a `FeatureCollection` in which each\nfeature has an `ADM0_A3` entry in its `properties`.\n\nOne may employ the [Natural Earth Admin\n0](https://www.naturalearthdata.com/downloads/10m-cultural-vectors/10m-admin-0-boundary-lines/)\nresource for low-precision tasks, or use something like the [Global LSIB\nPolygons](http://geonode.state.gov/layers/geonode%3AGlobal_LSIB_Polygons_Detailed)\nfor more precise tasks (though the latter resource does not tag its elements\nwith the `ADM0_A3` three-letter codes, so some preprocessing would be required).\n\n## The `internal` package ##\n\nWhile most users will rely solely on the features exposed by the `OSM` object,\nfiner-grained control over the output of the process (for example, if one does not need\nrelations) is available through the `vectorpipe.internal`\npackage.\n\nThere is a significant caveat here: imported OSM DataFrames may follow\neither of two schemas.  The difference\nis in the type of a sub-field of the `members` list.  
This can cause errors of\nthe form\n```\njava.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Byte\n```\nwhen using the `internal` package methods.\n\nThese type problems can be fixed by calling\n`vectorpipe.functions.osm.ensureCompressedMembers` on the input OSM DataFrame\nbefore passing it to any relation-generating function, such as\n`reconstructRelationGeometries`.  Top-level functions in the `OSM` object\nhandle this conversion for you.  Note that this only affects the DataFrames\ncarrying the initially imported OSM data.\n\n## Local Development ##\n\nIf you intend to contribute to VectorPipe, you may need to work with a\ndevelopment version.  In that case, instead of loading a published release,\nyou will need to build a fat jar using\n```bash\n./sbt assembly\n```\nand then start the REPL with\n```bash\nspark-shell --jars target/scala-2.11/vectorpipe.jar\n```\n\n### IntelliJ IDEA ###\n\nWhen developing with IntelliJ IDEA, the sbt plugin sees the Spark dependencies\nas `provided`, which prevents them from being indexed properly and results in\nerrors and warnings within the IDE. To fix this, create `idea.sbt` at the root of\nthe project:\n\n```scala\nimport Dependencies._\n\n// Helper project that depends on the root and pulls Spark SQL in at Compile scope so IntelliJ indexes it\nlazy val mainRunner = project.in(file(\"mainRunner\")).dependsOn(RootProject(file(\".\"))).settings(\n  libraryDependencies ++= Seq(\n    sparkSql % Compile\n  )\n)\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgeotrellis%2Fvectorpipe","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgeotrellis%2Fvectorpipe","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgeotrellis%2Fvectorpipe/lists"}