{"id":15058999,"url":"https://github.com/paulmach/osmzen","last_synced_at":"2025-04-10T05:11:51.894Z","repository":{"id":26449314,"uuid":"108784610","full_name":"paulmach/osmzen","owner":"paulmach","description":"OSM data into a kind/kind_detail normalization using tilezen configs","archived":false,"fork":false,"pushed_at":"2022-05-16T17:51:02.000Z","size":1170,"stargazers_count":20,"open_issues_count":0,"forks_count":2,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-10T05:11:44.477Z","etag":null,"topics":["golang","mapzen","openstreetmap","osm","tilezen"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/paulmach.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-10-30T00:53:05.000Z","updated_at":"2023-06-03T00:01:26.000Z","dependencies_parsed_at":"2022-08-09T09:40:27.950Z","dependency_job_id":null,"html_url":"https://github.com/paulmach/osmzen","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paulmach%2Fosmzen","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paulmach%2Fosmzen/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paulmach%2Fosmzen/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/paulmach%2Fosmzen/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/paulmach","download_url":"https://codeload.github.com/paulmach/osmzen/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.co
m","kind":"github","repositories_count":248161276,"owners_count":21057555,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["golang","mapzen","openstreetmap","osm","tilezen"],"created_at":"2024-09-24T22:35:13.823Z","updated_at":"2025-04-10T05:11:51.864Z","avatar_url":"https://github.com/paulmach.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# osmzen [![CI](https://github.com/paulmach/osmzen/workflows/CI/badge.svg)](https://github.com/paulmach/osmzen/actions?query=workflow%3ACI+event%3Apush) [![Go Report Card](https://goreportcard.com/badge/github.com/paulmach/osmzen)](https://goreportcard.com/report/github.com/paulmach/osmzen) [![Go Reference](https://pkg.go.dev/badge/github.com/paulmach/osmzen.svg)](https://pkg.go.dev/github.com/paulmach/osmzen)\n\nThis is a port of [tilezen/vector-datasource](https://github.com/tilezen/vector-datasource) developed by\n[Mapzen](https://mapzen.com/). It converts [Open Street Map](https://www.openstreetmap.org/) data\ndirectly into GeoJSON with properties that are understood by [Mapzen house\nstyles](https://mapzen.com/products/maps/). See the [tile server example](example) for a demo.\n\nA Postgres database is not required to evaluate the logic that is originally defined in a combination\nof SQL and Python. This allows for the quick mapping of any OSM element(s) to a `kind`/`kind_detail`\nnormalization. 
Such a normalization is non-trivial given the \"diversity\" of OSM tagging so projects\nlike tilezen/vector-datasource (and many others) are necessary.\n\nThe port currently implements almost all features applicable to evaluating zoom 14+ tile data.\nThese features include:\n\n-   all filter, min_zoom and output logic defined in the `yaml/*.yaml` files,\n-   all transforms that apply (implementation-specific data transforms are skipped),\n-   the CSV matcher post processor to set the `scale_rank` and `sort_rank` properties,\n-   geometry clipping and label placement logic.\n\nA lot of post processors still need to be ported, but only a few of the missing ones apply\nto zooms 14+. Missing post processors include: landuse_kind intercuts, merging line strings,\nmerging buildings with building parts and any admin area matching used to get accurate country\ncodes for highways and other objects.\n\nIt would also be nice to port some of the integration tests as they would give confidence that\nthings are really working as expected. Right now there are just some unit tests and some\nhigh level sanity checks.\n\n#### Changes from the original tilezen/vector-datasource\n\nThe goal is for there to be no functional differences for zooms 14+. The YAML definition files are\nunchanged; there are just a few minor changes to the post processor filtering in `queries.yaml`.\n\nThe port is based on the [v1.8.0ish](https://github.com/tilezen/vector-datasource/releases/tag/v1.8.0)\nversion of the vector-datasource.\n\n## Usage\n\n1.  Load and compile the `queries.yaml`, `yaml/*.yaml` and `spreadsheets/*_rank/*.csv` files. 
This can\n    be done by loading the files directly using the implied directory structure:\n\n        config, err := osmzen.Load(\"config/queries.yaml\")\n\n    or if you want to use the \"official\" ported config files that are embedded into the binary\n    using the `embed` package from the standard library:\n\n        config, err := osmzen.LoadDefaultConfig()\n\n    If there are mistakes in the YAML the error will contain a lot of information to help debug:\n\n        if err, ok := errors.Cause(err).(*filter.CompileError); ok {\n        \tlog.Printf(\"error: %v\", err.Error())\n        \tlog.Printf(\"cause: %v\", err.Cause)\n        \tlog.Printf(\"yaml:\\n%s\", err.YAML()) // chunk of marshalled YAML with the issue\n        } else if err != nil {\n        \tlog.Printf(\"other err: %v\", err)\n        }\n\n2.  Process some OSM data:\n\n        data := osm.OSM{}\n        layers, err := config.Process(\n        \tdata,\n        \torb.Bound{Min: orb.Point{-180, -90}, Max: orb.Point{180, 90}},\n        \tzoom,\n        )\n\n        // layers is defined as `map[string]*geojson.FeatureCollection`\n\n    Layers can also be processed individually:\n\n        featureCollection, err := config.Layers[\"buildings\"].Process(\n        \tdata,\n        \torb.Bound{Min: orb.Point{-180, -90}, Max: orb.Point{180, 90}},\n        \tzoom,\n        )\n\n    The bound is necessary for clipping. Typically, set to the bound of the requested tile.\n\nThe result is a GeoJSON feature collection with `kind`, `kind_detail` etc. 
properties that\nare understood by [Mapzen house styles](https://mapzen.com/products/maps/).\n\n## Example\n\nA more complete example that loads a zoom 16 area from the OSM API and\nthen processes the tile (minus error checking):\n\n```go\npackage main\n\nimport (\n\t\"context\"\n\t\"encoding/json\"\n\t\"fmt\"\n\n\t\"github.com/paulmach/osmzen\"\n\n\t\"github.com/paulmach/orb/maptile\"\n\t\"github.com/paulmach/osm\"\n\t\"github.com/paulmach/osm/osmapi\"\n)\n\nfunc main() {\n\ttile := maptile.New(19613, 29310, 16)\n\n\t// load osmzen config\n\tconfig, _ := osmzen.LoadDefaultConfig()\n\n\t// get osm data for a tile from the official api.\n\tbounds, _ := osm.NewBoundsFromTile(tile)\n\tdata, _ := osmapi.Map(context.Background(), bounds)\n\n\t// process the data\n\t// The tile coords will be used to exclude interesting nodes\n\t// and labels outside the tile.\n\tlayers, _ := config.Process(data, tile.Bound(), tile.Z)\n\n\t// pretty print the json\n\tpretty, _ := json.MarshalIndent(layers, \"\", \" \")\n\tfmt.Println(string(pretty))\n}\n```\n\n## Implementation details\n\nAt a high level [tilezen/vector-datasource](https://github.com/tilezen/vector-datasource) filters and\nprocesses its data using the following steps:\n\n1. find relevant elements for a layer using the SQL queries defined in `data/{layer_name}.jinja`,\n2. filter the elements using filter _conditions_ defined in `yaml/{layer_name}.yaml`,\n3. generate properties for each element using the matching filter's output _expressions_,\n4. apply _transforms_ to each element independently,\n5. 
apply _post processes_ to all the layers together.\n\nThe transforms and post processes that apply to each layer and zoom are defined in `queries.yaml`.\nFor a lot more details see the official tilezen/vector-datasource [project\noverview](https://github.com/tilezen/vector-datasource/blob/master/CONTRIBUTING.md).\n\nAs this package is a port of that code it follows the same steps, except for step 1 since the data\nis passed in directly.\n\n### Loading and compiling config\n\nDuring the loading of the YAML+CSV config files everything is compiled to make sure all the\nexpressions and function references are known. If there is a typo, or something new/unsupported, an\nerror will be returned. See above for how to get useful information from the error. The initial\ncompile step allows for the checking of config errors at startup. Also since the types are converted\nup front there is a nice performance boost of about 10x.\n\nThe filters and outputs defined in the `yaml/*.yaml` files are basically a set of statements that\nact like: \"if the element tags look like this, output these kind, kind_detail, etc. properties\".\n\nThe filters define a condition, yes/no matching, that evaluates to a boolean value. During the compile\nstep these are converted into concrete types that implement the `filter.Condition` interface. The\ninterface is defined as:\n\n    type filter.Condition interface {\n    \tEval(*filter.Context) bool\n    }\n\nThe output for each filter defines what properties should be assigned to the element's GeoJSON\nfeature. They output things such as booleans (is_tunnel), strings (kind), numbers (area) or nil to\nbe ignored. The interfaces are defined as:\n\n    type filter.Expression interface {\n    \tEval(*filter.Context) interface{}\n    }\n\n    type filter.NumExpression interface {\n    \tfilter.Expression\n    \tEvalNum(*filter.Context) float64\n    }\n\nThe `filter.NumExpression` is also implemented by expressions that must be a number (e.g. 
area,\nbuilding height). Using it helps avoid a type indirection when we know we need numbers, for example in\nthe `min` and `max` expressions.\n\nThe `filter.Context` is passed in at runtime and contains info about the element being evaluated\nlike the OSM tags and geometry. It also caches \"expensive\" things like the area and volume that can\nbe used by multiple filters.\n\n#### Transforms and post processes\n\nAfter elements for a layer are matched and GeoJSON features are created, a set of transforms is\napplied. The transforms edit the element properties based on some logic, sometimes requiring the\nset of relations the original OSM element is a member of.\n\nWhile loading the config the **transforms** are matched to functions of the form:\n\n    func(*filter.Context, *geojson.Feature)\n\nTransforms can only change a feature; they can't remove a feature if it's \"bad\" for any reason, like\ntoo small for the zoom. Transforms also don't know about other features, so they can't be used to\nremove duplicates or merge features, like parts of the same road. However, transforms can be used to\ndo things like fix one-way direction, abbreviate road names, etc.\n\nThe **post processes** are compiled to check the parameters and data files. They are mapped to an\nobject implementing the `postprocess.Function` interface defined as:\n\n    type postprocess.Function interface {\n    \tEval(*postprocess.Context, map[string]*geojson.FeatureCollection)\n    }\n\nThe function takes all the layers as input. Some examples of post processing are clipping to the\ntile bounds, setting sort_rank and scale_rank, removing duplicate features, removing small areas,\nmerging lines, etc.\n\n### Evaluating some data\n\nOnce everything is set up we can start evaluating data against the filters and applying the\ntransforms and post processes. The input is OSM data, a bound, plus a zoom. The bound is used to\nclip geometry and check if a label should be included. 
The zoom is used to filter out\nthings that are \"too small\" as defined by the `min_zoom` output in the `yaml/*.yaml` files. To\ninclude everything, use a high zoom, such as 20.\n\nThe evaluation proceeds in the following steps:\n\n1. Convert OSM data to GeoJSON\n\n    The data is run through [osm/osmgeojson](https://github.com/paulmach/osm/tree/master/osmgeojson)\n    which is a port of the [osmtogeojson](https://github.com/tyrasd/osmtogeojson) node.js library.\n    This groups nodes into ways and ways into polygons. For example, we don't care about the 4 nodes\n    that define a building, we just want the building polygon.\n\n2. Run each OSM element GeoJSON feature through the filters\n\n    We find the first filter in each layer to match and then compute the filter's outputs. Note,\n    that an element can match in multiple layers, for example a building polygon and a POI.\n    The input and output are both GeoJSON, however, the input contains properties based on OSM tags,\n    but the output has properties from the filter like the `kind` and `kind_detail` etc.\n\n3. Apply the transforms\n\n    The new GeoJSON object is updated a bit. This can include reversing the geometry or simplifying\n    the name.\n\n4. Apply the post processes to all the layers.\n\nThe end result is a layer, or set of layers that match those produced by `tilezen`.\nNote that this whole process can be applied to a single element.\n\n### Benchmarks\n\nThe first two benchmarks evaluate a single element against ALL the filters and outputs\nin that layer. 
Normally you can stop after the first match and only evaluate that one output.\nThe third benchmark is more typical of normal usage and converts data from a zoom 16 tile.\nThe last benchmark leaves out the OSM data to GeoJSON step and just does the filtering\nand processing unique to this package.\n\n```\nBenchmarkBuildings-4      200000       9969 ns/op       1040 B/op       42 allocs/op\nBenchmarkPOIs-4            10000     171457 ns/op       6816 B/op      450 allocs/op\nBenchmarkFullTile-4          100   11292314 ns/op    3611916 B/op    26555 allocs/op\nBenchmarkProcessGeoJSON-4    200    8091129 ns/op    1978560 B/op    18319 allocs/op\n```\n\nNew benchmarks using v1.5.1 and Go version 1.11.2:\n\n```\nBenchmarkBuildings-4      300000       5525 ns/op        536 B/op       42 allocs/op\nBenchmarkPOIs-4            20000      80353 ns/op       8264 B/op      546 allocs/op\nBenchmarkFullTile-4          200    8736833 ns/op    2639975 B/op    22412 allocs/op\nBenchmarkProcessGeoJSON-4    200    6367984 ns/op    1198285 B/op    12874 allocs/op\n```\n\nThese benchmarks were run on a 2017 MacBook Pro with a 3.1 GHz processor and 8 GB of RAM.\nNo concurrency is used in this package.\n\n#### This library makes use of the following packages:\n\n-   [github.com/pkg/errors](https://github.com/pkg/errors) - for rich errors with stack traces\n-   [gopkg.in/yaml.v2](http://gopkg.in/yaml.v2) - YAML parsing\n-   [github.com/paulmach/orb](https://github.com/paulmach/orb) - geometry area, centroid, clipping, etc.\n-   [github.com/paulmach/osm](https://github.com/paulmach/osm)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpaulmach%2Fosmzen","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpaulmach%2Fosmzen","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpaulmach%2Fosmzen/lists"}