{"id":16162480,"url":"https://github.com/sunsided/osm-berlin","last_synced_at":"2026-05-02T02:40:03.748Z","repository":{"id":141992950,"uuid":"98815238","full_name":"sunsided/osm-berlin","owner":"sunsided","description":"Data wrangling on OpenStreetMap data of Berlin, Germany","archived":false,"fork":false,"pushed_at":"2018-07-01T16:39:26.000Z","size":699,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-07T03:48:09.761Z","etag":null,"topics":["berlin","data-analyst-nanodegree","data-wrangling","git-lfs","openstreetmap","udacity-nanodegree"],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sunsided.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-07-30T17:56:42.000Z","updated_at":"2022-10-02T20:28:39.000Z","dependencies_parsed_at":null,"dependency_job_id":"08812a86-118e-472c-b239-19d47591e386","html_url":"https://github.com/sunsided/osm-berlin","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/sunsided/osm-berlin","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sunsided%2Fosm-berlin","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sunsided%2Fosm-berlin/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sunsided%2Fosm-berlin/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sunsided%2Fosm-berlin/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sunsided","download_url":"https://codeload.github.com/sunsided/osm-berlin/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sunsided%2Fosm-berlin/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32521113,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-02T01:12:54.858Z","status":"online","status_checked_at":"2026-05-02T02:00:05.923Z","response_time":132,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["berlin","data-analyst-nanodegree","data-wrangling","git-lfs","openstreetmap","udacity-nanodegree"],"created_at":"2024-10-10T02:30:17.007Z","updated_at":"2026-05-02T02:40:03.701Z","avatar_url":"https://github.com/sunsided.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# OpenStreetMaps Data Wrangling\n\n**GIT LFS:** This repo tracks files using [GIT LFS](https://git-lfs.github.com/). Install GIT LFS first, then clone regularly.\n\nThis project deals with wrangling of OpenStreetMap data of Berlin, Germany.\nTwo regions were used:\n\n* A custom crop of the Berlin city area.\n    * Region: `52.3319824..52.6797125 N`, `13.0709838..13.7741088 E` ([OSM](http://www.openstreetmap.org/#map=11/52.5062/13.4222)).\n    * `101 MB` compressed, `1.4 GB` decompressed XML.\n* A smaller sample of the Mitte district of Berlin.\n    * Region: `52.52912..52.53794 N`, `13.39977..13.40550 E` ([OSM](http://www.openstreetmap.org/#map=17/52.53110/13.40201)).\n    * `148 KB` compressed, `1.9 MB` decompressed XML.\n\nThese are of particular interest to me, because Berlin is my hometown and Berlin Mitte\nis the district in which I grew up and went to school.\n\nIf needed, both regions can be downloaded again e.g. by querying the XAPI Compatibility Layer\nof the OSM Overpass API (see [here](https://wiki.openstreetmap.org/wiki/Overpass_API/XAPI_Compatibility_Layer)):\n\n```bash\nwget -O berlin.osm 'http://www.overpass-api.de/api/xapi?*[bbox=13.0709838,52.3319824,13.7741088,52.6797125][@meta][@timeout=3600]'\nwget -O berlin-mitte.osm 'http://www.overpass-api.de/api/xapi?*[bbox=13.39977,52.52912,13.40550,52.53794][@meta]'\n```\n\n## OSM XML tag survey\n\nBy running the tag extraction script `find_tags.py` on the full Berlin city region the\nfollowing [OSM XML](http://wiki.openstreetmap.org/wiki/OSM_XML) tag paths (and the number of their occurrences) were determined:\n\n```\n         1 osm\n    935054 osm.way\n         1 osm.note\n         1 osm.meta\n   6007591 osm.node\n   7429706 osm.way.nd\n   2738476 osm.way.tag\n   3215139 osm.node.tag\n     14715 osm.relation\n     78089 osm.relation.tag\n    325599 osm.relation.member\n```\n\nwhere `osm ` is the root element.\nThe smaller Berlin Mitte region, in contrast, contains the following counts:\n\n```\n         1 osm\n       864 osm.way\n         1 osm.note\n         1 osm.meta\n      7580 osm.node\n      8788 osm.way.nd\n      2854 osm.way.tag\n      5518 osm.node.tag\n        40 osm.relation\n       332 osm.relation.tag\n      2539 osm.relation.member\n```\n\nA description of the base elements `node`, `way`, `relation` and `tag`\ncan be found [here](http://wiki.openstreetmap.org/wiki/Elements).\n\nThe `find_tag_keys.py` script counts all `tag` keys. The result looks e.g. like\nthis: \n\n```\n         2 abandoned:place\n        46 access\n      1004 addr:city\n       966 addr:country\n         2 addr:flats\n         2 addr:housename\n      1030 addr:housenumber\n         4 addr:inclusion\n      1010 addr:postcode\n      1027 addr:street\n       970 addr:suburb\n         2 advertising\n        12 alt_name\n...\n         2 diet:vegan\n         4 diet:vegetarian\n         2 direction\n         2 dispensing\n         2 disused:shop\n         2 drink:wine\n         4 drinking_water\n...\n         2 toilets\n        28 toilets:wheelchair\n...\n       317 wheelchair\n         8 wheelchair:description\n        23 wikidata\n        19 wikipedia\n         2 workrules\n```\n\n## Auditing example: Street names\n\nStreet names can be collected into a file `street_names.txt` using\n\n```bash\npython collect_street_names.py --out street_names.txt\n```\n\nThis creates a file like\n\n```text\nAnklamer Straße\nArkonaplatz\nChoriner Straße\nFehrbelliner Straße\nFürstenberger Straße\nGranseer Straße\nGriebenowstraße\nRheinsberger Straße\nRuppiner Straße\nSwinemünder Straße\nTorstraße\nVeteranenstraße\nWeinbergsweg\nWolliner Straße\nZehdenicker Straße\nZionskirchplatz\nZionskirchstraße\n```\n\nTo check name auditing, call\n\n```bash\npython test_street_names.py street_names.txt\n```\n\nThis runs a sequence of validation and correction steps and should print out a report like the following\n(depending on the set of street names):\n\n```\n  Skipped \"Allee der Kosmonauten/ Märkische Allee\": Not a street.\nCorrected \"Bergstrasse\" to \"Bergstraße\".\nCorrected \"Blankenfelder Str.\" to \"Blankenfelder Straße\".\n  Skipped \"Eichner Grenzweg/Ahrensfelder Chaussee\": Not a street.\nCorrected \"Ernst Zinna Weg\" to \"Ernst-Zinna-Weg\".\nCorrected \"Stadtrandstaße\" to \"Stadtrandstraße\".\nCorrected \"Strandpromedade\" to \"Strandpromenade\".\n  Skipped \"U-Bahnhof Alt-Tempelhof\": Not a street.\nCorrected \"Waterloo Ufer\" to \"Waterloo-Ufer\".\n```\n\n## XML Processing\n\nThis project uses `lxml.etree` rather than `xml.etree.cElementTree`\ndue to its additional schema validation capabilities.\nSince no official XSD document seems to be available for\nthe OSD format, a definition was taken from [here](https://gist.github.com/simon04/24ac9e9b1d0ce3c6655c1ffb2329ebc7).\nIt can be found at [`osm-extracts/osm.xsd`](osm-extracts/osm.xsd).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsunsided%2Fosm-berlin","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsunsided%2Fosm-berlin","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsunsided%2Fosm-berlin/lists"}