{"id":13551109,"url":"https://github.com/donomii/wikipedia2geojson","last_synced_at":"2025-08-31T07:31:14.876Z","repository":{"id":95813250,"uuid":"67833008","full_name":"donomii/wikipedia2geojson","owner":"donomii","description":"Extracts geodata from a wikipedia dump","archived":false,"fork":false,"pushed_at":"2024-05-15T13:24:48.000Z","size":38,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-01T14:47:57.277Z","etag":null,"topics":["conversion","converter","geodata","geojson","geotagged-wikipedia-articles","geotagging","json","mapping","wikipedia","wikipedia-dump","wikipedia-scraper"],"latest_commit_sha":null,"homepage":"https://donomii.github.io/wikipedia2geojson","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/donomii.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-09-09T21:11:22.000Z","updated_at":"2024-05-15T13:24:51.000Z","dependencies_parsed_at":null,"dependency_job_id":"597e35cd-370e-4827-803b-eac6b73cfc42","html_url":"https://github.com/donomii/wikipedia2geojson","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/donomii/wikipedia2geojson","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/donomii%2Fwikipedia2geojson","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/donomii%2Fwikipedia2geojson/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/donomii%2Fwikipedia2geojson/r
eleases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/donomii%2Fwikipedia2geojson/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/donomii","download_url":"https://codeload.github.com/donomii/wikipedia2geojson/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/donomii%2Fwikipedia2geojson/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":272953914,"owners_count":25021133,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-31T02:00:09.071Z","response_time":79,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["conversion","converter","geodata","geojson","geotagged-wikipedia-articles","geotagging","json","mapping","wikipedia","wikipedia-dump","wikipedia-scraper"],"created_at":"2024-08-01T12:01:42.523Z","updated_at":"2025-08-31T07:31:14.131Z","avatar_url":"https://github.com/donomii.png","language":"Go","funding_links":[],"categories":["Go","json"],"sub_categories":[],"readme":"This project has moved to my [geotools project](https://github.com/donomii/geotools/tree/master/wikipedia2geojson)\n\n# Extract geojson coordinates from wikipedia files\n\nWikipedia2geojson reads wikipedia files and prints out geo locations.  \n\nIt can read compressed files, and compressed streams.  
Works on Linux, MacOS, and MS Windows.\n\n# Example\n\n        wikipedia2geojson.exe file.xml.bz2\n\nReads from file.xml.bz2, automatically uncompressing bz2 format\n\n    [\n    { \"type\": \"Feature\", \"geometry\": { \"type\": \"Point\", \"coordinates\": [ 2, 28 ] }, \"properties\": { \"name\": \"\"Algeria\"\" } }\n    { \"type\": \"Feature\", \"geometry\": { \"type\": \"Point\", \"coordinates\": [ 30, 42 ] }, \"properties\": { \"name\": \"\"Andorra\"\" } }\n    { \"type\": \"Feature\", \"geometry\": { \"type\": \"Point\", \"coordinates\": [ -150, 64 ] }, \"properties\": { \"name\": \"\"Alaska\"\" } }\n    { \"type\": \"Feature\", \"geometry\": { \"type\": \"Point\", \"coordinates\": [ 19, 13 ] }, \"properties\": { \"name\": \"\"Apollo 11\"\" } }\n    \nEach location is on its own line, so you can pipe this stream into grep and other command line programs.  Add the --strict flag if you want completely correct geojson format.\n\n# Streaming\n\nBecause w2g can read compressed streams, you can process network files on the fly.  You don't need to download them completely first.\n\n    wget -q -O - http://someserver.com/enwiki-pages-articles2.xml.bz2 | wikipedia2geojson --compression=bz2 -\n\t\ne.g. 
from the wikipedia download site (don't do this, it's better to download the file once and use it)\n\n\t\twget --no-check-certificate -q -O - https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles1.xml-p10p30302.bz2 | ./wikipedia2geojson --compression=bz2 --strict -\n\n# Installation\n\n        go get -u github.com/donomii/wikipedia2geojson\n\t\tgo install github.com/donomii/wikipedia2geojson\n      \n# More examples\n\n\n        wikipedia2geojson.exe file.xml\n\n                Read from file.xml\n\n\n        wikipedia2geojson.exe file.xml.bz2\n\n                Read from file.xml.bz2, automatically uncompressing bz2 format\n\n\n        wikipedia2geojson.exe file.xml.gz\n\n                Read from file.xml.gz, automatically uncompressing gz format\n\n\n        wikipedia2geojson.exe --compression=bz2 file\n\n                Read from file, force uncompressing bz2 format\n\n\n        wikipedia2geojson.exe --compression=gz file\n\n                Read from file, force uncompressing gz format\n\n\n        wikipedia2geojson.exe -\n\n                Read from stdin.\n\n\n        wikipedia2geojson.exe --compression=bz2 -\n\n                Read from stdin.  Stdin is in bzip2 format\n\n\n        wikipedia2geojson.exe --compression=gz -\n\n                Read from stdin.  Stdin is in gz format\n\n# Known bugs\n\nYou can't stream straight from the network on MS Windows, because Windows fiddles with the data as it goes through the pipe, and most download programs don't know how to stop that.\n\nSo this won't work\n\n    wget -q -O - http://someserver.com/enwiki-pages-articles2.xml.bz2 | wikipedia2geojson.exe --compression=bz2 -\n\n\n\nW2g does not print out fully compliant geojson.  Instead of printing an array of points, it just prints the points. 
To change the output into fully compliant geojson, add the --strict flag to the command line.\n\n# Bonus\n\nA perl one liner to unpack the wikipedia geodata files in sql format\n\n    type enwiki-20171103-geo_tags.sql | perl -pe \"s/\\),\\(/\\r\\n/g\" | perl -ne \"@c=split/,/;if($c[8]ne'NULL'){print '{ \\\"type\\\": \\\"Feature\\\", \\\"geometry\\\": { \\\"type\\\": \\\"Point\\\", \\\"coordinates\\\": [ '.$c[4].', '.$c[5].' ] }, \\\"properties\\\": { \\\"name\\\": '.$c[8].' } };'.\\\"\\n\\\";}\"\n\n    cat enwiki-20190501-geo_tags.sql | perl -pe \"s/\\),\\(/\\n/g\" | perl -ne '@c=split/,/;if($c[8]ne\"NULL\"){print \"{ \\\"type\\\": \\\"Feature\\\", \\\"geometry\\\": { \\\"type\\\": \\\"Point\\\", \\\"coordinates\\\": [ \".$c[4].\", \".$c[5].\" ] }, \\\"properties\\\": { \\\"name\\\": \\\".$c[8].\\\" } }\".\"\\n\";}'\n\nWikipedia's geodata extraction appears to have trouble identifying points, so wikipedia2geojson will be useful for a bit longer yet.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdonomii%2Fwikipedia2geojson","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdonomii%2Fwikipedia2geojson","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdonomii%2Fwikipedia2geojson/lists"}