{"id":13857108,"url":"https://github.com/Casyfill/WikiGeoParser","last_synced_at":"2025-07-13T20:30:48.557Z","repository":{"id":33053127,"uuid":"36689408","full_name":"Casyfill/WikiGeoParser","owner":"Casyfill","description":"parses the whole wikipedia json dump and returns only the list of items with geocordinates `statement` within given rectangular","archived":false,"fork":false,"pushed_at":"2022-03-30T14:53:44.000Z","size":546,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-05-19T23:36:06.551Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Casyfill.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-06-01T21:16:44.000Z","updated_at":"2022-03-30T14:53:47.000Z","dependencies_parsed_at":"2022-08-17T18:35:21.635Z","dependency_job_id":null,"html_url":"https://github.com/Casyfill/WikiGeoParser","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Casyfill%2FWikiGeoParser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Casyfill%2FWikiGeoParser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Casyfill%2FWikiGeoParser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Casyfill%2FWikiGeoParser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Casyfill","download_url":"https://codeload.github.com/Casyfill/WikiGeoParser/tar.gz/refs/head
s/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":213988317,"owners_count":15666962,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-05T03:01:26.081Z","updated_at":"2024-08-05T03:02:47.315Z","avatar_url":"https://github.com/Casyfill.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"Wikipedia Dump Geoparser\n========================\n#### Philipp Kats, May 2015\n\n## Description\nThese two scripts were written as part of the [walkable streets](http://walkstreets.org/) project, led by Andrew Karmatskiy.\nThe first script parses the Wikipedia JSON dump line by line using the **ijson** module and returns data only for items that have a geo-coordinate statement within a defined rectangle. The second script then grabs page-view stats for those pages from [stats.grok.se](http://stats.grok.se/).\n\n## Dependencies\nThe scripts are written in Python 2.7 and use [ijson](https://pypi.python.org/pypi/ijson/) to parse large JSON files.\n\nOther modules used:\n- requests\n- lxml.html\n- csv\n\n## How it works\n1. Download the Wikipedia dump as JSON (I think there is also a way to read the JSON directly from the archive).\n2. Filter the JSON with **streamJson.py**.\n3. Parse the page-view stats with **stats_parser.py**.\n\nFor some reason, some articles appear in the dump several times. 
\n\nAlso, please keep in mind that streamJson.py keeps only one page per item, so stats are given for a single page: the Russian one if it exists, otherwise the English one, and otherwise any other page (the first in the dict) if there is neither a Russian nor an English page.\n\n## Data source\n- [Dump source](http://www.wikidata.org/wiki/Wikidata:Database_download)\n- [More on the data structure](http://www.mediawiki.org/wiki/Wikibase/DataModel/Primer#Ranks)\n- [Page view stats](http://stats.grok.se/)\n\n* As you may notice, the stats project also allows downloading the raw stats data directly. However, I got stuck on the encoding of that data, so I found web scraping easier.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FCasyfill%2FWikiGeoParser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FCasyfill%2FWikiGeoParser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FCasyfill%2FWikiGeoParser/lists"}