{"id":23061818,"url":"https://github.com/rrwen/msa-thesis","last_synced_at":"2026-04-26T23:31:53.116Z","repository":{"id":91307887,"uuid":"66692052","full_name":"rrwen/msa-thesis","owner":"rrwen","description":"Thesis titled \"Geospatial Semantic Pattern Recognition in Volunteered Geographic Data Using the Random forest Algorithm\" for the degree of Masters of Spatial Analysis at Ryerson University in 2016","archived":false,"fork":false,"pushed_at":"2017-04-25T08:31:51.000Z","size":5990,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-03T07:19:51.668Z","etag":null,"topics":["detection","forest","geographic","geography","geolocation","geospatial","gis","importance","learning","machine","openstreetmap","outlier","pattern","random","recognition","semantic","tag","thesis","variable","volunteer"],"latest_commit_sha":null,"homepage":"https://rrwen.github.io/msa-thesis","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rrwen.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2016-08-27T02:44:54.000Z","updated_at":"2022-01-06T07:44:54.000Z","dependencies_parsed_at":"2023-09-25T01:46:52.876Z","dependency_job_id":null,"html_url":"https://github.com/rrwen/msa-thesis","commit_stats":{"total_commits":23,"total_committers":4,"mean_commits":5.75,"dds":"0.30434782608695654","last_synced_commit":"4b72d5571b91ef1ca5266c8e151fdc5e387d57ac"},"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/rrwen/msa-thesis","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/rep
ositories/rrwen%2Fmsa-thesis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rrwen%2Fmsa-thesis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rrwen%2Fmsa-thesis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rrwen%2Fmsa-thesis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rrwen","download_url":"https://codeload.github.com/rrwen/msa-thesis/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rrwen%2Fmsa-thesis/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32317163,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-26T23:26:28.701Z","status":"ssl_error","status_checked_at":"2026-04-26T23:26:25.802Z","response_time":129,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["detection","forest","geographic","geography","geolocation","geospatial","gis","importance","learning","machine","openstreetmap","outlier","pattern","random","recognition","semantic","tag","thesis","variable","volunteer"],"created_at":"2024-12-16T03:18:34.166Z","updated_at":"2026-04-26T23:31:53.091Z","avatar_url":"https://github.com/rrwen.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Geospatial Semantic Pattern Recognition in Volunteered Geographic Data Using the 
Random forest Algorithm\n\nRichard Wen  \nrwen@ryerson.ca  \nMasters of Spatial Analysis, Ryerson University, 2016  \nThesis Defended on April 27, 2016  \nSupervised by Dr. Claus Rinner\n* [PDF](https://github.com/rrwen/msa-thesis/blob/paper/thesis.pdf)\n* [Defense Slides](https://rrwen.github.io/msa-thesis)\n  \n## Abstract\nThe ubiquitous availability of location technologies has enabled large quantities of Volunteered Geographic Data (VGD) to be produced by users worldwide. VGD has been a cost-effective and scalable solution to obtaining unique and freely available geospatial data. However, VGD suffers from reliability issues as user behaviour is often variable. Large quantities make manual assessments of the user-generated data inefficient, expensive, and impractical. This research utilized a random forest algorithm based on geospatial semantic variables in order to aid the improvement and understanding of multi-class VGD without ground-truth reference data. An automated Python script of a random forest-based procedure was developed. 
A demonstration of the automated script on OpenStreetMap (OSM) data with user-generated tags in Toronto, Ontario, was effective in recognizing patterns in the OSM data, with predictive performances of ~0.71 (where 0 is the worst and 1 is the best) based on a class-weighted metric, and was able to reveal variable influences and outliers.\n  \n## Contents\n**[Code](https://github.com/rrwen/msa-thesis#code)**  \n* [Dependencies](https://github.com/rrwen/msa-thesis#dependencies)  \n* [Windows Installation](https://github.com/rrwen/msa-thesis#windows-installation)  \n* [Linux Installation](https://github.com/rrwen/msa-thesis#linux-installation)  \n* [Run](https://github.com/rrwen/msa-thesis#run)  \n  \n**[Information](https://github.com/rrwen/msa-thesis#information)**  \n* [Defense](https://github.com/rrwen/msa-thesis#defense)  \n* [Hardware](https://github.com/rrwen/msa-thesis#hardware)\n  \n## Code\nThe code was written in [Python 3.5](https://www.python.org/about/) and has been tested on the [Mapzen Toronto data](https://mapzen.com/data/metro-extracts/metro/toronto_canada/) on Windows and Linux operating systems. The code, described in Section 4 of the [PDF](https://github.com/rrwen/msa-thesis/blob/paper/thesis.pdf), uses a tree-optimized random forest model to learn geospatial patterns for the prediction and outlier detection of known spatial object classes (Figure 1).  \n![Figure 1](https://github.com/rrwen/msa-thesis/blob/master/methods.png)  \nFigure 1. 
Flowchart of code process  \n\n### Dependencies\n* [Anaconda Python 3.5](https://www.continuum.io/downloads/)  \n* [GDAL](http://www.gdal.org/)  \n* [Fiona](http://toblerity.org/fiona/manual.html)  \n* [pyproj](https://github.com/jswhit/pyproj)  \n* [Shapely](https://github.com/Toblerity/Shapely)  \n* [geopandas](http://geopandas.org/)  \n* [joblib](https://pythonhosted.org/joblib/)  \n* [seaborn](https://stanford.edu/~mwaskom/software/seaborn/)  \n* [treeinterpreter](https://github.com/andosa/treeinterpreter)  \n* [tqdm](https://github.com/noamraph/tqdm)  \n* [rtree](http://toblerity.org/rtree/)  \n  \n### Windows Installation\n1. Install [Anaconda Python 3.5](https://www.continuum.io/downloads#windows) for Windows\n2. Download wheel files: [GDAL](http://www.lfd.uci.edu/~gohlke/pythonlibs/#gdal), [Fiona](http://www.lfd.uci.edu/~gohlke/pythonlibs/#fiona), [pyproj](http://www.lfd.uci.edu/~gohlke/pythonlibs/#pyproj), and [Shapely](http://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely) for Python 3.5 (cp35)\n3. Uninstall any existing OSGeo4W, GDAL, Fiona, pyproj, or Shapely libraries\n4. In a console, navigate to the downloaded wheel files using `cd path/to/downloaded_wheels`\n5. 
Install the wheel (.whl) files and libraries using `pip install`  \n  \n*64-bit Example ([Same wheel files used in thesis](https://github.com/rrwen/msa-thesis/releases/download/v1.0-reproduce/win-wheels-64.zip))*\n```shell\ncd path/to/downloaded_wheels\npip install GDAL-2.0.3-cp35-cp35m-win_amd64.whl\npip install Fiona-1.7.0-cp35-cp35m-win_amd64.whl\npip install pyproj-1.9.5.1-cp35-cp35m-win_amd64.whl\npip install Shapely-1.5.16-cp35-cp35m-win_amd64.whl\npip install geopandas\npip install joblib\npip install seaborn\npip install treeinterpreter\npip install tqdm\nconda install -c ioos rtree\n```  \n\n*32-bit Example*\n```shell\ncd path/to/downloaded_wheels\npip install GDAL-2.0.3-cp35-cp35m-win32.whl\npip install Fiona-1.7.0-cp35-cp35m-win32.whl\npip install pyproj-1.9.5.1-cp35-cp35m-win32.whl\npip install Shapely-1.5.16-cp35-cp35m-win32.whl\npip install geopandas\npip install joblib\npip install seaborn\npip install treeinterpreter\npip install tqdm\nconda install -c ioos rtree\n```  \n\nThanks to [Geoff Boeing](http://geoffboeing.com/about/) for the [Using geopandas on windows](http://geoffboeing.com/2014/09/using-geopandas-windows/) blog post and [Christoph Gohlke](http://www.lfd.uci.edu/~gohlke/) for the [wheel files](http://www.lfd.uci.edu/~gohlke/pythonlibs/).\n  \n### Linux Installation\n1. Install [Anaconda Python 3.5](https://www.continuum.io/downloads#linux) for Linux  \n2. Install libraries using `pip install` and `conda install`  \n```shell\npip install treeinterpreter\npip install tqdm\nconda install -c conda-forge geopandas\nconda install joblib\nconda install seaborn\nconda install -c ioos rtree\n``` \n\n### Run\n1. Download [this repository](https://github.com/rrwen/msa-thesis/archive/master.zip)  \n2. Unzip the file and navigate to the code folder `cd path/to/msa-thesis-master/py`  \n3. 
Execute the code using `python thesis.py`\n```shell\ncd path/to/msa-thesis-master/py\npython thesis.py config.txt path/to/output_folder\n```  \nEdit the config file to alter the methods or apply them to other datasets.  \nPlease see Section 4.1 in the [PDF](https://github.com/rrwen/msa-thesis/blob/paper/thesis.pdf) for more details.  \n  \n*Note: The unedited config.txt file contains the settings used to obtain results for the most recent [Mapzen Toronto data](https://mapzen.com/data/metro-extracts/metro/toronto_canada/). The data used in the thesis is provided in the [reproduce release](https://github.com/rrwen/msa-thesis/releases/tag/v1.0-reproduce), which contains instructions to reproduce the thesis results.*\n\n## Information\n\n### Defense\n* **Date**: April 27, 2016\n* **Time**: 2:00 p.m. to 4:00 p.m.\n* **Location**: Jorgenson Hall 730, Ryerson University, Toronto, ON\n* **Chair**: Dr. Lu Wang\n* **Examiner 1**: Dr. Eric Vaz\n* **Examiner 2**: Dr. Tony Hernandez\n* **Result**: Pass with minor revisions  \n\n### Hardware\nPersonal machine:\n* Windows 8.1 64-bit  \n* i7-6700k 4.0 GHz Quad-Core  \n* 16 GB DDR4 2133 RAM  \n* 256 GB SSD + 512 GB SSD (Read: Up to 540 MB/sec, Write: Up to 520 MB/sec)  \n* **Runtime**: ~30-45 minutes  \n\nVirtual machine generously provided by Ryerson [RC4](http://rc4.ryerson.ca/):\n* Debian Linux  \n* 6-Core CPU  \n* 6 GB RAM  \n* 66 GB Storage  \n* **Runtime**: ~50-60 minutes  \n  \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frrwen%2Fmsa-thesis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frrwen%2Fmsa-thesis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frrwen%2Fmsa-thesis/lists"}