Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/bnchrch/opening-hours-data-miner
⛏ Datamining Google places to determine opening hours of places based on location and type
data-science data-visualization flask postgresql python scraper weka
- Host: GitHub
- URL: https://github.com/bnchrch/opening-hours-data-miner
- Owner: bnchrch
- Created: 2015-02-24T01:23:39.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2015-05-05T21:43:18.000Z (over 9 years ago)
- Last Synced: 2024-04-14T06:08:08.528Z (7 months ago)
- Topics: data-science, data-visualization, flask, postgresql, python, scraper, weka
- Language: Python
- Homepage:
- Size: 2.38 MB
- Stars: 1
- Watchers: 3
- Forks: 0
- Open Issues: 0
- Metadata Files:
  - Readme: readme.md
README
# SENG 474: Data Mining Project Spring 2015
### Objective
To determine the operating hours of businesses based on their geographic location and business type.

### What's in this repo?
* An exploratory scraper for the Google Places API, written in Node
* A final implementation of the scraper, written in threaded Python

### What does the scraper do, exactly?
* Queries the Google Places API based on location and either radius or business type
  * Currently it's set to business types, which are sourced from our pre-existing database (you can populate it by uncommenting our radius search query)
* Then it checks each returned place and queries for further details only if we do not already have the place in our database and the place has opening and closing hours attached
* When it queries for a listing's details, it saves the results to our PostgreSQL datastore
* If a page token is attached, the scraper queries the next page; otherwise it queries a new random location based on the values in location.py (see the sketch below)
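The scraper itself isn't reproduced in this README, so here is a minimal, single-threaded sketch of the loop described above, written against modern Python 3 and `requests` rather than the project's threaded Python 2 code. `LAT_RANGE`, `LNG_RANGE`, `BUSINESS_TYPES`, `place_exists`, and `save_place` are illustrative stand-ins, not the repo's actual identifiers.

```python
# Minimal, single-threaded sketch of the scrape loop described above.
# LAT_RANGE, LNG_RANGE, BUSINESS_TYPES, place_exists, and save_place are
# illustrative stand-ins for what location.py and the database provide.
import random
import time

import requests

API_KEY = "YOUR_GOOGLE_PLACES_KEY"
SEARCH_URL = "https://maps.googleapis.com/maps/api/place/nearbysearch/json"
DETAILS_URL = "https://maps.googleapis.com/maps/api/place/details/json"

LAT_RANGE = (48.4, 48.5)      # assumed bounds; the real ones live in location.py
LNG_RANGE = (-123.4, -123.3)
BUSINESS_TYPES = ["restaurant", "cafe", "bar"]

_seen = set()  # stand-in for the database lookup (see the SQLAlchemy sketch below)

def place_exists(place_id):
    return place_id in _seen

def save_place(result):
    """Stand-in for persisting one details result to PostgreSQL."""
    _seen.add(result["place_id"])

def random_location():
    """Pick a fresh random point, as the scraper does between searches."""
    return random.uniform(*LAT_RANGE), random.uniform(*LNG_RANGE)

def scrape_forever():
    lat, lng = random_location()
    page_token = None
    while True:
        params = {"key": API_KEY, "location": f"{lat},{lng}", "radius": 5000,
                  "type": random.choice(BUSINESS_TYPES)}
        if page_token:
            params["pagetoken"] = page_token
        resp = requests.get(SEARCH_URL, params=params).json()

        for place in resp.get("results", []):
            # Only fetch details for unseen places that advertise hours.
            if place_exists(place["place_id"]) or "opening_hours" not in place:
                continue
            details = requests.get(
                DETAILS_URL,
                params={"key": API_KEY, "place_id": place["place_id"]},
            ).json()
            save_place(details["result"])

        # A next_page_token means more results at this spot; otherwise jump.
        page_token = resp.get("next_page_token")
        if not page_token:
            lat, lng = random_location()
        time.sleep(2)  # page tokens take a moment to become valid
```

The real implementation runs this work across threads; per the Future Plans below, that threading was a pain point.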
### Technologies Used

* Python 2
* SQLAlchemy
* PostgreSQL
* Weka
* DigitalOcean
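The README doesn't show the database schema, so the following is only a guessed-at sketch of what the SQLAlchemy/PostgreSQL layer behind `save_place` might look like; the table name, columns, and connection string are all assumptions.

```python
# Hypothetical SQLAlchemy mapping for the PostgreSQL datastore; the real
# schema isn't shown in the README, so table and column names are guesses.
from sqlalchemy import Column, Float, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Place(Base):
    __tablename__ = "places"

    id = Column(Integer, primary_key=True)
    place_id = Column(String, unique=True, nullable=False)  # Google's ID
    name = Column(String)
    business_type = Column(String)
    lat = Column(Float)
    lng = Column(Float)
    # One row per (day, open, close) triple could live in a child table;
    # a single string keeps this sketch short.
    weekday_hours = Column(String)

engine = create_engine("postgresql://user:pass@localhost/places")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

def save_place(result):
    """Persist one Places API details result, skipping duplicates."""
    with Session() as session:
        if session.query(Place).filter_by(place_id=result["place_id"]).first():
            return
        loc = result["geometry"]["location"]
        session.add(Place(
            place_id=result["place_id"],
            name=result.get("name"),
            business_type=",".join(result.get("types", [])),
            lat=loc["lat"],
            lng=loc["lng"],
            weekday_hours="; ".join(
                result.get("opening_hours", {}).get("weekday_text", [])),
        ))
        session.commit()
```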
### Discoveries

Over 18 days of running, we were able to gather 10,657,610 unique data points from 923,409 locations when expanding on business type and day.

![alt text](https://github.com/bechurch/474project/blob/master/map.png "only 1% of our findings make a population density!")
Using the J48 algorithm in Weka, we were able to achieve a classification accuracy of 80.713%!
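The README doesn't say how J48 was invoked (the Weka GUI is just as likely as code). For completeness, here is one way to reproduce a 10-fold cross-validated J48 run from Python via the python-weka-wrapper3 package, assuming the scraped data has been exported to a hypothetical `hours.arff`.

```python
# Hedged sketch: reproducing a J48 run from Python with python-weka-wrapper3.
# The file name hours.arff is an assumption; the README doesn't say how the
# data was exported to Weka.
import weka.core.jvm as jvm
from weka.classifiers import Classifier, Evaluation
from weka.core.classes import Random
from weka.core.converters import Loader

jvm.start()
try:
    data = Loader(classname="weka.core.converters.ArffLoader").load_file("hours.arff")
    data.class_is_last()  # assume the predicted attribute comes last

    j48 = Classifier(classname="weka.classifiers.trees.J48")
    evaluation = Evaluation(data)
    evaluation.crossvalidate_model(j48, data, 10, Random(1))
    print(f"Accuracy: {evaluation.percent_correct:.3f}%")  # README reports 80.713%
finally:
    jvm.stop()
```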
### Future Plans
* Rewrite the scraper to better handle threading
* Save which queries our place data came from
* Make our location selection more intelligent by either storing where we've searched or using a spidering algorithm
* Move our findings into a web app