https://github.com/nikhilba/aerial-imagery
Data Science Research Project: Map poverty using satellite images.
https://github.com/nikhilba/aerial-imagery
carnegie-mellon-university data-science deep-learning ipynb neural-network satellite-images vgg16
Last synced: 8 months ago
JSON representation
Data Science Research Project: Map poverty using satellite images.
- Host: GitHub
- URL: https://github.com/nikhilba/aerial-imagery
- Owner: nikhilba
- License: mit
- Created: 2017-02-09T22:04:19.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2020-08-14T22:18:24.000Z (almost 6 years ago)
- Last Synced: 2025-02-01T10:51:10.956Z (over 1 year ago)
- Topics: carnegie-mellon-university, data-science, deep-learning, ipynb, neural-network, satellite-images, vgg16
- Language: Jupyter Notebook
- Homepage:
- Size: 208 MB
- Stars: 11
- Watchers: 3
- Forks: 10
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Aerial-Imagery
Authors: [Nikhil Binod Agarwal](https://github.com/nikhilba) and [Tina Hong Bu](https://github.com/TinaHongBu), under the guidance of [Prof. Zico Kolter](http://zicokolter.com/), Carnegie Mellon University.
# Status
This is an ongoing project.
Code is contained in the Code/ directory (iPython Notebooks) and Data (except Satellite Images) is contained in the Dataset/ directory.
1. Downloading Satellite Images [COMPLETED] - Code/ImageDownload.ipynb
2. Mapping Income info with Satellite Image [COMPLETED] - Code/Read_Census_Info_BG.ipynb
3. Classifying Images [In Progress] - Code/Aerial_Imagery_Deep_Learning.ipynb
# Required Package
Following packages have been used to run the code in this repo:
- geopandas
- shapely
- fiona
- matplotlib
- seaborn
- pandas
- numpy
- bcolz
Deep Learning Packages:
- Keras
- Theano (backend)
- Tensorflow (backend)
# Problem Statement
We are interested in applying the deep learning techniques to the satellite images of a location (in our case, we look at Pittsburgh city) and related census information (such as income, age, sex, race, etc.). We attempt to investigate the possibility of predicting poverty in a region.
Gathering census information is expensive and resource intensive. Our analysis would be a cost effective alternative solution using a fairly high-resolution aerial imagery to obtain an approximation of the census information at a significatly reduced cost. This can be achieved by building a classifier that can predict income level of a regoin.
# Approach
Our aim is to implement an end-to-end data science pipeine from scratch. Thus, we create a unique dataset of Satellite Images and associate every image with its corresponding census information (in this case income).
To be able to predict poverty using aerial images, we use deep-learning technique to generate image features and then train a classifier to predict the income level of the image. For ML model, we use neural network, specifically convolutional neural network. Tuning and training a new neural network takes a significant amount of time and effort, we thus use a [VGG-16 network model](https://gist.github.com/baraldilorenzo/07d7802847aaad0a35d3) with weights pre-trained on ImageNet, to generate the image features.
Such pre-trained very deep convolutional neural networks have been known to generalize well and achieve state-of-the-art results. Tweaking the last few layers of this model, we would measure its performance.
# Data Collection & Processing
We create a unique dataset from scratch using the open source infomation of the state Pennsylvania.
1. To generate the desired predictor training data, we first download the satellite images of Pittsburgh using [Google Static Maps Api](https://developers.google.com/maps/documentation/static-maps/)
2. The Census data by Block group is downloaded from the [US Census Bureau](https://www.census.gov/geo/maps-data/data/tiger-data.html).
3. We then map each satellite image with its corresponding income level info.
4. Income level is transformed into bins of size 40 for classification.
5. Each image is further split into 9 parts and mapped with corresponding income bin.
# Modeling
We use a VGG-16 model with weights pre-trained on ImageNet, but we drop the last two dense and dropout layers. We then generate useful image features from the truncated VGG-16 Model. In the end, we classify the images by income level.