https://github.com/karenwky/visualization_hong_kong_property

Web scraping and EDA (Exploratory Data Analysis) project

beautifulsoup4 matplotlib pandas regex scraping-websites seaborn


# Visualization: Hong Kong Property
This is an exploratory data analysis (EDA) project built on data scraped from the Hong Kong Property [website](https://en.hkp.com.hk/find-property/).

## Data Source
1. [Transaction History](https://app2.hkp.com.hk/utx/default.jsp?lang=en)

Using three years of transaction data, this part explores the distribution of property transactions in Hong Kong.

2. [Find Property](https://en.hkp.com.hk/find-property/#list)

Using scraped property details such as the number of bedrooms and the selling price, this part examines the correlation between various features.
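
Scraped listing text usually needs cleaning before it can be analysed. The sketch below shows one way to do that with regular expressions. The raw strings, field names, and the `parse_listing` helper are hypothetical; they do not reflect the site's actual markup or the code in the project's notebook.

```python
import re

# Hypothetical raw strings, standing in for text pulled out of listing HTML.
RAW_LISTINGS = [
    {"price": "HK$ 12.8M", "bedrooms": "3 Bedrooms", "saleable": "645 sq. ft.", "gross": "860 sq. ft."},
    {"price": "HK$6.5M", "bedrooms": "2 Bedroom(s)", "saleable": "401 sq. ft.", "gross": "--"},
]

# Matches an integer or a decimal number such as "645" or "12.8".
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def first_number(text):
    """Return the first number found in text as a float, or None if there is none."""
    match = NUMBER.search(text.replace(",", ""))
    return float(match.group()) if match else None


def parse_listing(raw):
    """Turn one raw listing (all strings) into numeric fields; missing values become None."""
    price_m = first_number(raw["price"])  # assumed to be quoted in millions of HKD
    bedrooms = first_number(raw["bedrooms"])
    return {
        "price_hkd": round(price_m * 1_000_000) if price_m is not None else None,
        "bedrooms": int(bedrooms) if bedrooms is not None else None,
        "saleable_sqft": first_number(raw["saleable"]),
        "gross_sqft": first_number(raw["gross"]),
    }


cleaned = [parse_listing(raw) for raw in RAW_LISTINGS]
```

Keeping missing values as `None` rather than `0` avoids distorting later statistics such as medians and correlations.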

## Findings
![Property Transactions in HK by District](/images/by_district.png)

The districts fall into two clusters. In the purple districts, only a few estates have a relatively high number of transactions. By contrast, in the cyan districts, a relatively large number of estates have a high number of transactions. The purple districts therefore have a more skewed distribution than the cyan districts.





![Box Plot](/images/box_plot.png)

Most transaction counts fall below 200. N.T. East (grey box) has the highest upper extreme and the highest median number of transactions. Kowloon Central (purple box) has the most outliers.
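
To make the box plot terms concrete, the following sketch computes the median, the upper whisker, and the outliers using the 1.5 × IQR rule that Matplotlib and Seaborn apply by default. The district names and counts are made up for illustration, and `box_stats` is a hypothetical helper, not part of the project.

```python
from statistics import median, quantiles

# Made-up per-estate transaction counts, for illustration only.
COUNTS = {
    "District A": [20, 35, 50, 60, 80, 120, 150],
    "District B": [5, 8, 10, 12, 15, 18, 20, 400, 650],
}


def box_stats(values):
    """Summarise values using the default box plot rule (whiskers at 1.5 x IQR)."""
    # method="inclusive" uses linear interpolation between data points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "median": median(values),
        "upper_whisker": max(inside),  # farthest point still inside the fence
        "outliers": [v for v in values if v < lower_fence or v > upper_fence],
    }


summary = {district: box_stats(values) for district, values in COUNTS.items()}
```

In this toy data, District A has the higher median and upper whisker, while District B has the outliers, which is the same kind of contrast described above.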





![Bar Chart](/images/bar_chart.png)

The *total* number of transactions shows a slightly different picture from the per-estate distribution. Although N.T. East has the highest upper extreme and the highest median number of transactions (see the box plot above), it ranks only third among districts in total number of transactions. With its many outliers, Kowloon Central is the district with the highest total number of transactions.
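
The gap between the two rankings comes from how outliers affect a sum but not a median. A minimal sketch with made-up counts (not the project's data) shows the effect; `rank_by` is a hypothetical helper.

```python
from statistics import median

# Made-up per-estate transaction counts; not the project's data.
COUNTS = {
    "District A": [60, 80, 100, 120, 140],    # consistently busy estates
    "District B": [5, 10, 15, 20, 400, 650],  # quiet estates plus two outliers
}


def rank_by(stat):
    """Return district names ordered from the highest to the lowest value of stat."""
    return sorted(COUNTS, key=lambda district: stat(COUNTS[district]), reverse=True)


by_median = rank_by(median)  # District A leads: median 100 vs 17.5
by_total = rank_by(sum)      # District B leads: total 1100 vs 500
```

With the data in a pandas DataFrame, the same comparison could be written as a `groupby` on the district column followed by `agg(["median", "sum"])`.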





3D Scatter Plot

In the scraped data, most selling prices are below HKD 50 million. Most flats have 1 to 4 bedrooms, and the efficiency ratio mostly falls between 60 and 90 percent.
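
As a rough illustration of the last figure, the sketch below assumes the efficiency ratio is saleable area divided by gross floor area, which is how the term is commonly used for Hong Kong flats. The README does not state the definition, and the listings are made up.

```python
# Made-up listings: (saleable area, gross floor area) in square feet.
LISTINGS = [(645, 860), (401, 530), (1200, 1450), (300, 540), (520, 610)]


def efficiency_ratio(saleable_sqft, gross_sqft):
    """Saleable area as a percentage of gross floor area (assumed definition)."""
    return 100 * saleable_sqft / gross_sqft


ratios = [efficiency_ratio(saleable, gross) for saleable, gross in LISTINGS]

# Share of listings whose ratio falls in the 60 to 90 percent band.
in_range = [ratio for ratio in ratios if 60 <= ratio <= 90]
share_in_range = len(in_range) / len(ratios)
```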

## Detailed Presentation
![Slides](/images/slides.gif)
* Check out the complete workflow in the [Keynote](./slides.key) slides.
* Check out the complete code in the [Jupyter Notebook](https://github.com/yyzz1010/Visualization_Hong_Kong_Property/tree/master/code).

## Skills Acquired
* Pandas
* Beautiful Soup
* Regex text cleaning
* Seaborn
* Matplotlib