https://github.com/karenwky/visualization_hong_kong_property
web scraping and EDA(Exploratory Data Analysis) project
- Host: GitHub
- URL: https://github.com/karenwky/visualization_hong_kong_property
- Owner: karenwky
- License: mit
- Created: 2019-09-08T07:38:08.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2024-09-05T13:30:30.000Z (about 1 year ago)
- Last Synced: 2024-09-18T06:43:13.499Z (about 1 year ago)
- Topics: beautifulsoup4, matplotlib, pandas, regex, scraping-websites, seaborn
- Language: Jupyter Notebook
- Homepage:
- Size: 55.5 MB
- Stars: 1
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Visualization: Hong Kong Property
This is an EDA (Exploratory Data Analysis) project based on data scraped from the Hong Kong Property [website](https://en.hkp.com.hk/find-property/).

## Data Source
1. [Transaction History](https://app2.hkp.com.hk/utx/default.jsp?lang=en)
   Using 3 years of transaction data, explore the distribution of property transactions in Hong Kong.
2. [Find Property](https://en.hkp.com.hk/find-property/#list)
   Using scraped property details such as number of bedrooms and selling price, discover the correlations between various features.

## Findings

The districts can be divided into two clusters. In the purple districts, only a few estates have a relatively high number of transactions. In contrast, in the cyan districts, a relatively large number of estates have a high number of transactions. The purple districts therefore have a more skewed distribution than the cyan districts.
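One way to quantify "more skewed" is the sample skewness of per-estate transaction counts within each district. The sketch below is an illustration only, not necessarily how the notebook forms the clusters; the district labels, column names, counts, and the threshold of 1.0 are all assumptions.

```python
import pandas as pd

# Hypothetical per-estate transaction counts (illustrative values only).
df = pd.DataFrame({
    "district": ["A"] * 6 + ["B"] * 6,
    "transactions": [5, 6, 7, 8, 9, 120,        # A: one dominant estate
                     40, 50, 60, 70, 80, 90],   # B: activity spread evenly
})

# Sample skewness of the per-estate counts within each district.
skew_by_district = df.groupby("district")["transactions"].skew()

# Arbitrary cut-off (an assumption) separating the two kinds of district.
SKEW_THRESHOLD = 1.0
cluster = (skew_by_district > SKEW_THRESHOLD).map(
    {True: "few dominant estates", False: "many active estates"}
)

print(pd.DataFrame({"skew": skew_by_district, "cluster": cluster}))
```

A district dominated by a handful of busy estates gets a large positive skewness, while a district where activity is spread across many estates stays near zero.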

Most transaction counts are within 200. N.T. East (grey box) has the highest upper extreme and the highest median number of transactions. Kowloon Central (purple box) has the highest number of outliers.
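The quantities read off a boxplot (median, upper extreme, outliers) can also be computed directly. This is a minimal sketch assuming the common 1.5 × IQR whisker rule, which is the default in Matplotlib and Seaborn; the `box_stats` helper, the district labels, and the numbers are hypothetical.

```python
import pandas as pd

def box_stats(counts: pd.Series) -> pd.Series:
    """Median, upper whisker and outlier count under the 1.5 * IQR rule."""
    q1, q3 = counts.quantile(0.25), counts.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = counts[(counts >= lower) & (counts <= upper)]
    return pd.Series({
        "median": counts.median(),
        "upper_whisker": inside.max(),  # largest value that is not an outlier
        "n_outliers": ((counts < lower) | (counts > upper)).sum(),
    })

# Hypothetical per-estate transaction counts (illustrative values only).
df = pd.DataFrame({
    "district": ["X"] * 5 + ["Y"] * 9,
    "transactions": [100, 120, 140, 160, 180,
                     10, 12, 14, 16, 18, 20, 22, 300, 400],
})

# One row of statistics per district.
stats = pd.DataFrame(
    {name: box_stats(group) for name, group in df.groupby("district")["transactions"]}
).T
print(stats)
```

Here district X has the higher median and upper whisker, while district Y has the only outliers, which mirrors the kind of contrast described above.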

The *total* number of transactions per district shows a slightly different picture from the distribution by estate. Although N.T. East has the highest upper extreme and the highest median number of transactions (refer to the boxplot above), it ranks only third in total number of transactions. Kowloon Central, with its many outliers, is the district with the highest total number of transactions.
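The gap between the two rankings comes from aggregating the same per-estate counts in two ways. The following sketch, using made-up districts and counts, shows how a few outlier estates can give a district the highest total even when its median is low.

```python
import pandas as pd

# Hypothetical per-estate transaction counts (illustrative values only).
df = pd.DataFrame({
    "district": ["P"] * 4 + ["Q"] * 6,
    "transactions": [90, 100, 110, 120,
                     10, 15, 20, 25, 30, 500],
})

# Median describes a typical estate; sum is the district total.
summary = df.groupby("district")["transactions"].agg(["median", "sum"])

top_by_median = summary["median"].idxmax()
top_by_total = summary["sum"].idxmax()
print(summary)
print("Highest median:", top_by_median, "| Highest total:", top_by_total)
```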
In the scraped data, most selling prices are within HKD 50 million. The flats mainly have 1-4 bedrooms, and the efficiency ratio is mostly between 60 and 90 percent.

## Detailed Presentation

* Check out the complete workflow in the [Keynote](./slides.key) slides.
* Check out the complete code in the [Jupyter Notebook](https://github.com/yyzz1010/Visualization_Hong_Kong_Property/tree/master/code).

## Skills Acquired
* Pandas
* Beautiful Soup
* Regex text cleaning
* Seaborn
* Matplotlib
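
As an illustration of the regex text cleaning listed above, the sketch below turns scraped text fragments into numbers. The raw strings are hypothetical and may not match the wording on the actual site; in a scraper they would typically come from the text of elements extracted with Beautiful Soup. The `first_number` helper is likewise an assumption, not code from the notebook.

```python
import re
from typing import Optional

def first_number(text: str) -> Optional[float]:
    """Return the first number in text, ignoring thousands separators."""
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    return float(match.group()) if match else None

# Hypothetical raw strings as they might appear after scraping.
raw = {
    "price": "HKD 12,800,000",
    "bedrooms": "3 Bedrooms",
    "efficiency": "78%",
}

clean = {key: first_number(value) for key, value in raw.items()}
print(clean)
```

Returning `None` for text without digits keeps missing values explicit, so they can be handled later in pandas rather than silently becoming zero.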