https://github.com/anas436/analyzing-real-world-data-set-in-ibm-db2-with-sqlmagic-and-python
ipython-sql jupyter-notebook matplotlib pandas python3 seaborn sql sql-magic sqlalchemy
- Host: GitHub
- URL: https://github.com/anas436/analyzing-real-world-data-set-in-ibm-db2-with-sqlmagic-and-python
- Owner: Anas436
- Created: 2022-05-31T14:46:06.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2022-05-31T14:50:01.000Z (about 4 years ago)
- Last Synced: 2025-03-27T10:48:09.008Z (about 1 year ago)
- Topics: ipython-sql, jupyter-notebook, matplotlib, pandas, python3, seaborn, sql, sql-magic, sqlalchemy
- Language: Jupyter Notebook
- Homepage:
- Size: 21.5 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Analyzing-Real-World-Data-Set-in-IBM-Db2-with-SQLMagic-and-Python
The city of Chicago released a dataset of socioeconomic data to the Chicago City Portal. This dataset contains a selection of six socioeconomic indicators of public health significance and a “hardship index,” for each Chicago community area, for the years 2008 – 2012.
Scores on the hardship index can range from 1 to 100, with a higher index number representing a greater level of hardship.
A detailed description of the dataset can be found on the city of Chicago's website, but to summarize, the dataset has the following variables:
- Community Area Number (ca): Used to uniquely identify each row of the dataset
- Community Area Name (community_area_name): The name of the region in the city of Chicago
- Percent of Housing Crowded (percent_of_housing_crowded): Percent of occupied housing units with more than one person per room
- Percent Households Below Poverty (percent_households_below_poverty): Percent of households living below the federal poverty line
- Percent Aged 16+ Unemployed (percent_aged_16_unemployed): Percent of persons aged 16 years and over who are unemployed
- Percent Aged 25+ without High School Diploma (percent_aged_25_without_high_school_diploma): Percent of persons aged 25 years and over without a high school education
- Percent Aged Under 18 or Over 64 (percent_aged_under_18_or_over_64): Percent of population under 18 or over 64 years of age (i.e., dependents)
- Per Capita Income (per_capita_income_): Community Area per capita income, estimated as the sum of tract-level aggregate incomes divided by the total population
- Hardship Index (hardship_index): Score that incorporates each of the six selected socioeconomic indicators
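To make the column names concrete, here is a minimal sketch of the table layout and a first query. It uses Python's built-in `sqlite3` module as a stand-in for Db2 so that it runs anywhere. The table name `chicago_socioeconomic_data`, the column types, and the two rows are illustrative assumptions, not values from the real dataset.

```python
import sqlite3

# Column names follow the variable list above; the types are assumptions.
SCHEMA = """
CREATE TABLE chicago_socioeconomic_data (
    ca INTEGER PRIMARY KEY,
    community_area_name TEXT,
    percent_of_housing_crowded REAL,
    percent_households_below_poverty REAL,
    percent_aged_16_unemployed REAL,
    percent_aged_25_without_high_school_diploma REAL,
    percent_aged_under_18_or_over_64 REAL,
    per_capita_income_ INTEGER,
    hardship_index INTEGER
)
"""

# Placeholder rows with made-up numbers, only to exercise the query.
PLACEHOLDER_ROWS = [
    (1, "Example Area A", 5.0, 20.0, 9.0, 15.0, 30.0, 25000, 40),
    (2, "Example Area B", 10.0, 35.0, 18.0, 30.0, 40.0, 12000, 85),
]


def areas_above_hardship(conn, threshold):
    """Return names of community areas whose hardship index exceeds threshold."""
    cur = conn.execute(
        "SELECT community_area_name FROM chicago_socioeconomic_data "
        "WHERE hardship_index > ? ORDER BY hardship_index DESC",
        (threshold,),
    )
    return [name for (name,) in cur.fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO chicago_socioeconomic_data VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    PLACEHOLDER_ROWS,
)
print(areas_above_hardship(conn, 50))
```

Against a real Db2 table the same `SELECT` would typically be issued through a `%sql` cell instead; only the query text carries over, not the SQLite connection.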
In this Lab, we'll take a look at the variables in the socioeconomic indicators dataset and do some basic analysis with Python.
## Connect to the database
Let us first load the SQL extension and establish a connection with the database.
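As a rough sketch of what that connection involves, the helper below assembles a SQLAlchemy-style connection string. The `ibm_db_sa://` scheme and the `security=SSL` parameter are assumptions about a typical Db2 setup, the function name `build_db2_url` is hypothetical, and every credential shown is a placeholder.

```python
from urllib.parse import quote_plus


def build_db2_url(user, password, host, port, database, use_ssl=True):
    """Assemble a SQLAlchemy-style URL for the ibm_db_sa dialect (assumed format)."""
    # Percent-encode the credentials so characters such as '@' do not break the URL.
    url = (
        f"ibm_db_sa://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )
    if use_ssl:
        url += "?security=SSL"
    return url


# All values below are hypothetical placeholders, not real credentials.
url = build_db2_url("my_user", "my_p@ss", "db2.example.com", 50000, "BLUDB")
print(url)

# In a notebook, the two steps would then be IPython magics
# (shown as comments because they are not plain Python):
#   %load_ext sql
#   %sql ibm_db_sa://my_user:...@db2.example.com:50000/BLUDB?security=SSL
```

Building the string in a function keeps the password out of the magic line in shared notebooks, since the credentials can be read from environment variables before the call.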
The following required modules are pre-installed in the Skills Network Labs environment. However, if you run this notebook's commands in a different Jupyter environment (e.g. Watson Studio or Anaconda), you may need to install these libraries by removing the # sign before !pip in the code cell below.
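Before uncommenting any install lines, it can help to see which packages are actually missing. The sketch below is one way to check, under stated assumptions: `ipython-sql` and `sqlalchemy` come from the repository topics, while `ibm_db_sa` is an assumed dependency for the Db2 connection, and the helper name `missing_packages` is hypothetical.

```python
import importlib.util

# Mapping of pip package name -> importable top-level module name.
# ibm_db_sa is an assumption; adjust the mapping to match your environment.
REQUIRED = {
    "ipython-sql": "sql",
    "sqlalchemy": "sqlalchemy",
    "ibm_db_sa": "ibm_db_sa",
}


def missing_packages(required):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


for pip_name in missing_packages(REQUIRED):
    print(f"Missing: {pip_name}  ->  run: pip install {pip_name}")
```

If nothing is printed, all three modules are importable and the install lines can stay commented out.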