{"id":50547345,"url":"https://github.com/royungar/sql_chicago_data_analysis_project","last_synced_at":"2026-06-04T00:01:34.608Z","repository":{"id":306439397,"uuid":"1025522200","full_name":"royungar/SQL_Chicago_Data_Analysis_Project","owner":"royungar","description":"SQL-based data analysis project using SQLite, pandas, and Jupyter SQL magic commands. Analyzes crime, school, and census data from Chicago to explore socioeconomic patterns using filtering, joins, aggregation, and subqueries.","archived":false,"fork":false,"pushed_at":"2025-07-28T00:40:26.000Z","size":958,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-07-28T02:30:49.989Z","etag":null,"topics":["aggregation","census-data","chicago","crime-data","data-analysis","data-engineering","education-data","ibm","jupyter-notebook","pandas","sql","sqlite","subqueries"],"latest_commit_sha":null,"homepage":"https://www.linkedin.com/in/royungar/","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/royungar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-24T11:35:22.000Z","updated_at":"2025-07-28T00:40:30.000Z","dependencies_parsed_at":"2025-07-28T02:30:51.007Z","dependency_job_id":null,"html_url":"https://github.com/royungar/SQL_Chicago_Data_Analysis_Project","commit_stats":null,"previous_names":["royungar/chicago_sql_data_analysis_project","royungar/sql_chicago_data_analysis_project"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/royungar/SQL_Chicago_Data_Analysis_Project","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/royungar%2FSQL_Chicago_Data_Analysis_Project","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/royungar%2FSQL_Chicago_Data_Analysis_Project/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/royungar%2FSQL_Chicago_Data_Analysis_Project/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/royungar%2FSQL_Chicago_Data_Analysis_Project/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/royungar","download_url":"https://codeload.github.com/royungar/SQL_Chicago_Data_Analysis_Project/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/royungar%2FSQL_Chicago_Data_Analysis_Project/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33884734,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-03T02:00:06.370Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aggregation","census-data","chicago","crime-data","data-analysis","data-engineering","education-data","ibm","jupyter-notebook","pandas","sql","sqlite","subqueries"],"created_at":"2026-06-04T00:01:33.773Z","updated_at":"2026-06-04T00:01:34.599Z","avatar_url":"https://github.com/royungar.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SQL Chicago Data Analysis Project – IBM Data Engineering Professional Certificate\n\n## Overview\n\nThis project explores and analyzes real-world datasets from the City of Chicago using SQL within a Jupyter Notebook.\nIt was completed as the final project for **Course 5 – Databases and SQL for Data Science with Python**\nin the [IBM Data Engineering Professional Certificate](https://www.coursera.org/professional-certificates/ibm-data-engineer).\n\nThe analysis is performed using SQLite and `%sql` magic commands, combining Python and SQL to uncover insights about crime rates, public schools, and socioeconomic conditions in Chicago.\n\n---\n\n## Objectives\n\n- Create and connect to a SQLite database using Jupyter and SQL magic\n- Import CSV datasets into pandas and persist them as relational tables in SQLite\n- Perform SQL queries to extract insights using filtering, grouping, and aggregation\n- Apply subqueries and joins to analyze relationships across datasets\n- Investigate correlations between crime rates, community demographics, and school safety\n\n---\n\n## Tools \u0026 Technologies\n\n| Category          | Tools/Technologies             |\n|-------------------|--------------------------------|\n| **Languages**     | Python, SQL                    |\n| **Libraries**     | pandas, sqlite3, ipython-sql   |\n| **Database**      | SQLite                         |\n| **Environment**   | Jupyter Notebook               |\n| **File Formats**  | CSV                            |\n\n---\n\n## Datasets Used\n\n1. **Chicago Census Data**  \n   Socioeconomic indicators by community area (2008–2012)  \n   [Source](https://data.cityofchicago.org/Health-Human-Services/Census-Data-Selected-socioeconomic-indicators-in-C/kn9c-c2s2)\n\n2. **Chicago Public Schools Progress Report Cards**  \n   Performance and safety data (2011–2012)  \n   [Source](https://data.cityofchicago.org/Education/Chicago-Public-Schools-Progress-Report-Cards-2011-/9xs2-f89t)\n\n3. **Chicago Crime Data**  \n   Reported crimes from 2001 to present  \n   [Source](https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2)\n\n\u003e **Note:** Cleaned and reduced versions of the datasets were provided by IBM for educational use.\n\n---\n\n## Tasks Completed\n\n- Created a new SQLite database and established a connection using `%load_ext sql`\n- Loaded three CSV files into pandas DataFrames\n- Created corresponding tables in SQLite using pandas `.to_sql()` method\n- Queried data using SQL operations:\n  - Filtering with `WHERE` and `LIKE` clauses\n  - Joining and grouping data across tables\n  - Using aggregation functions (`COUNT`, `AVG`)\n  - Writing subqueries to solve analytical problems\n- Answered 10 domain-specific questions involving:\n  - Crimes involving minors and children\n  - School safety scores grouped by type\n  - Community areas with high poverty and hardship\n  - Identifying the most crime-prone neighborhood\n\n---\n\n## Sample SQL Queries\n\n```sql\n-- Total number of crimes\nSELECT COUNT(*) FROM CHICAGO_CRIME_DATA;\n\n-- Community areas with per capita income \u003c $11,000\nSELECT COMMUNITY_AREA_NAME, COMMUNITY_AREA_NUMBER\nFROM CENSUS_DATA\nWHERE PER_CAPITA_INCOME \u003c 11000;\n\n-- Crimes committed at school locations (distinct types)\nSELECT DISTINCT PRIMARY_TYPE\nFROM CHICAGO_CRIME_DATA\nWHERE LOCATION_DESCRIPTION LIKE '%SCHOOL%';\n```\n\n---\n\n## Key Insights\n\n- **Riverdale** had both the highest hardship index and highest percentage of households below the poverty line.\n- **Austin (Community Area 25)** was the most crime-prone area in the dataset.\n- Safety scores varied across school types, with high schools and elementary schools showing similar average ratings.\n\n---\n\n## Repository Structure\n\n```plaintext\nSQL_Chicago_Data_Analysis_Project/\n├── README.md                                    # Project overview and instructions\n├── data/\n│   ├── ChicagoCensusData.csv                    # Socioeconomic data by community area\n│   ├── ChicagoCrimeData.csv                     # Crime records\n│   └── ChicagoPublicSchools.csv                 # School performance and safety scores\n├── images/                                      # Screenshots of database setup and SQL query results\n│   ├── low_income_areas.png                     # Community areas with low per capita income (Problem 2)\n│   ├── most_crime_prone_area.png                # Area with the highest crime count using a subquery (Problem 10)\n│   ├── school_crimes_summary.png                # Types of crimes reported at school locations (Problem 5)\n│   ├── table_creation_and_database_setup.png    # Creating SQLite DB and loading tables using pandas and df.to_sql()\n│   └── total_crimes_count.png                   # Total number of crimes in the dataset (Problem 1)\n├── notebook/\n│   └── SQL_Chicago_Data_Analysis_Project.ipynb  # Final notebook with all tasks and query outputs\n```\n\n---\n\n## License\n\nThis project was completed as part of the IBM Data Engineering Professional Certificate and is intended for educational use.\n\n## Links\n\n- Course Page - [Databases and SQL for Data Science with Python](https://www.coursera.org/learn/sql-data-science)\n- [GitHub Profile](https://github.com/royungar)\n- [GitHub Repository](https://github.com/royungar/SQL_Chicago_Data_Analysis_Project)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Froyungar%2Fsql_chicago_data_analysis_project","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Froyungar%2Fsql_chicago_data_analysis_project","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Froyungar%2Fsql_chicago_data_analysis_project/lists"}