{"id":23084656,"url":"https://github.com/eesunmoon/db-nypd","last_synced_at":"2026-05-05T20:34:42.417Z","repository":{"id":267544748,"uuid":"901587069","full_name":"EesunMoon/DB-NYPD","owner":"EesunMoon","description":"[Project] Database ETL of NYPD","archived":false,"fork":false,"pushed_at":"2024-12-14T04:02:45.000Z","size":2824,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-03T14:46:29.567Z","etag":null,"topics":["data-preprocessing","database","er-diagrams","etl","mysql","postgresql","sql"],"latest_commit_sha":null,"homepage":"","language":"PLpgSQL","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/EesunMoon.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-11T00:02:16.000Z","updated_at":"2025-01-30T05:23:09.000Z","dependencies_parsed_at":null,"dependency_job_id":"0c4bed25-73aa-4b41-8a9d-844a34797bf0","html_url":"https://github.com/EesunMoon/DB-NYPD","commit_stats":null,"previous_names":["eesunmoon/db-nypd"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EesunMoon%2FDB-NYPD","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EesunMoon%2FDB-NYPD/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EesunMoon%2FDB-NYPD/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EesunMoon%2FDB-NYPD/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/EesunMoon","download_url":"https://codeload.github.com/EesunMoon/DB-NYPD/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EesunMoon%2FDB-NYPD/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259081007,"owners_count":22802399,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-preprocessing","database","er-diagrams","etl","mysql","postgresql","sql"],"created_at":"2024-12-16T16:41:49.644Z","updated_at":"2026-05-05T20:34:42.378Z","avatar_url":"https://github.com/EesunMoon.png","language":"PLpgSQL","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Database ETL of NYC Crime Analysis\n\n## Project Overview\nThis project develops a database system for analyzing NYC crime data. The primary objective was to integrate real-world public datasets into a structured format that supports analysis of crime patterns and stakeholder involvement. Using PostgreSQL for data management and relational mapping, we implemented an ETL (Extract, Transform, Load) pipeline, designed a custom ER diagram, and developed a queryable database schema.\n\n---\n\n## E/R Diagram (Final)\n![Project1 - Part3](https://github.com/user-attachments/assets/e4c3b20f-051b-432a-9f9b-ebc9a328bd0d)\n\n## Key Features\n### 1. 
Data Integration and Preprocessing\n- Integrated two large-scale public datasets:\n  - [NYPD Calls for Service](https://data.cityofnewyork.us/Public-Safety/NYPD-Calls-for-Service-Year-to-Date-/n2zq-pubd/about_data) (5.43M rows, 18 columns)\n  - [NYPD Complaint Data Historic](https://data.cityofnewyork.us/Public-Safety/NYPD-Complaint-Data-Historic/qgea-i56i/about_data) (8.91M rows, 35 columns)\n- Cleaned the raw datasets to resolve inconsistencies and remove redundant information, so that the loaded data satisfies the database constraints.\n\n### 2. Schema Design and Implementation\n- Designed an enhanced ER diagram with entities such as `Crime_Scene`, `Incident`, `NYPD`, and `Dispatch_Duration`, and relationships such as `Monitor`, `Send`, `Arrive`, and `Occurred`.\n- Modeled an ISA hierarchy on the `NYPD` entity (subclasses `Transit_Police` and `Precinct`) to capture specialization with total participation constraints.\n- Mapped the ER diagram into a normalized PostgreSQL schema.\n\n### 3. Advanced Features\n- Implemented triggers to enforce participation and integrity constraints (e.g., ensuring that every `Crime_Scene` is linked to a valid `Incident`).\n- Added array attributes to store multi-dimensional contact information (e.g., sub-officer details for `Transit_Police`).\n- Stored court reviews in a `TEXT` column to support text search over review content.\n\n### 4. 
Query Development\n- Designed and executed 10+ complex SQL queries to analyze:\n  - Crime patterns by location and time.\n  - Victim demographics by race and gender.\n  - Stakeholder (hospital and court) involvement in crime responses.\n\n---\n\n## Details\n\n### Project 1: Database Design and Implementation\n\n#### Part 1: Proposal\n- **Description**: Outlines the objectives, scope, and methodology for building the database.\n- **Key Contents**:\n  - Problem statement and data sources.\n  - Initial ER diagram design.\n  - Project goals and expected outcomes.\n\n#### Part 2: Mapping ER Diagram to SQL Schema\n- **Description**: Maps the initial ER diagram to a SQL schema, defining the tables and their relationships in PostgreSQL.\n- **Key Contents**:\n  - SQL scripts for creating tables and relationships.\n  - Definitions of data types and constraints.\n\n#### Part 3: Expanded ER Diagram and Mapping to SQL Schema\n- **Description**: Presents an updated ER diagram with additional entities and relationships, reflected in an enhanced SQL schema.\n- **Key Contents**:\n  - Extended ER diagram covering the additional requirements.\n  - SQL scripts for the enhanced schema with new constraints and relationships.\n\n---\n\n### Project 2: Advanced Features and Optimizations\n\n#### Adding Assertions\n- **Description**: Emulates SQL assertions to enforce data integrity and business rules (PostgreSQL does not natively support the standard `CREATE ASSERTION` statement).\n- **Key Contents**:\n  - Assertion-style checks that validate data across tables.\n  - Examples include enforcing valid date ranges or attribute constraints.\n\n#### Using Array Data Structures\n- **Description**: Extends the schema with PostgreSQL array types, which store multi-valued or multi-dimensional data in a single column.\n- **Key Contents**:\n  - Examples of array attributes in tables.\n  - Queries demonstrating manipulation and retrieval of array data.\n\n---\n\n## Technical Details\n- **Database Management System**: PostgreSQL\n- **Entities and Relationships**: Modeled real-world scenarios 
with a revised ER diagram incorporating feedback and stakeholder considerations.\n- **Triggers and Constraints**:\n  - Ensured database integrity with custom triggers for participation and ISA constraints.\n  - Implemented cascading updates and deletions for hierarchical data models.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feesunmoon%2Fdb-nypd","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feesunmoon%2Fdb-nypd","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feesunmoon%2Fdb-nypd/lists"}