https://github.com/swethajoseph/crime-pattern-analysis-project
Analysis and visualization of open-source police data from two areas, Leicestershire Street and Northumbria Street to derive data-driven insights
- Host: GitHub
- URL: https://github.com/swethajoseph/crime-pattern-analysis-project
- Owner: SwethaJoseph
- Created: 2024-07-08T15:23:17.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-07-08T15:43:01.000Z (almost 2 years ago)
- Last Synced: 2025-07-09T17:08:50.987Z (11 months ago)
- Topics: apachespark, datamanipulation, datapreprocessing, datavisualization, exploratory-data-analysis, jupyter-notebook, pyspark, python, sql-query
- Homepage:
- Size: 2.78 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Crime-Pattern-Analysis-Project
## Overview
This project analyzes and visualizes open-source, street-level police data for two force areas, Leicestershire and Northumbria (the "Leicestershire Street" and "Northumbria Street" datasets), for March 2021. Apache Spark SQL is used for data cleansing, configuration, and pre-processing. The resulting insights are visualized with graphs and charts that depict crime patterns and their impact on public safety.
## Key Technologies
* Apache Spark SQL: Used for data processing and querying.
* Python (PySpark, matplotlib, pandas): For data manipulation and visualization.
* Jupyter Notebook: The environment for running and documenting the analysis.
## Datasets
* Leicestershire Street Data: Contains crime records for March 2021.
* Northumbria Street Data: Contains crime records for March 2021.
Both datasets are sourced from data.police.uk.
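As a minimal loading sketch, the snippet below reads one force's CSV into a Spark DataFrame. It assumes the files keep the `<YYYY-MM>-<force>-street.csv` naming used in the data.police.uk download archives; the helper names and the `data_dir` folder are hypothetical and not taken from the project notebook.

```python
from pathlib import Path


def street_csv_name(month: str, force: str) -> str:
    """Build a file name, assuming the `<YYYY-MM>-<force>-street.csv` pattern."""
    return f"{month}-{force.lower()}-street.csv"


def load_street_data(spark, data_dir: str, month: str, force: str):
    """Read one force's street-level CSV into a Spark DataFrame.

    `spark` is an existing SparkSession; `data_dir` is a hypothetical
    local folder holding the downloaded CSV files.
    """
    path = Path(data_dir) / street_csv_name(month, force)
    # header=True takes column names from the first row;
    # inferSchema=True lets Spark guess numeric types (e.g. coordinates).
    return spark.read.csv(str(path), header=True, inferSchema=True)
```

With a running session, a call such as `load_street_data(spark, "data", "2021-03", "leicestershire")` would return the Leicestershire records for March 2021.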
## Analysis Steps
* Environment Setup: Installation and configuration of Jupyter Notebook and necessary Python libraries.
* Data Cleaning and Transformation: Removing or correcting inaccurate records, handling missing values, and converting columns into suitable formats.
* Exploratory Data Analysis: Using SQL queries and Python functions to gain insights into crime patterns.
* Visualization: Creating bar charts, pie charts, and maps to pictorially represent the data.
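The cleaning and exploratory steps above might look roughly like the following PySpark sketch. It is not the notebook's actual code: the function names are hypothetical, and it assumes the raw CSV has `Crime type` and `Last outcome category` columns, which become `crime_type` and `last_outcome_category` after renaming.

```python
import re


def to_snake_case(name: str) -> str:
    """Turn a header such as 'Last outcome category' into 'last_outcome_category'."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_").lower()


def crime_type_query(view: str) -> str:
    """Build SQL that counts records per crime type, most frequent first."""
    return (
        f"SELECT crime_type, COUNT(*) AS n FROM {view} "
        "GROUP BY crime_type ORDER BY n DESC"
    )


def clean_and_count(spark, df, view: str):
    """Clean a raw street-level DataFrame and count crimes by type with Spark SQL."""
    # Rename every column so it can be used in SQL without backticks.
    df = df.toDF(*[to_snake_case(c) for c in df.columns])
    # Rows without a crime type cannot be categorized, so drop them.
    df = df.dropna(subset=["crime_type"])
    # Keep rows with a missing outcome, but label the gap explicitly.
    df = df.fillna({"last_outcome_category": "Unknown"})
    # Register the cleaned data as a temporary view and query it.
    df.createOrReplaceTempView(view)
    return spark.sql(crime_type_query(view))
```

The result is small enough to convert with `.toPandas()` and plot, for example as a bar chart via `plot.bar(x="crime_type", y="n")`.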
## Key Crime Insights
* Crime Types: Leicestershire sees more "Violence and sexual offences", while Northumbria sees more "Anti-social behaviour".
* Geographic Influence: Crime rates and types vary significantly by location.
* Investigation Outcomes: Many cases in Leicestershire remain unresolved, while cases in Northumbria often end with no suspect identified.
* Population Density: Northumbria has higher "Anti-social behaviour" rates despite lower population density.
* Data Gaps: Missing data affects the completeness of the analysis.
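Comparisons like the ones above are easier to read when raw counts are converted to percentage shares, since the two forces record different totals. The sketch below shows one way to do that; the function names are hypothetical and the counts are made up for illustration, not results from this project.

```python
from typing import Dict


def category_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw counts per crime type into percentage shares of the total."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {k: round(100 * v / total, 1) for k, v in counts.items()}


def top_category(counts: Dict[str, int]) -> str:
    """Return the crime type with the highest count."""
    return max(counts, key=counts.get)


# Made-up counts, for illustration only (NOT taken from the datasets).
example = {
    "Violence and sexual offences": 50,
    "Anti-social behaviour": 30,
    "Other theft": 20,
}
shares = category_shares(example)
```

Applying the same functions to each force's counts gives figures that can be compared side by side regardless of the overall number of records.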