Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/aman-dutta/case-study-accidents
Spark analysis on the accidents-data
dataframe etl hadoop python spark
Last synced: 4 months ago
- Host: GitHub
- URL: https://github.com/aman-dutta/case-study-accidents
- Owner: aman-dutta
- Created: 2024-08-19T06:41:53.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-08-19T07:27:42.000Z (6 months ago)
- Last Synced: 2024-09-29T07:01:38.144Z (4 months ago)
- Topics: dataframe, etl, hadoop, python, spark
- Language: Python
- Homepage:
- Size: 11.8 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# PySpark Crash Analysis
This repository contains a PySpark project for analyzing crash data. It runs ten analyses over crash records stored in CSV files.
## Project Structure
- `utils.py`: Contains utility functions, including the `read_csv` function for reading CSV files into Spark DataFrames.
- `main.py`: Contains the main analysis code, including data loading, transformations, and computations.
- `data/`: Directory containing CSV files used for analysis.
## Prerequisites
- Python
- Apache Spark
- PySpark
- Required Python packages (listed in `requirements.txt`)
## Setup
1. **Clone the Repository**
```bash
git clone https://github.com/aman-dutta/case-study-accidents.git
cd case-study-accidents
```
## Analytics
- `Analysis 1`: Find the number of crashes (accidents) in which more than 2 males were killed.
- `Analysis 2`: How many two-wheelers are booked for crashes?
- `Analysis 3`: Determine the top 5 vehicle makes of the cars present in crashes in which the driver died and airbags did not deploy.
- `Analysis 4`: Determine the number of vehicles involved in hit-and-run crashes whose drivers hold valid licences.
- `Analysis 5`: Which state has the highest number of accidents in which females are not involved?
- `Analysis 6`: Which VEH_MAKE_IDs rank 3rd to 5th by the number of injuries, including deaths, that they contribute to?
- `Analysis 7`: For each unique body style involved in crashes, report the top ethnic user group.
- `Analysis 8`: Among the crashed cars, what are the top 5 zip codes with the highest number of crashes in which alcohol was a contributing factor? (Use the driver zip code.)
- `Analysis 9`: Count the distinct crash IDs where no damaged property was observed, the damage level (VEH_DMAG_SCL~) is above 4, and the car has insurance.
- `Analysis 10`: Determine the top 5 vehicle makes where drivers are charged with speeding-related offences, the drivers are licensed, the vehicle uses one of the top 10 most-used vehicle colours, and the car is licensed in one of the top 25 states with the highest number of offences (to be deduced from the data).