https://github.com/th-blitz/road-risk-and-fatality-analysis
Big Data Analytics Project, Road risk and Fatality Analysis using the FARS data https://www.nhtsa.gov/research-data/fatality-analysis-reporting-system-fars
- Host: GitHub
- URL: https://github.com/th-blitz/road-risk-and-fatality-analysis
- Owner: th-blitz
- License: MIT
- Created: 2024-12-11T12:52:53.000Z
- Default Branch: main
- Last Pushed: 2024-12-13T16:23:29.000Z
- Last Synced: 2025-04-01T09:59:56.667Z
- Language: Jupyter Notebook
- Size: 1 MB
- Stars: 0
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Road-Risk-and-Fatality-Analysis
A Big Data Analytics project on road risk and fatality analysis using data from NHTSA's Fatality Analysis Reporting System (FARS): https://www.nhtsa.gov/research-data/fatality-analysis-reporting-system-fars
## Dataset Source
```
https://www.nhtsa.gov/research-data/fatality-analysis-reporting-system-fars
```
## Before running: ingest the dataset into HDFS on Dataproc
Follow the steps in the NYU HPC Hadoop user guide to ingest the data into your Hadoop Distributed File System (HDFS) on NYU Dataproc:
```
https://sites.google.com/nyu.edu/nyu-hpc/training-support/general-hpc-topics/hadoop-user-guide#h.mblzaebj2gac
```
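The guide above is the reference; as a rough sketch, one common way to ingest is to create an HDFS directory with `hdfs dfs -mkdir -p` and copy the unzipped FARS CSV files into it with `hdfs dfs -put`. The helpers below (`hdfs_ingest_commands` and `run_commands` are hypothetical names, not part of this repository) assume the CSVs are already on the Dataproc node; the `FARS2022/` local folder and `fars/2022` HDFS target are placeholders. A relative HDFS path resolves under your HDFS home directory, normally `/user/<your username>`. By default the script only prints the commands (dry run).

```python
import glob
import shlex
import subprocess
from pathlib import PurePosixPath
from typing import List, Sequence


def hdfs_ingest_commands(local_files: Sequence[str], hdfs_dir: str) -> List[List[str]]:
    """Build `hdfs dfs` commands that copy local files into hdfs_dir (hypothetical helper)."""
    # -mkdir -p creates the target directory and any missing parents.
    commands = [["hdfs", "dfs", "-mkdir", "-p", hdfs_dir]]
    for local_path in local_files:
        name = PurePosixPath(local_path).name
        # -put copies a local file into HDFS; -f overwrites an existing file.
        commands.append(["hdfs", "dfs", "-put", "-f", local_path, f"{hdfs_dir}/{name}"])
    return commands


def run_commands(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first command that fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder paths: point these at wherever you unzipped the FARS CSVs.
    files = sorted(glob.glob("FARS2022/*.csv"))
    run_commands(hdfs_ingest_commands(files, "fars/2022"))
```

Once the printed commands look right, call `run_commands(..., dry_run=False)` to execute them.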
## Running the project on NYU Dataproc
SSH into NYU Dataproc with the gcloud SDK:
```
gcloud compute ssh nyu-dataproc-m --project=hpc-dataproc-19b8 --zone=us-central1-f
```
**Note:** This requires SSO authentication with your NYU NetID.

Then start a Jupyter Notebook session on the Dataproc node:
```
jupyter notebook
```
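On a shared Dataproc node the default port 8888 may already be taken, and there is no browser to open. The sketch below (`find_free_port` and `jupyter_command` are hypothetical helpers, not part of this repository) asks the operating system for an unused port and builds a `jupyter notebook` command that pins it with `--port` and skips the browser with `--no-browser`, so you know in advance which PORT to forward.

```python
import shlex
import socket
from typing import List


def find_free_port() -> int:
    """Return a TCP port that is currently unused on localhost (hypothetical helper)."""
    # Binding to port 0 makes the OS assign any free port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("localhost", 0))
        return sock.getsockname()[1]


def jupyter_command(port: int) -> List[str]:
    """Build a Jupyter Notebook launch command for a headless remote node."""
    # --no-browser: the Dataproc node has no browser to open.
    # --port: fix the port so the SSH tunnel can forward exactly this one.
    return ["jupyter", "notebook", "--no-browser", f"--port={port}"]


if __name__ == "__main__":
    print(shlex.join(jupyter_command(find_free_port())))
```

Another user could claim the port between the check and the launch. If so, Jupyter by default retries on a different port and prints the URL it actually uses; forward that port instead.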
In a new terminal on your local machine, create an SSH tunnel to the notebook session, replacing **PORT** below with the port Jupyter reports when it starts (8888 by default):
```
gcloud compute ssh nyu-dataproc-m --project hpc-dataproc-19b8 --zone us-central1-f -- -N -L PORT:localhost:PORT
```
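To avoid editing PORT by hand, the tunnel command can be wrapped in a small Python helper. This is a sketch under assumptions: `tunnel_command` and `open_tunnel` are hypothetical names not found in the repository, the instance, project and zone are copied from the command above, and the gcloud SDK must be installed and authenticated on your local machine.

```python
import subprocess
from typing import List, Optional

# Values copied from the README's gcloud commands.
INSTANCE = "nyu-dataproc-m"
PROJECT = "hpc-dataproc-19b8"
ZONE = "us-central1-f"


def _check_port(port: int) -> int:
    """Reject values outside the valid TCP port range."""
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid TCP port: {port}")
    return port


def tunnel_command(remote_port: int, local_port: Optional[int] = None) -> List[str]:
    """Build the gcloud SSH tunnel command (hypothetical helper)."""
    remote_port = _check_port(remote_port)
    # By default use the same port number on both ends, as the README does.
    local_port = remote_port if local_port is None else _check_port(local_port)
    return [
        "gcloud", "compute", "ssh", INSTANCE,
        "--project", PROJECT,
        "--zone", ZONE,
        # Arguments after "--" are passed through to ssh:
        # -N runs no remote command (port forwarding only);
        # -L forwards local_port on this machine to remote_port on the node.
        "--", "-N", "-L", f"{local_port}:localhost:{remote_port}",
    ]


def open_tunnel(remote_port: int, local_port: Optional[int] = None) -> None:
    """Run the tunnel; blocks until it is closed (for example with Ctrl+C)."""
    subprocess.run(tunnel_command(remote_port, local_port), check=True)
```

For example, `open_tunnel(8888)` runs the command above with PORT set to 8888. Passing a different `local_port` helps when that port is already busy on your laptop; you would then browse to `localhost:<local_port>` instead.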
Finally, open `http://localhost:PORT` in your local browser to reach the notebook session running on Dataproc. If Jupyter asks for a token, copy it from the URL it printed at startup. Then open the project notebook and run it.