https://github.com/darrendavy12/noaa-storm-events-analysis-postgresql-database
This project builds a PostgreSQL database to store and analyze the NOAA Storm Events Database
- Host: GitHub
- URL: https://github.com/darrendavy12/noaa-storm-events-analysis-postgresql-database
- Owner: DarrenDavy12
- Created: 2025-05-25T13:45:22.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-07-31T15:29:05.000Z (11 months ago)
- Last Synced: 2026-01-28T19:54:27.415Z (5 months ago)
- Topics: datacleaning, dataengineering, error-handling, etl, etl-pipeline, pgadmin4, postgresql, psql, python, python-script, query-builder, sql, troubleshooting
- Homepage:
- Size: 39.1 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Building a PostgreSQL Database for NOAA Storm Events Analysis
## Project Overview
This project builds a PostgreSQL database to store and analyze the NOAA Storm Events Database, which contains records of severe weather events in the United States. The project demonstrates key data engineering skills, including database design, data ingestion with Python, SQL querying, and query optimization.
## Dataset
The dataset is sourced from the NOAA Storm Events Database. It includes details on storm events (e.g., type, dates, damages) and related location data.
## Setup Instructions
### 1. Created the database in pgAdmin 4 (PostgreSQL): ran `CREATE DATABASE storm_events_db;` in psql or pgAdmin.

---
### 2. Created tables in the 'storm_events_db' database by running:
#### storms table
`CREATE TABLE storms (
storm_id SERIAL PRIMARY KEY,
event_type VARCHAR(50),
start_date DATE,
end_date DATE,
damage_property NUMERIC,
damage_crops NUMERIC
);`
#### locations table
`CREATE TABLE locations (
location_id SERIAL PRIMARY KEY,
storm_id INTEGER REFERENCES storms(storm_id),
state VARCHAR(50),
county VARCHAR(100),
latitude NUMERIC,
longitude NUMERIC
);`
#### fatalities table
`CREATE TABLE fatalities (
fatality_id SERIAL PRIMARY KEY,
storm_id INTEGER REFERENCES storms(storm_id),
number_of_fatalities INTEGER
);`
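
The three `CREATE TABLE` statements above can also be applied from Python. The following is a minimal sketch, not the repository's own script: it assumes `psycopg2` is installed, the connection settings in `connect()` are placeholders, and `IF NOT EXISTS` is added here only so the function can be rerun safely.

```python
# Sketch: apply the three-table schema from Python instead of pasting it into psql.
# Assumptions: psycopg2 is installed; all connection settings are placeholders.

DDL_STATEMENTS = [
    """
    CREATE TABLE IF NOT EXISTS storms (
        storm_id SERIAL PRIMARY KEY,
        event_type VARCHAR(50),
        start_date DATE,
        end_date DATE,
        damage_property NUMERIC,
        damage_crops NUMERIC
    )
    """,
    """
    CREATE TABLE IF NOT EXISTS locations (
        location_id SERIAL PRIMARY KEY,
        storm_id INTEGER REFERENCES storms(storm_id),
        state VARCHAR(50),
        county VARCHAR(100),
        latitude NUMERIC,
        longitude NUMERIC
    )
    """,
    """
    CREATE TABLE IF NOT EXISTS fatalities (
        fatality_id SERIAL PRIMARY KEY,
        storm_id INTEGER REFERENCES storms(storm_id),
        number_of_fatalities INTEGER
    )
    """,
]


def connect():
    """Open a connection to storm_events_db (every value here is a placeholder)."""
    import psycopg2  # imported lazily so this module loads without the driver

    return psycopg2.connect(
        dbname="storm_events_db",
        user="postgres",        # assumed role name
        password="change-me",   # placeholder, not a real credential
        host="localhost",
        port=5432,
    )


def create_schema(conn):
    """Run each CREATE TABLE statement in order, then commit once."""
    cur = conn.cursor()
    try:
        for statement in DDL_STATEMENTS:
            cur.execute(statement)
        conn.commit()
    finally:
        cur.close()
```

`storms` is created first because the other two tables reference it. Usage would be `create_schema(connect())`.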

---
### 3. Installed Python libraries:
Ran the command `pip install pandas psycopg2-binary` inside the VS Code terminal.

---
### 4. Data ingestion: downloaded CSV files from NOAA's FTP page.
#### Wrote three Python scripts in VS Code, one for each of the three tables, with errors included.
- storms table
- locations table
- fatalities table
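
The scripts themselves are not reproduced here, so the following is only a rough sketch of what the storms loader might look like. It uses the standard `csv` module rather than pandas to stay dependency-free. The CSV header names (`EVENT_TYPE`, `BEGIN_YEARMONTH`, `BEGIN_DAY`, `END_YEARMONTH`, `END_DAY`, `DAMAGE_PROPERTY`, `DAMAGE_CROPS`) and the `K`/`M`/`B` damage suffixes are assumptions about the NOAA details file, and all function names are hypothetical.

```python
import csv
from datetime import date
from decimal import Decimal, InvalidOperation

# Multipliers for the suffixes assumed to appear in the damage columns.
_SUFFIXES = {"K": Decimal(10) ** 3, "M": Decimal(10) ** 6, "B": Decimal(10) ** 9}

INSERT_STORM = """
    INSERT INTO storms (event_type, start_date, end_date, damage_property, damage_crops)
    VALUES (%s, %s, %s, %s, %s)
"""


def parse_damage(raw):
    """Convert strings such as '10.00K' or '1.5M' to a Decimal; blank or invalid -> None."""
    text = (raw or "").strip().upper()
    if not text:
        return None
    multiplier = _SUFFIXES.get(text[-1], Decimal(1))
    if text[-1] in _SUFFIXES:
        text = text[:-1]
    try:
        return Decimal(text) * multiplier
    except InvalidOperation:
        return None  # unparseable value is stored as NULL


def build_date(yearmonth, day):
    """Combine a YYYYMM value (e.g. '202305') and a day number into a date."""
    yearmonth = str(yearmonth).strip()
    return date(int(yearmonth[:4]), int(yearmonth[4:6]), int(day))


def to_storm_row(record):
    """Map one CSV record (dict keyed by header) to the storms column order."""
    return (
        record["EVENT_TYPE"],
        build_date(record["BEGIN_YEARMONTH"], record["BEGIN_DAY"]),
        build_date(record["END_YEARMONTH"], record["END_DAY"]),
        parse_damage(record.get("DAMAGE_PROPERTY")),
        parse_damage(record.get("DAMAGE_CROPS")),
    )


def load_storms(conn, csv_path):
    """Read the CSV, insert every row into storms, and return the row count.

    conn is assumed to be a psycopg2 connection (the %s placeholders are its style).
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        rows = [to_storm_row(r) for r in csv.DictReader(handle)]
    cur = conn.cursor()
    try:
        cur.executemany(INSERT_STORM, rows)
        conn.commit()
    finally:
        cur.close()
    return len(rows)
```

Keeping the parsing in small pure functions makes it possible to check the cleaning rules without a database connection.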

### 5. Verifying the PostgreSQL setup
First I logged into psql (SQL Shell) and ran these commands:
`\l` -- listed databases inside postgres
`\dt` -- listed tables inside of storm_events_db
`SELECT * FROM storms LIMIT 5;` -- selected the first five rows from the details (storms) table

### 6. Ran scripts:
#### - Executed create_tables.sql to set up the database schema.
#### - Ran load_data.py for all three tables, each with its own script, to load the CSV data into PostgreSQL.
##### - storms table

##### - ran a query in pgAdmin for the storms table

##### - locations table

##### - ran a query in pgAdmin for the locations table
##### - fatalities table

##### - ran a query in pgAdmin for the fatalities table

---
#### Next, I queried the 'storms' table and ran into an error. These are the steps I took:
### Troubleshooting error:
#### - Checked the error log file and saw that a duplicate EVENT_ID value was being added where EVENT_ID, a unique key, already existed.

#### - So I had to run this command in psql:
`TRUNCATE TABLE storms RESTART IDENTITY;` -- removes all rows from the storms table and resets its identity sequence

#### - Running the `TRUNCATE` command resolved the issue, and I verified the data by running these SQL commands inside pgAdmin:
`SELECT COUNT(*) FROM storms; -- Check total rows`
`SELECT * FROM storms LIMIT 5; -- View sample data`
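
As an alternative to truncating and reloading, duplicate keys can be filtered out before the insert, or skipped by the database. The sketch below is not what the project did. The `EVENT_ID` header comes from the error described above, while the `event_id` column with a unique constraint is an assumption, since the schema in step 2 does not list it. Note also that, with the foreign keys from step 2 in place, PostgreSQL only truncates a referenced table when the referencing tables are truncated in the same command or `CASCADE` is given.

```python
# Hypothetical statement: assumes storms has an event_id column with a UNIQUE
# constraint, which the schema in step 2 does not show.
INSERT_IGNORING_DUPLICATES = """
    INSERT INTO storms (event_id, event_type)
    VALUES (%s, %s)
    ON CONFLICT (event_id) DO NOTHING
"""


def drop_duplicate_events(records, key="EVENT_ID"):
    """Keep the first record seen for each key value, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        value = record[key]
        if value in seen:
            continue  # a later copy of an already-seen event is skipped
        seen.add(value)
        unique.append(record)
    return unique
```

Filtering in Python keeps the load script rerunnable on one file, while `ON CONFLICT ... DO NOTHING` also protects against rows already present in the table from an earlier run.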

---

## Sample Queries
### Top 10 states with most storm events:
`SELECT state, COUNT(*) as total_events
FROM locations
GROUP BY state
ORDER BY total_events DESC
LIMIT 10;`
### Total damage by event type:
`SELECT event_type, SUM(COALESCE(damage_property, 0) + COALESCE(damage_crops, 0)) as total_damage
FROM storms
GROUP BY event_type
ORDER BY total_damage DESC;`
### Number of storms per year:
`SELECT EXTRACT(YEAR FROM start_date) as year, COUNT(*) as total_storms
FROM storms
GROUP BY year
ORDER BY year;`
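
The sample queries above can also be run from Python. This is a small sketch, assuming `conn` is an open DB-API connection such as one returned by `psycopg2.connect`; the helper name `fetch_all` is hypothetical.

```python
# The "top 10 states" query from above, stored as a constant.
TOP_STATES = """
    SELECT state, COUNT(*) AS total_events
    FROM locations
    GROUP BY state
    ORDER BY total_events DESC
    LIMIT 10
"""


def fetch_all(conn, sql, params=None):
    """Execute one query on a DB-API connection and return all rows as a list."""
    cur = conn.cursor()
    try:
        if params:
            cur.execute(sql, params)  # parameters are passed separately, never formatted in
        else:
            cur.execute(sql)
        return cur.fetchall()
    finally:
        cur.close()
```

For example, `for state, total in fetch_all(conn, TOP_STATES): print(state, total)` would print one line per state.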
#### I created indexes for faster date-based queries and for joins on locations.
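
The index definitions are not shown, so the following is only a guess at what they could look like. The index names and the choice of columns (`storms.start_date` for date filters, `locations.storm_id` for joins) are assumptions. PostgreSQL does not automatically index the referencing side of a foreign key, which is why `storm_id` is a plausible candidate.

```python
# Hypothetical index definitions; names and columns are assumptions.
INDEX_STATEMENTS = [
    # Helps range filters and sorts on the storm start date.
    "CREATE INDEX IF NOT EXISTS idx_storms_start_date ON storms (start_date)",
    # Helps joins from locations back to storms.
    "CREATE INDEX IF NOT EXISTS idx_locations_storm_id ON locations (storm_id)",
]


def create_indexes(conn):
    """Create each index if it is missing, then commit once."""
    cur = conn.cursor()
    try:
        for statement in INDEX_STATEMENTS:
            cur.execute(statement)
        conn.commit()
    finally:
        cur.close()
```

Running `EXPLAIN ANALYZE` on a query before and after creating an index is one way to check whether the planner actually uses it.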
