https://github.com/evanmathew/netflix_sql_data_analysis
This project explores the Netflix dataset using SQL to answer complex analytical questions. It involves data cleansing, aggregation, ranking, and advanced SQL techniques to uncover insights such as top-performing directors by genre, content diversity by country, yearly content trends, and more.
- Host: GitHub
- URL: https://github.com/evanmathew/netflix_sql_data_analysis
- Owner: evanmathew
- Created: 2024-11-22T19:32:09.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-11-30T18:33:06.000Z (over 1 year ago)
- Last Synced: 2025-03-02T01:37:42.448Z (over 1 year ago)
- Topics: case-study-project, netflix, postgresql, sql
- Homepage:
- Size: 62.5 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Netflix Data Analysis
## Project Overview
This project analyzes a dataset of Netflix titles to gain insight into the platform's content, ratings, and countries of origin. The SQL queries cover Movies, TV Shows, Directors, Ratings, and Countries, and explore patterns in content availability, director contributions, ratings distribution, and other attributes of Netflix content.
### Key Technologies:
- **SQL**: Language used to query and analyze the data.
- **PostgreSQL**: Database used to store and query the Netflix dataset.
- **Data Analysis**: Exploratory data analysis, carried out entirely in SQL, to extract insights from the Netflix content data.
## Objectives
- Analyze the distribution of content types (movies vs TV shows).
- Identify the most common ratings for movies and TV shows.
- List and analyze content based on release years, countries, and durations.
- Explore and categorize content based on specific criteria and keywords.
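
The objectives above can be sketched as queries against the `netflix` table defined in the Schema section. These are illustrative sketches rather than the repository's own queries. They assume that `type` holds values such as `'Movie'`, that movie durations are stored as text like `'90 min'`, and the search keywords are arbitrary placeholders.

```sql
-- Sketch only: assumes the netflix table from the Schema section is loaded.

-- 1. Distribution of content types, with each type's share of the catalog.
SELECT
    type,
    COUNT(*) AS total_titles,
    ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) AS pct_of_catalog
FROM netflix
GROUP BY type
ORDER BY total_titles DESC;

-- 2. Most common rating for each content type (ties are all returned).
SELECT type, rating, rating_count
FROM (
    SELECT
        type,
        rating,
        COUNT(*) AS rating_count,
        RANK() OVER (PARTITION BY type ORDER BY COUNT(*) DESC) AS rnk
    FROM netflix
    WHERE rating IS NOT NULL
    GROUP BY type, rating
) AS ranked
WHERE rnk = 1;

-- 3. Movies longer than 120 minutes.
--    Assumption: duration looks like '90 min'. The CASE guards the cast so
--    rows in any other format yield NULL instead of raising an error.
SELECT title, duration
FROM netflix
WHERE type = 'Movie'  -- assumed literal value
  AND CASE
          WHEN duration ~ '^[0-9]+ min$'
          THEN SPLIT_PART(duration, ' ', 1)::INT
      END > 120;

-- 4. Keyword-based categorization (the keywords are placeholders).
SELECT
    CASE
        WHEN description ILIKE '%kill%' OR description ILIKE '%violence%'
        THEN 'flagged'
        ELSE 'not flagged'
    END AS category,
    COUNT(*) AS total_titles
FROM netflix
GROUP BY 1;
```

The window function in the first query, `SUM(COUNT(*)) OVER ()`, computes the grand total in the same pass as the per-type counts, so no second scan of the table is needed.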
## SQL Query Categories
This project contains SQL queries categorized by difficulty:
1. **Easy SQL Queries**: Basic queries that return simple facts such as total counts, titles in a specific genre, and movies released in a given year.
2. **Medium SQL Queries**: Queries that involve joins, advanced aggregations, and filtering based on multiple conditions.
3. **Difficult SQL Queries**: Advanced queries involving window functions, CTEs, and complex aggregations to extract deeper insights.
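
To give a feel for the three tiers, here is one hedged example per level, written against the `netflix` table defined in the Schema section. They are sketches, not the repository's own queries. They assume that `country`, `director`, and `listed_in` hold comma-separated lists, that `type` contains the literal `'Movie'`, and that "top" directors means those with the most titles.

```sql
-- Easy: how many movies were released in a given year?
-- Assumption: 'Movie' is the literal stored in type; 2020 is an arbitrary year.
SELECT COUNT(*) AS movies_released
FROM netflix
WHERE type = 'Movie'
  AND release_year = 2020;

-- Medium: top 5 countries by number of titles.
-- Assumption: country is a comma-separated list, so it is split into one
-- row per country with a lateral join before aggregating.
SELECT
    TRIM(c.country_name) AS country_name,
    COUNT(*) AS total_titles
FROM netflix AS n
CROSS JOIN LATERAL UNNEST(STRING_TO_ARRAY(n.country, ',')) AS c(country_name)
WHERE TRIM(c.country_name) <> ''
GROUP BY TRIM(c.country_name)
ORDER BY total_titles DESC
LIMIT 5;

-- Difficult: top 3 directors per genre, using CTEs and a window function.
-- Assumption: "top" is measured by title count.
WITH director_genre AS (
    SELECT
        TRIM(d.director_name) AS director_name,
        TRIM(g.genre_name)    AS genre_name,
        COUNT(*)              AS total_titles
    FROM netflix AS n
    CROSS JOIN LATERAL UNNEST(STRING_TO_ARRAY(n.director, ',')) AS d(director_name)
    CROSS JOIN LATERAL UNNEST(STRING_TO_ARRAY(n.listed_in, ',')) AS g(genre_name)
    GROUP BY TRIM(d.director_name), TRIM(g.genre_name)
),
ranked AS (
    SELECT
        genre_name,
        director_name,
        total_titles,
        DENSE_RANK() OVER (
            PARTITION BY genre_name
            ORDER BY total_titles DESC
        ) AS genre_rank
    FROM director_genre
)
SELECT genre_name, director_name, total_titles, genre_rank
FROM ranked
WHERE genre_rank <= 3
ORDER BY genre_name, genre_rank, director_name;
```

Rows with a `NULL` director or country drop out automatically, because `UNNEST` over a `NULL` array produces no rows. `DENSE_RANK` keeps tied directors at the same rank, so a genre can return more than three rows.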
## Schema
```sql
DROP TABLE IF EXISTS netflix;

CREATE TABLE netflix
(
    show_id      VARCHAR(7),
    type         VARCHAR(10),
    title        VARCHAR(250),
    director     VARCHAR(550),
    casts        VARCHAR(1050),
    country      VARCHAR(550),
    date_added   DATE,
    release_year INT,
    rating       VARCHAR(15),
    duration     VARCHAR(200),
    listed_in    VARCHAR(300),
    description  VARCHAR(550)
);
```
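
Once the table is created and populated, a quick completeness check helps decide how much cleansing the analysis needs. The sketch below is an assumption about how one might do this, not a step taken from the repository. The file name in the comment is hypothetical, and the load may need adjusting if any `date_added` value does not parse as a date.

```sql
-- Loading from psql (hypothetical file name; adjust the path to your download):
-- \copy netflix FROM 'netflix_titles.csv' WITH (FORMAT csv, HEADER true)

-- Completeness check: row count, duplicate IDs, and missing values per column.
SELECT
    COUNT(*)                                                     AS total_rows,
    COUNT(*) - COUNT(DISTINCT show_id)                           AS duplicate_or_null_ids,
    COUNT(*) FILTER (WHERE director IS NULL
                        OR TRIM(director) = '')                  AS missing_director,
    COUNT(*) FILTER (WHERE casts IS NULL
                        OR TRIM(casts) = '')                     AS missing_casts,
    COUNT(*) FILTER (WHERE country IS NULL
                        OR TRIM(country) = '')                   AS missing_country,
    COUNT(*) FILTER (WHERE date_added IS NULL)                   AS missing_date_added,
    COUNT(*) FILTER (WHERE rating IS NULL
                        OR TRIM(rating) = '')                    AS missing_rating
FROM netflix;
```

The `FILTER` clause keeps every check in a single scan of the table. Text columns are tested for both `NULL` and empty strings, since either can appear depending on how the CSV was loaded.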
## Dataset
The data for this project is sourced from the Kaggle dataset:
- **Dataset Link:** [Netflix Movies and TV Shows](https://www.kaggle.com/datasets/shivamb/netflix-shows?resource=download)