https://github.com/evanmathew/netflix_sql_data_analysis

This project explores the Netflix dataset using SQL to answer complex analytical questions. It involves data cleansing, aggregation, ranking, and advanced SQL techniques to uncover insights such as top-performing directors by genre, content diversity by country, yearly content trends, and more.

Host: GitHub
URL: https://github.com/evanmathew/netflix_sql_data_analysis
Owner: evanmathew
Created: 2024-11-22T19:32:09.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-11-30T18:33:06.000Z (over 1 year ago)
Last Synced: 2025-03-02T01:37:42.448Z (over 1 year ago)
Topics: case-study-project, netflix, postgresql, sql
Homepage:
Size: 62.5 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


# Netflix Data Analysis

## Project Overview

This project analyzes a Netflix catalog dataset with SQL to gain insights into its content, ratings, and country coverage. The queries answer a range of questions about Movies, TV Shows, Directors, Ratings, and Countries, and explore patterns in content availability, director contributions, and ratings distribution.

### Key Technologies:
- **SQL**: Query language used for all analysis in this project.
- **Data Analysis**: Exploratory analysis of Netflix content data, done entirely through SQL queries.
- **PostgreSQL**: Database used to store and query the Netflix dataset.

## Objectives

- Analyze the distribution of content types (movies vs TV shows).
- Identify the most common ratings for movies and TV shows.
- List and analyze content based on release years, countries, and durations.
- Explore and categorize content based on specific criteria and keywords.
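
As a rough illustration of the first two objectives, the queries below sketch one way to count titles per content type and to find the most common rating for each type. They assume the `netflix` table from the Schema section and that `type` holds values such as `Movie` and `TV Show`; the CTE names and column aliases are illustrative, not taken from the repository.

```sql
-- Distribution of content types (for example Movie vs TV Show).
SELECT type,
       COUNT(*) AS total_titles
FROM netflix
GROUP BY type
ORDER BY total_titles DESC;

-- Most common rating for each content type.
-- RANK() keeps ties, so a type may return more than one rating.
WITH rating_counts AS (
    SELECT type,
           rating,
           COUNT(*) AS rating_count
    FROM netflix
    WHERE rating IS NOT NULL
    GROUP BY type, rating
),
ranked AS (
    SELECT type,
           rating,
           rating_count,
           RANK() OVER (PARTITION BY type ORDER BY rating_count DESC) AS rating_rank
    FROM rating_counts
)
SELECT type,
       rating,
       rating_count
FROM ranked
WHERE rating_rank = 1;
```

Using `RANK()` instead of `LIMIT 1` is a design choice here: it returns a result per content type in a single pass and does not silently drop tied ratings.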

## SQL Query Categories

This project contains SQL queries categorized by difficulty:

1. **Easy SQL Queries**: Basic queries for simple insights such as total counts, titles in specific genres, and movies by release year.
2. **Medium SQL Queries**: Queries that involve joins, advanced aggregations, and filtering based on multiple conditions.
3. **Difficult SQL Queries**: Advanced queries involving window functions, CTEs, and complex aggregations to extract deeper insights.
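
To give a feel for the difficult tier, here is a hedged sketch that combines CTEs with a window function to list the top three directors per genre. It assumes that "top" means the most titles, and that `director` and `listed_in` hold comma-separated values; the CTE names and aliases are hypothetical and may differ from the queries in the repository.

```sql
-- Top 3 directors per genre by number of titles.
WITH director_genre AS (
    -- Expand comma-separated directors and genres into one row per pair.
    SELECT TRIM(d.director_name) AS director_name,
           TRIM(g.genre)         AS genre
    FROM netflix AS n
    CROSS JOIN LATERAL UNNEST(STRING_TO_ARRAY(n.director, ',')) AS d(director_name)
    CROSS JOIN LATERAL UNNEST(STRING_TO_ARRAY(n.listed_in, ',')) AS g(genre)
    WHERE n.director IS NOT NULL
),
title_counts AS (
    SELECT genre,
           director_name,
           COUNT(*) AS title_count
    FROM director_genre
    GROUP BY genre, director_name
),
ranked AS (
    SELECT genre,
           director_name,
           title_count,
           DENSE_RANK() OVER (PARTITION BY genre ORDER BY title_count DESC) AS genre_rank
    FROM title_counts
)
SELECT genre,
       director_name,
       title_count,
       genre_rank
FROM ranked
WHERE genre_rank <= 3
ORDER BY genre, genre_rank, director_name;
```

`DENSE_RANK()` keeps directors with equal counts at the same rank, so a genre can return more than three rows when there are ties.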

## Schema

```sql
-- Recreate the table from scratch so the script can be re-run safely.
DROP TABLE IF EXISTS netflix;
CREATE TABLE netflix
(
    show_id      VARCHAR(7),
    type         VARCHAR(10),
    title        VARCHAR(250),
    director     VARCHAR(550),
    casts        VARCHAR(1050),
    country      VARCHAR(550),
    date_added   DATE,
    release_year INT,
    rating       VARCHAR(15),
    duration     VARCHAR(200),   -- stored as text, not as a number
    listed_in    VARCHAR(300),
    description  VARCHAR(550)
);
```
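
Because `duration` and `country` are plain `VARCHAR` columns, numeric or per-country analysis needs some parsing first. The sketch below assumes movie durations look like `90 min` and that rows with several countries are comma-separated, which matches the Kaggle file as far as I know but should be checked against the loaded data. The aliases are illustrative.

```sql
-- Ten longest movies, parsing the leading number out of the duration text.
SELECT title,
       SPLIT_PART(duration, ' ', 1)::INT AS minutes
FROM netflix
WHERE type = 'Movie'
  AND duration ~ '^[0-9]+ min$'   -- skip rows that do not match the assumed format
ORDER BY minutes DESC
LIMIT 10;

-- Titles per country, splitting comma-separated country lists into rows.
SELECT TRIM(c.country_name) AS country_name,
       COUNT(*)             AS total_titles
FROM netflix AS n
CROSS JOIN LATERAL UNNEST(STRING_TO_ARRAY(n.country, ',')) AS c(country_name)
WHERE TRIM(c.country_name) <> ''
GROUP BY TRIM(c.country_name)
ORDER BY total_titles DESC
LIMIT 10;
```

A title produced in several countries is counted once for each of them, so the per-country totals can add up to more than the number of rows in the table.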

## Dataset

The data for this project is sourced from the Kaggle dataset:

- **Dataset Link:** [Netflix Movies and TV Shows](https://www.kaggle.com/datasets/shivamb/netflix-shows?resource=download)
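
One possible way to load the downloaded CSV into the `netflix` table is PostgreSQL's `COPY` command, sketched below. The file path and the file name `netflix_titles.csv` are assumptions, as is the premise that the CSV columns appear in the same order as the table columns and that the `date_added` values parse under the server's date settings.

```sql
-- Server-side load: the file must be readable by the PostgreSQL server process.
-- The path below is a placeholder; replace it with the real location of the CSV.
COPY netflix (show_id, type, title, director, casts, country,
              date_added, release_year, rating, duration, listed_in, description)
FROM '/path/to/netflix_titles.csv'
WITH (FORMAT csv, HEADER true);

-- Quick sanity check after loading: row count and uniqueness of show_id.
SELECT COUNT(*)                AS total_rows,
       COUNT(DISTINCT show_id) AS distinct_ids
FROM netflix;
```

If the file lives on the client machine rather than the server, the `\copy` meta-command in `psql` accepts the same options and reads the file locally.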
