https://github.com/saurabh9136/netflix_project
https://github.com/saurabh9136/netflix_project
postgresql sql
Last synced: 2 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/saurabh9136/netflix_project
- Owner: saurabh9136
- Created: 2025-03-05T11:15:35.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-05T14:55:22.000Z (over 1 year ago)
- Last Synced: 2025-03-05T15:35:56.985Z (over 1 year ago)
- Topics: postgresql, sql
- Language: Python
- Homepage:
- Size: 1.51 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Netflix Data Analysis Projects

## 1. Netflix Advance SQL Analysis
This project analyzes Netflix data using SQL queries to extract insights about movies, genres, and user interactions.
**Dataset Used:** `netflix_titles.csv`
## Schema
```sql
DROP TABLE IF EXISTS netflix;
CREATE TABLE netflix
(
show_id VARCHAR(5),
type VARCHAR(10),
title VARCHAR(250),
director VARCHAR(550),
casts VARCHAR(1050),
country VARCHAR(550),
date_added VARCHAR(55),
release_year INT,
rating VARCHAR(15),
duration VARCHAR(15),
listed_in VARCHAR(250),
description VARCHAR(550)
);
```
## Business Problems and Solutions
### 1. Count the Number of Movies vs TV Shows
```sql
SELECT
type,
COUNT(*)
FROM netflix
GROUP BY 1;
```
**Objective:** Determine the distribution of content types on Netflix.
### 2. Find the Most Common Rating for Movies and TV Shows
```sql
WITH RatingCounts AS (
SELECT
type,
rating,
COUNT(*) AS rating_count
FROM netflix
GROUP BY type, rating
),
RankedRatings AS (
SELECT
type,
rating,
rating_count,
RANK() OVER (PARTITION BY type ORDER BY rating_count DESC) AS rank
FROM RatingCounts
)
SELECT
type,
rating AS most_frequent_rating
FROM RankedRatings
WHERE rank = 1;
```
**Objective:** Identify the most frequently occurring rating for each type of content.
### 3. List All Movies Released in a Specific Year (e.g., 2020)
```sql
SELECT *
FROM netflix
WHERE release_year = 2020;
```
**Objective:** Retrieve all movies released in a specific year.
### 4. Find the Top 5 Countries with the Most Content on Netflix
```sql
SELECT *
FROM
(
SELECT
UNNEST(STRING_TO_ARRAY(country, ',')) AS country,
COUNT(*) AS total_content
FROM netflix
GROUP BY 1
) AS t1
WHERE country IS NOT NULL
ORDER BY total_content DESC
LIMIT 5;
```
**Objective:** Identify the top 5 countries with the highest number of content items.
### 5. Identify the Longest Movie
```sql
SELECT
*
FROM netflix
WHERE type = 'Movie'
ORDER BY SPLIT_PART(duration, ' ', 1)::INT DESC;
```
**Objective:** Find the movie with the longest duration.
### 6. Find Content Added in the Last 5 Years
```sql
SELECT *
FROM netflix
WHERE TO_DATE(date_added, 'Month DD, YYYY') >= CURRENT_DATE - INTERVAL '5 years';
```
**Objective:** Retrieve content added to Netflix in the last 5 years.
### 7. Find All Movies/TV Shows by Director 'Rajiv Chilaka'
```sql
SELECT *
FROM (
SELECT
*,
UNNEST(STRING_TO_ARRAY(director, ',')) AS director_name
FROM netflix
) AS t
WHERE director_name = 'Rajiv Chilaka';
```
**Objective:** List all content directed by 'Rajiv Chilaka'.
### 8. List All TV Shows with More Than 5 Seasons
```sql
SELECT *
FROM netflix
WHERE type = 'TV Show'
AND SPLIT_PART(duration, ' ', 1)::INT > 5;
```
**Objective:** Identify TV shows with more than 5 seasons.
### 9. Count the Number of Content Items in Each Genre
```sql
SELECT
UNNEST(STRING_TO_ARRAY(listed_in, ',')) AS genre,
COUNT(*) AS total_content
FROM netflix
GROUP BY 1;
```
**Objective:** Count the number of content items in each genre.
### 10.Find each year and the average numbers of content release in India on netflix.
return top 5 year with highest avg content release!
```sql
SELECT
country,
release_year,
COUNT(show_id) AS total_release,
ROUND(
COUNT(show_id)::numeric /
(SELECT COUNT(show_id) FROM netflix WHERE country = 'India')::numeric * 100, 2
) AS avg_release
FROM netflix
WHERE country = 'India'
GROUP BY country, release_year
ORDER BY avg_release DESC
LIMIT 5;
```
**Objective:** Calculate and rank years by the average number of content releases by India.
### 11. List All Movies that are Documentaries
```sql
SELECT *
FROM netflix
WHERE listed_in LIKE '%Documentaries';
```
**Objective:** Retrieve all movies classified as documentaries.
### 12. Find All Content Without a Director
```sql
SELECT *
FROM netflix
WHERE director IS NULL;
```
**Objective:** List content that does not have a director.
### 13. Find How Many Movies Actor 'Salman Khan' Appeared in the Last 10 Years
```sql
SELECT *
FROM netflix
WHERE casts LIKE '%Salman Khan%'
AND release_year > EXTRACT(YEAR FROM CURRENT_DATE) - 10;
```
**Objective:** Count the number of movies featuring 'Salman Khan' in the last 10 years.
### 14. Find the Top 10 Actors Who Have Appeared in the Highest Number of Movies Produced in India
```sql
SELECT
UNNEST(STRING_TO_ARRAY(casts, ',')) AS actor,
COUNT(*)
FROM netflix
WHERE country = 'India'
GROUP BY actor
ORDER BY COUNT(*) DESC
LIMIT 10;
```
**Objective:** Identify the top 10 actors with the most appearances in Indian-produced movies.
### 15. Categorize Content Based on the Presence of 'Kill' and 'Violence' Keywords
```sql
SELECT
category,
COUNT(*) AS content_count
FROM (
SELECT
CASE
WHEN description ILIKE '%kill%' OR description ILIKE '%violence%' THEN 'Bad'
ELSE 'Good'
END AS category
FROM netflix
) AS categorized_content
GROUP BY category;
```