Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/satvikpraveen/rsvp_case_study

A comprehensive IMDB dataset analysis using SQL. Includes database setup, advanced queries, and actionable insights. Organized with files for database creation, queries, and solutions. Features an Entity-Relationship Diagram (ERD), executive summary, and SQL scripts. Perfect for SQL workflows and business intelligence in the film industry.
https://github.com/satvikpraveen/rsvp_case_study

aggregate-functions business-intelligence common-table-expressions data-analysis data-driven-decisions data-querying database-design entity-relationship-diagram imdb-dataset relational-database sql subqueries-and-joins

Last synced: 6 days ago
JSON representation

A comprehensive IMDB dataset analysis using SQL. Includes database setup, advanced queries, and actionable insights. Organized with files for database creation, queries, and solutions. Features an Entity-Relationship Diagram (ERD), executive summary, and SQL scripts. Perfect for SQL workflows and business intelligence in the film industry.

Awesome Lists containing this project

README

        

# **IMDB Data Analysis Using SQL**

## **Overview**
This repository contains a comprehensive analysis of IMDB movie data using SQL. The analysis is designed to extract meaningful insights to assist in strategic decision-making for a global film production initiative. The project leverages structured data and SQL expertise to derive actionable intelligence from the IMDB dataset.

This work demonstrates expertise in SQL, data analysis, and business intelligence, making it a valuable addition to your pinned projects on GitHub.

---

## **Key Features**
- **Detailed Data Analysis**: Analysis performed on a dataset spanning three years, covering multiple attributes such as genres, directors, actors, production companies, and movie ratings.
- **Actionable Insights**: Extracted critical patterns and trends to support data-driven decision-making in the entertainment industry.
- **Structured Process**: The analytics process is divided into logical segments, each contributing to a thorough understanding of the dataset.
- **Scalable Design**: The project framework and SQL scripts are designed for scalability and adaptability to other datasets.

---

## **Dataset Description**
The IMDB dataset includes information about movies produced over the past three years. The data spans attributes such as:
- Movie titles, genres, and production years
- Ratings and user votes
- Directors, actors, and production companies
- Movie durations and release months

---

## **Highlights**
### **Insights Derived**
- Most movies were produced in 2017, with March being the most active production month.
- The **Drama** genre dominates in terms of the number of movies produced, while **Action** movies have the longest average duration.
- Top-performing directors, production companies, and actors were identified based on ratings and other metrics.
- Key global production insights, including:
- Dream Warrior Pictures and National Theater Live have produced the most hit movies.
- Marvel Studios leads in votes received for its productions.

### **Advanced SQL Techniques Used**
- Complex **JOINs** for multi-table analysis
- Use of **aggregate functions** (e.g., `AVG`, `COUNT`, `SUM`) to summarize data
- Advanced filtering with **subqueries** and **CTEs (Common Table Expressions)**
- Comprehensive **GROUP BY** and **HAVING** clauses for detailed breakdowns
- Analytical queries for ranking and trend detection

---

## **Repository Structure**
```
├── IMDB_dataset_import.sql # SQL script to create and populate the database
├── IMDB_question.sql # SQL queries to address specific analytical objectives
├── Solution_IMDB_questions.sql # Consolidated solutions for the analytical tasks
├── IMDb_movies_Data_and_ERD.xlsx # Dataset and Entity-Relationship Diagram
├── Executive_Summary.pdf # Business insights and findings
└── README.md # Project documentation
```

---

## **Usage**
1. **Database Setup**:
- Use `IMDB_dataset_import.sql` to create and populate the database.
2. **Analysis**:
- Execute `IMDB_question.sql` or `Solution_IMDB_questions.sql` to replicate the analysis and derive insights.
3. **Review Results**:
- Refer to `Executive_Summary.pdf` for a summary of findings.

---

## **ERD Overview**
The Entity-Relationship Diagram (ERD) provides a clear representation of database relationships, making it easier to understand the dataset's structure and logical flow.

---

## **Key Business Insights**
- Full list of insights provided in [`Executive_Summary.pdf`](./Executive_Summary.pdf)

---

## **Technologies Used**
- **SQL**: Primary language for data analysis
- **RDBMS**: Tested on MySQL, compatible with other relational database systems
- **Excel**: For dataset visualization and ERD creation

---

## **Future Scope**
- Integration with visualization tools like Tableau or Power BI for advanced analytics dashboards
- Incorporation of machine learning techniques for predictive modeling
- Expansion to include audience sentiment analysis and box-office trends

---

## **Author**
- **Satvik Praveen**
Data Science Enthusiast | [LinkedIn](https://www.linkedin.com/in/satvikpraveen/)