https://github.com/hridya2001/cinema-on-cloud
Cloud-powered SQL project using IMDb data – cleaned locally, stored in AWS RDS, and explored via DBeaver.
aws-ec2-intances aws-rds dbeaver mysql-database
- Host: GitHub
- URL: https://github.com/hridya2001/cinema-on-cloud
- Owner: Hridya2001
- License: mit
- Created: 2025-06-16T08:10:33.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-06-16T09:28:57.000Z (4 months ago)
- Last Synced: 2025-06-16T09:34:23.769Z (4 months ago)
- Topics: aws-ec2-intances, aws-rds, dbeaver, mysql-database
- Homepage:
- Size: 6.84 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# IMDB to Cloud: A Casual SQL Journey ...
Ever wondered what the most loved movies of the last decade are?
In this mini project, I took the massive IMDb dataset from [Kaggle](https://www.kaggle.com/datasets/ashirwadsangwan/imdb-dataset), filtered out the noise, ran it through a cloud-powered MySQL setup, and surfaced the highest-voted titles from the last 10 years. All using a mix of SQL, AWS, and some trial-and-error magic.
---
## Architecture Overview
Here’s a high-level view of how everything is wired together:

---
## What’s This Project About?
- Downloaded IMDb datasets from Kaggle (a huge one with lots of different tables).
- Loaded it into **local MySQL**, then cleaned and modified it.
- Focused only on two specific tables:
  - `movie_title` (which I customized),
  - and `title_rating`.
- Joined these two to create a new table called `high_rated_titles`.

And then I thought... why keep it local?
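For reference, here is a rough sketch of what the join step could look like. It is not the project's actual query: the column names (`tconst`, `primary_title`, `start_year`, `average_rating`, `num_votes`) are assumptions loosely based on IMDb's public file layout, the sample rows are made up, and an in-memory SQLite database stands in for local MySQL so the snippet runs without a server.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the local MySQL database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schemas; the real customized movie_title table may differ.
cur.execute(
    "CREATE TABLE movie_title ("
    "tconst TEXT PRIMARY KEY, primary_title TEXT, start_year INTEGER)"
)
cur.execute(
    "CREATE TABLE title_rating ("
    "tconst TEXT PRIMARY KEY, average_rating REAL, num_votes INTEGER)"
)

# Made-up sample rows, purely for illustration.
cur.executemany(
    "INSERT INTO movie_title VALUES (?, ?, ?)",
    [
        ("tt0000001", "Sample Movie A", 2016),
        ("tt0000002", "Sample Movie B", 2019),
        ("tt0000003", "Sample Movie C", 2009),  # has no rating row
    ],
)
cur.executemany(
    "INSERT INTO title_rating VALUES (?, ?, ?)",
    [("tt0000001", 8.1, 120000), ("tt0000002", 7.4, 45000)],
)

# Join the two tables on the shared title id and materialize the result.
cur.execute(
    """
    CREATE TABLE high_rated_titles AS
    SELECT m.tconst, m.primary_title, m.start_year,
           r.average_rating, r.num_votes
    FROM movie_title AS m
    INNER JOIN title_rating AS r ON r.tconst = m.tconst
    """
)

rows = cur.execute(
    "SELECT primary_title, num_votes FROM high_rated_titles "
    "ORDER BY num_votes DESC"
).fetchall()
print(rows)
```

Because this is an inner join, titles without a matching rating row (like the third sample row) are left out of `high_rated_titles`.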
---
## Moving to the Cloud
To make things more "cloudy":
- Set up an **EC2 instance** and **RDS MySQL** database inside a **custom VPC with subnets**.
- Transferred the local database to RDS.
- Created an **SSH tunnel** so I could connect DBeaver with RDS.

> Honestly though, I later realized I could’ve done everything through my terminal too. So, the DBeaver setup was more for convenience.
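To make the tunnel idea concrete, here is a minimal sketch that only builds (and prints) an `ssh` local port-forwarding command. Everything specific in it is a placeholder assumption: the host names, the key file, the `ec2-user` login (Ubuntu AMIs typically use `ubuntu`), and the local port `3307`. The repo's `commands_and_queries.md` has the steps actually used.

```python
import shlex


def ssh_tunnel_command(ec2_host, rds_endpoint, key_path,
                       ec2_user="ec2-user", local_port=3307, rds_port=3306):
    """Build an ssh command that forwards a local port to RDS via EC2.

    All argument values are hypothetical placeholders supplied by the caller.
    """
    return [
        "ssh",
        "-i", key_path,  # private key for the EC2 instance
        "-N",            # no remote command, just keep the forward open
        "-L", f"{local_port}:{rds_endpoint}:{rds_port}",  # local -> RDS
        f"{ec2_user}@{ec2_host}",
    ]


cmd = ssh_tunnel_command(
    ec2_host="ec2-host.example.com",                # placeholder
    rds_endpoint="mydb.example.rds.amazonaws.com",  # placeholder
    key_path="my-key.pem",                          # placeholder
)

# Print the command instead of running it, so nothing connects anywhere.
print(shlex.join(cmd))
```

With a tunnel like this open, a client such as DBeaver or the `mysql` CLI can point at `127.0.0.1` on the chosen local port and the traffic reaches RDS through the EC2 instance.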
---
## Tech Stack
- **Language:** SQL
- **Tools:** MySQL, AWS RDS, EC2, VPC, Linux terminal, DBeaver
- **Data Source:** [IMDb Dataset on Kaggle](https://www.kaggle.com/datasets/ashirwadsangwan/imdb-dataset)

---
## Final Output
Using SQL queries on the cloud-hosted DB, I pulled out:
> **Movie Titles**,
> **Release Year**,
> **Number of Votes**

…for the most voted titles released during the **last decade** (2015–2024).
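A query of this shape might look like the sketch below. Again, this is an illustration and not the project's actual query: the column names and sample rows are assumptions, and in-memory SQLite stands in for the RDS MySQL instance. Note that SQLite uses `?` placeholders, while most MySQL drivers use `%s`.

```python
import sqlite3


def top_voted(conn, first_year=2015, last_year=2024, limit=10):
    """Return (title, year, votes) for the most voted titles in a year range."""
    query = """
        SELECT primary_title, start_year, num_votes
        FROM high_rated_titles
        WHERE start_year BETWEEN ? AND ?
        ORDER BY num_votes DESC
        LIMIT ?
    """
    return conn.execute(query, (first_year, last_year, limit)).fetchall()


# Assumption: SQLite stand-in with a hypothetical, trimmed-down schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE high_rated_titles ("
    "primary_title TEXT, start_year INTEGER, num_votes INTEGER)"
)

# Made-up sample rows, purely for illustration.
conn.executemany(
    "INSERT INTO high_rated_titles VALUES (?, ?, ?)",
    [
        ("Sample Movie A", 2016, 120000),
        ("Sample Movie B", 2019, 45000),
        ("Sample Movie C", 2009, 900000),  # outside the 2015-2024 window
        ("Sample Movie D", 2022, 300000),
    ],
)

result = top_voted(conn, limit=2)
print(result)
```

The year range is passed as parameters, so the same function works if the "last decade" window shifts.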
---
## All Commands & Queries
Curious about the exact steps and SQL magic behind this project?
Check out [`commands_and_queries.md`](Code)
This file includes:
- All **MySQL queries** (table creation, joins, filters, etc.)
- Commands to **set up AWS services** – RDS, EC2, and VPC
- Steps to **create an SSH tunnel** from local to RDS
- How I installed and used **DBeaver** on my Ubuntu machine

---
## DBeaver Exploration
Connected the cloud-hosted MySQL DB to DBeaver using an SSH tunnel. Ran the final query and viewed the results in a nice tabular format.
Here’s a peek:

---
## Why I Did This?
Just wanted to get hands-on with:
- Real-world SQL on large datasets.
- Transferring MySQL databases from local to cloud (EC2 + RDS).
- Using DBeaver for database inspection and query writing.
- And of course, understanding **how to filter meaningful insights from massive data**.

---
## Learnings & Reflections
- SQL can be powerful and fun when you know what you're digging for.
- DBeaver is cool, but not a must. The terminal works just fine too.
- Moving DBs to the cloud is easier than I thought, but getting all the VPC, subnet, and security group configs right takes some trial and error!

---
## Next Steps?
Maybe try visualizing the results using tools like Power BI, Superset, or even Python dashboards. Or build a Streamlit app on top of it… who knows!
Thanks for reading!