https://github.com/5h3r10k/cs210-final
- Host: GitHub
- URL: https://github.com/5h3r10k/cs210-final
- Owner: 5h3r10k
- Created: 2024-12-10T05:05:21.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2024-12-26T16:43:50.000Z (5 months ago)
- Last Synced: 2025-02-07T22:26:47.459Z (4 months ago)
- Language: Jupyter Notebook
- Size: 938 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# CS210-final
## **Movies Data Analysis and Prediction**
This project analyzes a movies dataset, visualizes trends in it, and fits a predictive model that forecasts movie revenue from release year, runtime, budget, and vote average. It uses the Python libraries pandas, matplotlib, and scikit-learn to clean the data, plot trends, and build the model.
---
### **Project Features**
1. **Data Cleaning**:
   - Removed missing or invalid entries for critical features such as `runtime`, `budget`, and `revenue`.
   - Filtered out invalid release years.
2. **Exploratory Data Analysis**:
   - **Key visualizations**:
     - **Average runtime of movies by release year**.
     - **Distribution of revenue across years**.
     - **Budget trends over time**.
   - Used matplotlib to present trends and insights.
3. **Predictive Model**:
   - A **Linear Regression model** built using `scikit-learn` predicts movie revenue.
   - Features used for prediction:
     - `release_year`
     - `runtime`
     - `budget`
     - `vote_average`
   - Model evaluation using the R² score.
4. **Prediction Example**:
   - Predicted revenue for a hypothetical movie with:
     - Release year: 2022
     - Runtime: 120 minutes
     - Budget: $10,000,000
     - Vote average: 7.5
---
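The cleaning and aggregation steps listed under Project Features could look roughly like the sketch below. It is not taken from the notebook: the sample rows, the `clean_movies` helper, the rule that zero or negative values count as invalid, and the 1900 to 2030 year range are all assumptions made for illustration.

```python
import pandas as pd

# Tiny made-up sample standing in for the real movies dataset (hypothetical values).
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "release_year": [1999, 2005, 2005, 0, 2010],
    "runtime": [110, 95, None, 120, 130],
    "budget": [5_000_000, 0, 2_000_000, 1_000_000, 8_000_000],
    "revenue": [20_000_000, 3_000_000, 4_000_000, 2_000_000, 30_000_000],
})


def clean_movies(df, min_year=1900, max_year=2030):
    """Drop rows with missing or non-positive critical fields or implausible years."""
    critical = ["runtime", "budget", "revenue"]
    out = df.dropna(subset=critical + ["release_year"])
    # Treat zero or negative values as invalid (an assumption about "invalid entries").
    out = out[(out[critical] > 0).all(axis=1)]
    # Keep only years inside an assumed plausible range.
    out = out[out["release_year"].between(min_year, max_year)]
    return out.reset_index(drop=True)


cleaned = clean_movies(movies)

# Average runtime per release year, the kind of series the first plot would show.
avg_runtime_by_year = cleaned.groupby("release_year")["runtime"].mean()
print(cleaned[["title", "release_year"]])
print(avg_runtime_by_year)
```

With matplotlib installed, calling `avg_runtime_by_year.plot()` would draw this series as a line chart.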
### **Installation**
1. **Clone the repository**:
```bash
git clone https://github.com/5h3r10k/cs210-final
```
2. **Install required packages**:
```bash
pip install -r requirements.txt
```
Key dependencies include:
- `pandas`
- `matplotlib`
- `scikit-learn`
3. **Run the Jupyter Notebook**:
```bash
jupyter notebook CS210_final.ipynb
```
---
### **Usage**
1. **Run the notebook** to execute all cells sequentially for data analysis and visualization.
2. **Modify predictive features** in the model section to predict revenue for custom movie scenarios.
---
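To try a custom scenario outside the notebook, one possible approach is to wrap the prediction in a small helper. The sketch below is an illustration under stated assumptions: it trains on synthetic data rather than the project's dataset, and `FEATURES`, `predict_revenue`, and the synthetic revenue formula are hypothetical names and values, not part of the repository.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

FEATURES = ["release_year", "runtime", "budget", "vote_average"]

# Synthetic training data (hypothetical): revenue is an exact linear function
# of the features, so the fitted model is easy to sanity-check.
rng = np.random.default_rng(0)
n = 200
train = pd.DataFrame({
    "release_year": rng.integers(1980, 2024, size=n),
    "runtime": rng.integers(80, 180, size=n),
    "budget": rng.integers(1_000_000, 100_000_000, size=n),
    "vote_average": rng.uniform(3.0, 9.0, size=n),
})
train["revenue"] = 2.0 * train["budget"] + 1_000_000 * train["vote_average"]

model = LinearRegression().fit(train[FEATURES], train["revenue"])


def predict_revenue(model, release_year, runtime, budget, vote_average):
    """Return the predicted revenue for one movie as a float."""
    row = pd.DataFrame(
        [[release_year, runtime, budget, vote_average]], columns=FEATURES
    )
    return float(model.predict(row)[0])


estimate = predict_revenue(model, 2022, 120, 10_000_000, 7.5)
print(f"Predicted revenue: ${estimate:,.0f}")
```

Because the synthetic revenue is defined as twice the budget plus one million per rating point, the printed estimate should come out near $27.5 million. The notebook's model, trained on real data, will give a different figure.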
### **Example Prediction**
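```python
# Example features (assumes `model` is the fitted regression model from the notebook)
release_year = 2022
runtime = 120
budget = 10_000_000
vote_average = 7.5

# Predicted revenue: predict() takes a 2D array (one row per movie)
# and returns one prediction per row
predicted_revenue = model.predict([[release_year, runtime, budget, vote_average]])
print(f"Predicted revenue: {predicted_revenue}")
```
---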
### **Output Highlights**
- **R² Score**: The model achieved a score of `0.5509`, meaning it explains roughly 55% of the variance in revenue (moderate predictive power).
- **Predicted Revenue for Example**: $26,244,374.

Feel free to run and modify the notebook for further insights or enhancements to the predictive model.
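For readers unfamiliar with the metric, R² compares the model's squared errors with those of always predicting the mean revenue: R² = 1 - SS_res / SS_tot. A minimal pure-Python sketch with made-up numbers follows; the `r2_score_manual` helper and the sample values are illustrative and do not come from the notebook.

```python
def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up revenues in millions of dollars, purely illustrative.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

score = r2_score_manual(actual, predicted)
# Always predicting the mean of the actual values gives the reference point.
baseline = r2_score_manual(actual, [25.0] * 4)
print(f"R2 = {score:.3f}, mean-only baseline = {baseline:.3f}")
```

A score of 0 means the model does no better than predicting the mean, and 1 means a perfect fit, so `0.5509` sits roughly halfway between the two.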