https://github.com/feder-cr/datascienceproject
- Host: GitHub
- URL: https://github.com/feder-cr/datascienceproject
- Owner: feder-cr
- Created: 2024-02-11T18:13:01.000Z
- Default Branch: main
- Last Pushed: 2024-02-15T20:58:19.000Z
- Last Synced: 2025-06-23T04:35:31.344Z
- Language: Jupyter Notebook
- Size: 348 KB
- Stars: 6
- Watchers: 2
- Forks: 0
- Open Issues: 0
University project for the course "Introduzione alla Data Science" (Introduction to Data Science), part of the computer science program at the University of Genoa.
# Streaming Platform Data Analysis
## Key Features
### Objectives
- Perform a comparative analysis of the Netflix and Disney+ platforms
- Apply the full data science pipeline, from data collection to insights
- Hands-on practice of data science concepts and techniques
### Datasets
- Disney+ and Netflix title datasets
- Metadata such as title, type, date added, and genre
- Enrichment data such as IMDb and TMDB scores
- Four integrated datasets that together provide a multifaceted view
### Data Loading and Preparation
- Importing CSV datasets into dataframes
- Combining titles, enrichment and country datasets
- Handling missing values and duplicate rows
- Data transformations for analysis suitability
- Extracting year/month from dates
- Determining number of genres
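
The preparation steps above can be sketched with pandas roughly as follows. This is a minimal illustration, not the notebook's actual code: the in-memory CSV text, the column names (`id`, `date_added`, `genres`, `imdb_score`) and the helper `load_and_prepare` are all assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the CSV files; real file and column names may differ.
TITLES_CSV = """id,title,type,date_added,genres
t1,Alpha,MOVIE,2021-03-15,"drama, comedy"
t2,Beta,SHOW,2020-11-02,documentary
t2,Beta,SHOW,2020-11-02,documentary
t3,Gamma,MOVIE,,"action, thriller, crime"
"""
SCORES_CSV = """id,imdb_score,tmdb_score
t1,7.1,6.8
t2,,7.4
t3,5.9,6.0
"""


def load_and_prepare(titles_src, scores_src):
    """Load, merge and clean the title and score tables (illustrative helper)."""
    titles = pd.read_csv(titles_src, parse_dates=["date_added"])
    scores = pd.read_csv(scores_src)

    # Remove exact duplicate rows before combining the tables.
    titles = titles.drop_duplicates()

    # Attach the enrichment scores to each title.
    df = titles.merge(scores, on="id", how="left")

    # Drop rows without a date, then fill missing scores with the median.
    df = df.dropna(subset=["date_added"])
    df["imdb_score"] = df["imdb_score"].fillna(df["imdb_score"].median())

    # Derive year, month and the number of genres per title.
    df["year_added"] = df["date_added"].dt.year
    df["month_added"] = df["date_added"].dt.month
    df["n_genres"] = df["genres"].str.split(",").str.len()
    return df.reset_index(drop=True)


prepared = load_and_prepare(io.StringIO(TITLES_CSV), io.StringIO(SCORES_CSV))
print(prepared[["title", "year_added", "month_added", "n_genres"]])
```

Whether missing values should be dropped or imputed depends on the column; the median fill here is only one reasonable choice.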
### Exploratory Data Analysis
- Statistical summaries of key variables
- Analysis across dimensions such as certification and content type
- Comparative analysis across the platforms
- Hypothesis testing for distribution differences
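
The README does not say which statistical test was used. As one possible illustration of testing for a difference between two platforms, here is a two-sided permutation test on the difference in means, written with NumPy only. The function name `permutation_test` and the sample scores are hypothetical.

```python
import numpy as np


def permutation_test(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the absolute difference in means.

    Returns an estimated p-value: the share of random relabelings whose
    difference in means is at least as large as the observed one.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])

    count = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[: a.size].mean() - shuffled[a.size:].mean())
        if diff >= observed:
            count += 1

    # Add-one correction keeps the estimate strictly above zero.
    return (count + 1) / (n_permutations + 1)


# Made-up scores for two platforms, for illustration only.
platform_a_scores = [6.1, 6.8, 7.2, 5.9, 6.5, 7.0, 6.3, 6.6]
platform_b_scores = [6.4, 7.1, 6.9, 6.2, 6.7, 7.3, 6.0, 6.8]
p_value = permutation_test(platform_a_scores, platform_b_scores)
print(f"estimated p-value: {p_value:.3f}")
```

A permutation test makes no normality assumption, which suits skewed variables such as popularity, but it compares only the chosen statistic (here the mean), not the whole distribution.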
### Multidimensional Analysis
- Constructed OLAP cube with dimensions:
  - Month
  - Content type
  - Production country
- Slicing, filtering and aggregation capabilities
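
A small OLAP-style cube over these three dimensions can be approximated with a pandas `groupby`, as in the sketch below. The column names (`month_added`, `type`, `country`) and the six sample rows are assumptions for illustration; the project's own cube may be built differently.

```python
import pandas as pd

# Made-up rows: one per title, with the three cube dimensions.
titles = pd.DataFrame(
    {
        "month_added": [1, 1, 2, 2, 2, 3],
        "type": ["MOVIE", "SHOW", "MOVIE", "MOVIE", "SHOW", "MOVIE"],
        "country": ["US", "US", "IT", "US", "IT", "IT"],
    }
)

# The cube: number of titles for each (month, type, country) cell.
cube = titles.groupby(["month_added", "type", "country"]).size().rename("n_titles")

# Slice: fix one dimension to a single value.
movies = cube.xs("MOVIE", level="type")

# Dice: keep a subset of values on one or more dimensions.
early_months = cube[cube.index.get_level_values("month_added") <= 2]

# Roll-up: aggregate away the country dimension.
by_month_type = cube.groupby(level=["month_added", "type"]).sum()

print(by_month_type)
```

Keeping the cube as a Series with a `MultiIndex` means slicing and roll-up are both ordinary index operations, so no separate OLAP library is needed at this data size.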
### Predictive Modeling
- Developed classification model using Logistic Regression
- Target: content type (movie or TV show)
- Features: popularity, ratings, metadata
- 85% accuracy on test data
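
The modeling step could look roughly like the following scikit-learn sketch. It trains on synthetic data generated in the example, so its accuracy says nothing about the 85% reported above. The three features (`runtime`, `popularity`, `score`) are stand-ins chosen for illustration, not the project's confirmed feature set.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: 1 = movie, 0 = TV show. Movies get longer runtimes.
rng = np.random.default_rng(42)
n = 400
is_movie = rng.integers(0, 2, size=n)
runtime = np.where(is_movie == 1, rng.normal(100, 15, n), rng.normal(45, 15, n))
popularity = rng.normal(20, 5, n)
score = rng.normal(6.5, 1.0, n)

X = np.column_stack([runtime, popularity, score])
y = is_movie

# Hold out a stratified test set so both classes appear in each split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale the features, then fit the logistic regression classifier.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy on synthetic data: {accuracy:.2f}")
```

Putting the scaler inside the pipeline ensures it is fitted on the training split only, which avoids leaking test-set statistics into the model.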
### Interactive Visualizations
- Graphics for distributions, comparisons and trends
- Dashboards for slicing OLAP cube on multiple axes
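
As an example of the comparison graphics, the sketch below draws a grouped bar chart of titles per content type for each platform with pandas and Matplotlib. The counts are invented for illustration and do not come from the project's datasets.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Invented counts, for illustration only.
counts = pd.DataFrame(
    {"Netflix": [3, 2], "Disney+": [4, 1]},
    index=["MOVIE", "SHOW"],
)

fig, ax = plt.subplots(figsize=(6, 4))
counts.plot(kind="bar", ax=ax, rot=0)  # one group per content type
ax.set_xlabel("Content type")
ax.set_ylabel("Number of titles")
ax.set_title("Titles by content type and platform (illustrative data)")
fig.tight_layout()

# Render to an in-memory PNG; pass a file path instead to save to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

In a Jupyter notebook the backend line and the buffer are unnecessary, since the figure is displayed inline.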
## Key Technologies
- Python
- Jupyter Notebook
- Pandas
- NumPy
- scikit-learn
- Matplotlib
- Seaborn
## What I Learned
- Importing, cleaning and transforming medium-size datasets
- Identifying optimal data formats and structures
- Applying multivariate analysis techniques
- Training and evaluating classification models
- Using visualizations to extract insights