Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/boemer00/netflix
We’re helping Netflix decide what content their users enjoy. By modelling the relationship between content features and user scores, we can predict how well-received new content will be before spending on licences, reducing the risk of buying dud content.
data-engineering machine-learning netflix pipelines python regression scikit-learn
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/boemer00/netflix
- Owner: boemer00
- Created: 2021-06-06T10:35:17.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2021-06-25T13:28:40.000Z (over 3 years ago)
- Last Synced: 2023-04-30T19:42:44.111Z (over 1 year ago)
- Topics: data-engineering, machine-learning, netflix, pipelines, python, regression, scikit-learn
- Language: Python
- Homepage:
- Size: 44 MB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Movie Score Predictor for Streaming Companies
![](/main_image.png)
# Overall
Our project helps streaming companies, such as Netflix, decide what content their users enjoy. We have built and deployed a machine learning model that identifies the relationship between content features and user scores. It can predict how well-received new content will be before companies spend on licences or original productions, reducing the risk of dud content.

# Sources
We have extracted data from two sources:
- the [Kaggle dataset](https://www.kaggle.com/netflix-inc/netflix-prize-data)
- [IMDb developer](https://developer.imdb.com/) using API requests

# Machine Learning Model
We have created a pipeline that transforms raw data and fits multiple regression models. We tested both individual models (e.g. Linear Regression, Lasso, Ridge, KNN) and ensemble methods (e.g. Voting, Bagging, Stacking, AdaBoost). The best result, measured by RMSE (0.3), came from a Gradient Boosting Regressor.

------------------------------------
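As a rough illustration of the model comparison described under Machine Learning Model, the sketch below fits several scikit-learn regressors inside the same preprocessing pipeline and ranks them by RMSE on a held-out split. It is not the project's actual code: the synthetic data from `make_regression`, the hyperparameters, the helper names `rmse` and `compare_models`, and the use of `StandardScaler` as the transformation step are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y_true - y_pred) ** 2))."""
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


def compare_models(X, y, random_state=0):
    """Fit each candidate in the same pipeline and return {name: test RMSE}."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state
    )
    # Hypothetical candidate set and hyperparameters, chosen for illustration.
    candidates = {
        "linear": LinearRegression(),
        "ridge": Ridge(alpha=1.0),
        "lasso": Lasso(alpha=0.1),
        "knn": KNeighborsRegressor(n_neighbors=5),
        "gradient_boosting": GradientBoostingRegressor(random_state=random_state),
    }
    scores = {}
    for name, model in candidates.items():
        # Scaling is fitted on the training split only, then reused at predict time.
        pipe = Pipeline([("scale", StandardScaler()), ("model", model)])
        pipe.fit(X_train, y_train)
        scores[name] = rmse(y_test, pipe.predict(X_test))
    return scores


# Synthetic stand-in for the real feature matrix and user scores.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
scores = compare_models(X, y)
best = min(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:>18}: RMSE = {score:.3f}")
print(f"lowest RMSE: {best}")
```

Wrapping the scaler and the estimator in one `Pipeline` keeps the preprocessing from seeing the test split, so the RMSE values stay comparable across candidates. Which model wins on this synthetic data says nothing about the ranking on the real dataset.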
# Startup the project
The initial setup.
Create a virtualenv and install the project:
```bash
sudo apt-get install virtualenv python-pip python-dev
deactivate; virtualenv ~/venv ; source ~/venv/bin/activate ;\
pip install pip -U; pip install -r requirements.txt
```

Run the unit tests:
```bash
make clean install test
```

Check for Netflix in `gitlab.com/{group}`.
If your project is not set up, please add it:

- Create a new project on `gitlab.com/{group}/Netflix`
- Then populate it:

```bash
## e.g. if group is "{group}" and project_name is "Netflix"
git remote add origin git@gitlab.com:{group}/Netflix.git
git push -u origin master
git push -u origin --tags
```

Functional test with a script:
```bash
cd
mkdir tmp
cd tmp
Netflix-run
```

# Install
Go to `https://github.com/{group}/Netflix` to see the project, manage issues,
set up your SSH public key, ...

Create a python3 virtualenv and activate it:
```bash
sudo apt-get install virtualenv python-pip python-dev
deactivate; virtualenv -ppython3 ~/venv ; source ~/venv/bin/activate
```

Clone the project and install it:
```bash
git clone git@github.com:{group}/Netflix.git
cd Netflix
pip install -r requirements.txt
make clean install test # install and test
```
Functional test with a script:

```bash
cd
mkdir tmp
cd tmp
Netflix-run
```