Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/boemer00/netflix

We’re helping Netflix predict what content their users will enjoy. By modelling the relationship between content features and user scores, we can predict how well received new content will be before spending on licences, reducing the risk of buying dud content.



Host: GitHub
URL: https://github.com/boemer00/netflix
Owner: boemer00
Created: 2021-06-06T10:35:17.000Z (over 3 years ago)
Default Branch: master
Last Pushed: 2021-06-25T13:28:40.000Z (over 3 years ago)
Last Synced: 2024-11-22T05:14:30.086Z (2 months ago)
Topics: data-engineering, machine-learning, netflix, pipelines, python, regression, scikit-learn
Language: Python
Homepage:
Size: 44 MB
Stars: 2
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


README

# Movie Score Predictor for Streaming Companies

![](/main_image.png)

# Overview
Our project helps streaming companies such as Netflix predict what content their users will enjoy. We built and deployed a machine learning model that learns the relationship between content features and user scores. It can predict how well received new content will be before a company spends on licences or original productions, reducing the risk of buying dud content.

# Sources
We extracted data from two sources:
- the [Kaggle Netflix Prize dataset](https://www.kaggle.com/netflix-inc/netflix-prize-data)
- [IMDb Developer](https://developer.imdb.com/), via API requests
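
As an illustration of the first source, here is a minimal sketch of reading rating rows. It assumes the Kaggle files use the Netflix Prize layout, where a header line such as `1:` names a movie and each following line is `customer_id,rating,date`. The function name `parse_ratings` and the sample rows are hypothetical and are not taken from this repository.

```python
from typing import Iterable, Iterator, Tuple


def parse_ratings(lines: Iterable[str]) -> Iterator[Tuple[int, int, int, str]]:
    """Yield (movie_id, customer_id, rating, date) tuples.

    Assumed layout: a "movie_id:" header line followed by
    "customer_id,rating,date" rows for that movie.
    """
    movie_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.endswith(":"):
            # A header line such as "1:" starts a new movie block.
            movie_id = int(line[:-1])
            continue
        if movie_id is None:
            raise ValueError("rating row found before any movie header")
        customer_id, rating, date = line.split(",")
        yield movie_id, int(customer_id), int(rating), date


# Hypothetical sample in the assumed layout.
sample = [
    "1:",
    "1488844,3,2005-09-06",
    "822109,5,2005-05-13",
    "2:",
    "2059652,4,2005-09-05",
]
rows = list(parse_ratings(sample))
```

Because the function accepts any iterable of strings, an open file handle can be passed in directly, so a large file is streamed rather than loaded into memory at once.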

# Machine Learning Model
We created a pipeline that transforms the raw data and fits multiple regression models. We tested both individual models (e.g. Linear Regression, Lasso, Ridge, KNN) and ensemble methods (e.g. Voting, Bagging, Stacking, AdaBoost). The best result, measured by RMSE (0.3), came from a Gradient Boosting Regressor.
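
The comparison described above can be sketched with scikit-learn roughly as follows. This is not the project's actual pipeline: the data is synthetic, only three of the candidate models are included, the preprocessing is a single `StandardScaler` step, and all hyperparameters are library defaults chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real feature matrix and user scores (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 3.0 + X[:, 0] - 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# A subset of the model families named in the text.
candidates = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

scores = {}
for name, estimator in candidates.items():
    # Each candidate gets the same preprocessing step inside one pipeline.
    model = make_pipeline(StandardScaler(), estimator)
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    # RMSE is the square root of the mean squared error on held-out data.
    scores[name] = float(np.sqrt(mean_squared_error(y_test, predictions)))

# The lowest RMSE wins.
best_name = min(scores, key=scores.get)
print(best_name, scores)
```

Wrapping the scaler and the estimator in one pipeline means the scaler is fitted on the training split only, which keeps the held-out RMSE an honest estimate.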

------------------------------------

# Start up the project

The initial setup.

Create a virtualenv and install the project:
```bash
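sudo apt-get install virtualenv python-pip python-dev
# Leave any active virtualenv, then create and activate a fresh one
deactivate 2>/dev/null; virtualenv ~/venv && source ~/venv/bin/activate
# Upgrade pip, then install the project dependencies
pip install -U pip && pip install -r requirements.txt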
```

Run the unit tests:
```bash
make clean install test
```

Check whether a `Netflix` project already exists under `gitlab.com/{group}`.
If it does not, add it:

- Create a new project on `gitlab.com/{group}/Netflix`
- Then populate it:

```bash
## e.g. if group is "{group}" and project_name is "Netflix"
git remote add origin git@gitlab.com:{group}/Netflix.git
git push -u origin master
git push -u origin --tags
```

Functional test with a script:

```bash
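cd            # start from the home directory
mkdir -p tmp  # scratch directory; -p avoids an error if it already exists
cd tmp
Netflix-run   # run the project's script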
```

# Install

Go to `https://github.com/{group}/Netflix` to see the project, manage issues,
set up your SSH public key, and so on.

Create a python3 virtualenv and activate it:

```bash
sudo apt-get install virtualenv python-pip python-dev
# Leave any active virtualenv, then create and activate a Python 3 one
deactivate 2>/dev/null; virtualenv -p python3 ~/venv && source ~/venv/bin/activate
```

Clone the project and install it:

```bash
git clone git@github.com:{group}/Netflix.git
cd Netflix
pip install -r requirements.txt
make clean install test # install and test
```

Functional test with a script:

```bash
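cd            # start from the home directory
mkdir -p tmp  # scratch directory; -p avoids an error if it already exists
cd tmp
Netflix-run   # run the project's script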
```