{"id":21358148,"url":"https://github.com/boemer00/netflix","last_synced_at":"2026-05-20T14:04:37.142Z","repository":{"id":49164535,"uuid":"374332126","full_name":"boemer00/Netflix","owner":"boemer00","description":"We’re helping Netflix decide what content their users enjoy. By modelling a relationship between features and user scores we can predict how well-received new content will be, before spending on licences-- reducing the risk of buying dud content.","archived":false,"fork":false,"pushed_at":"2021-06-25T13:28:40.000Z","size":46109,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-01-22T18:32:52.187Z","etag":null,"topics":["data-engineering","machine-learning","netflix","pipelines","python","regression","scikit-learn"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/boemer00.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-06-06T10:35:17.000Z","updated_at":"2024-10-10T12:55:37.000Z","dependencies_parsed_at":"2022-07-30T16:18:49.504Z","dependency_job_id":null,"html_url":"https://github.com/boemer00/Netflix","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/boemer00%2FNetflix","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/boemer00%2FNetflix/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/boemer00%2FNetflix/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/boemer00%2FNetflix/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/boemer00","download_url":"https://codeload.github.com/boemer00/Netflix/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243830956,"owners_count":20354856,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-engineering","machine-learning","netflix","pipelines","python","regression","scikit-learn"],"created_at":"2024-11-22T05:14:37.362Z","updated_at":"2026-05-20T14:04:37.092Z","avatar_url":"https://github.com/boemer00.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Movie Score Predictor for Streaming Companies\n\n![](/main_image.png)\n\n# Overview\nOur project helps streaming companies, such as Netflix, decide what content their users enjoy. We have built and deployed a machine learning model that captures the relationship between content features and user scores. Ultimately, it can predict how well-received new content will be before companies spend on licences or original productions, reducing the risk of investing in dud content.\n\n# Sources\nWe have extracted data from two sources:\n- the [Kaggle Netflix Prize dataset](https://www.kaggle.com/netflix-inc/netflix-prize-data)\n- the [IMDb developer](https://developer.imdb.com/) platform, retrieved with API requests\n\n# Machine Learning Model\nWe built a pipeline that transforms the raw data and fits multiple regression models. We tested both individual models (e.g. Linear Regression, Lasso, Ridge, KNN) and ensemble methods (e.g. Voting, Bagging, Stacking, AdaBoost).
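
The comparison of individual and ensemble regressors described above could look roughly like the scikit-learn sketch below. It is a minimal, hypothetical example, not the project's actual pipeline. It substitutes synthetic data from `make_regression` for the real Kaggle and IMDb features. The scaler, hyperparameters, ensemble members and the `rmse_scores` helper are all assumptions. Each model is scored with 5-fold cross-validated RMSE.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (AdaBoostRegressor, BaggingRegressor,
                              GradientBoostingRegressor, StackingRegressor,
                              VotingRegressor)
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in for the real feature matrix and user scores;
# the project's actual features and preprocessing are not shown in this README.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# Individual regressors named in this README (hyperparameters are assumptions).
models = {
    'linear': LinearRegression(),
    'lasso': Lasso(alpha=0.1),
    'ridge': Ridge(alpha=1.0),
    'knn': KNeighborsRegressor(n_neighbors=5),
}

# Ensemble methods; the chosen members are illustrative assumptions.
models['voting'] = VotingRegressor([('ridge', Ridge()), ('knn', KNeighborsRegressor())])
models['bagging'] = BaggingRegressor(random_state=0)
models['stacking'] = StackingRegressor(
    estimators=[('ridge', Ridge()), ('knn', KNeighborsRegressor())],
    final_estimator=LinearRegression())
models['adaboost'] = AdaBoostRegressor(random_state=0)
models['gbr'] = GradientBoostingRegressor(random_state=0)


def rmse_scores(models, X, y, cv=5):
    # Hypothetical helper: wrap each model in the same scaling pipeline
    # and return its mean cross-validated RMSE.
    results = {}
    for name, model in models.items():
        pipe = make_pipeline(StandardScaler(), model)
        # scikit-learn maximises scores, so RMSE comes back negated.
        scores = cross_val_score(pipe, X, y, cv=cv,
                                 scoring='neg_root_mean_squared_error')
        results[name] = -scores.mean()
    return results


results = rmse_scores(models, X, y)
for name, rmse in sorted(results.items(), key=lambda kv: kv[1]):
    print(f'{name:10s} RMSE = {rmse:.3f}')
```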
Our model achieved its best result, an RMSE of 0.3, with a Gradient Boosting Regressor.\n\n------------------------------------\n\n# Start up the project\n\nThe initial setup.\n\nCreate a virtualenv and install the project:\n```bash\nsudo apt-get install virtualenv python-pip python-dev\ndeactivate; virtualenv ~/venv ; source ~/venv/bin/activate ;\\\n    pip install pip -U; pip install -r requirements.txt\n```\n\nRun the unit tests:\n```bash\nmake clean install test\n```\n\nCheck for Netflix in github.com/{group}.\nIf your project is not set up yet, add it:\n\n- Create a new project on `github.com/{group}/Netflix`\n- Then populate it:\n\n```bash\n##   e.g. if group is \"{group}\" and project_name is \"Netflix\"\ngit remote add origin git@github.com:{group}/Netflix.git\ngit push -u origin master\ngit push -u origin --tags\n```\n\nFunctional test with a script:\n\n```bash\ncd\nmkdir tmp\ncd tmp\nNetflix-run\n```\n\n# Install\n\nGo to `https://github.com/{group}/Netflix` to see the project, manage issues,\nset up your SSH public key, ...\n\nCreate a python3 virtualenv and activate it:\n\n```bash\nsudo apt-get install virtualenv python-pip python-dev\ndeactivate; virtualenv -ppython3 ~/venv ; source ~/venv/bin/activate\n```\n\nClone the project and install it:\n\n```bash\ngit clone git@github.com:{group}/Netflix.git\ncd Netflix\npip install -r requirements.txt\nmake clean install test                # install and test\n```\n\nFunctional test with a script:\n\n```bash\ncd\nmkdir tmp\ncd tmp\nNetflix-run\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fboemer00%2Fnetflix","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fboemer00%2Fnetflix","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fboemer00%2Fnetflix/lists"}