https://github.com/snowch/movie-recommender-demo

alternating-least-squares biginsights bluemix bokeh cloudant collaborative-filtering dsx hadoop hive ibm-biginsights ibm-bluemix jupyter-notebook kafka machine-learning messagehub notebook python-flask-application redis spark spark-streaming


## Overview

This project walks through how you can create recommendations using Apache Spark machine learning. There are a number of Jupyter notebooks that you can run on IBM Data Science Experience, and there is a live demo of a movie recommendation web application you can interact with. The demo also uses IBM Message Hub (Kafka) to push application events to a topic, where they are consumed by a Spark Streaming job running on IBM BigInsights (Hadoop).
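
To give a feel for the kind of model behind the recommendations, here is a minimal sketch of alternating least squares (ALS) collaborative filtering, which the project's topic tags mention. It is not the Spark implementation used in the notebooks: it is plain Python, uses a single latent factor per user and movie (rank 1) so each update has a simple closed form, and the names `als_rank1` and `predict` as well as the sample ratings are made up for illustration.

```python
from typing import Dict, Tuple

# Hypothetical input shape: (user, movie) -> observed rating.
Ratings = Dict[Tuple[str, str], float]


def als_rank1(ratings: Ratings, iterations: int = 20, reg: float = 0.1):
    """Fit rating ~= user_factor[user] * movie_factor[movie] with rank-1 ALS."""
    users = {u for u, _ in ratings}
    movies = {m for _, m in ratings}
    user_f = {u: 1.0 for u in users}
    movie_f = {m: 1.0 for m in movies}

    for _ in range(iterations):
        # Hold movie factors fixed: each user factor is a 1-D ridge regression.
        for u in users:
            num = sum(r * movie_f[m] for (uu, m), r in ratings.items() if uu == u)
            den = reg + sum(movie_f[m] ** 2 for (uu, m) in ratings if uu == u)
            user_f[u] = num / den
        # Hold user factors fixed: solve for each movie factor the same way.
        for m in movies:
            num = sum(r * user_f[u] for (u, mm), r in ratings.items() if mm == m)
            den = reg + sum(user_f[u] ** 2 for (u, mm) in ratings if mm == m)
            movie_f[m] = num / den
    return user_f, movie_f


def predict(user_f, movie_f, user: str, movie: str) -> float:
    """Predicted rating is the product of the two learned factors."""
    return user_f[user] * movie_f[movie]


# Toy data: carol has not rated movie "B" yet.
sample = {
    ("alice", "A"): 5.0, ("alice", "B"): 1.0,
    ("bob", "A"): 4.0, ("bob", "B"): 1.0,
    ("carol", "A"): 5.0,
}
uf, mf = als_rank1(sample)
print("carol/B prediction:", round(predict(uf, mf, "carol", "B"), 2))
```

A real deployment would use more latent factors and a distributed solver; the point here is only the alternating structure, where one side is held fixed while the other is solved in closed form.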

## Quick start

There is an overview video on [YouTube](https://www.youtube.com/watch?v=is9ZzgbGSdM).

This project is a demo movie recommender application, preloaded with approximately 4,000 movies and 500,000 randomly generated ratings. The web application lets users search for movies, rate them, and receive movie recommendations based on their ratings.
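
The README does not say how the random ratings were produced. As a rough illustration only, the sketch below generates a similar dataset under assumed conditions: integer ratings drawn uniformly from 1 to 5, integer user and movie ids, and at most one rating per (user, movie) pair. The function name `generate_random_ratings` and its parameters are hypothetical.

```python
import random
from typing import List, Tuple


def generate_random_ratings(
    num_users: int, num_movies: int, num_ratings: int, seed: int = 42
) -> List[Tuple[int, int, int]]:
    """Return unique (user_id, movie_id, rating) triples with ratings in 1..5."""
    if num_ratings > num_users * num_movies:
        raise ValueError("more ratings requested than (user, movie) pairs exist")
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    # Sample distinct cells of the user x movie grid so no pair repeats.
    cells = rng.sample(range(num_users * num_movies), num_ratings)
    return [(c // num_movies, c % num_movies, rng.randint(1, 5)) for c in cells]


# Small example; the demo itself reports roughly 4,000 movies and 500,000 ratings.
ratings = generate_random_ratings(num_users=100, num_movies=40, num_ratings=500)
print(len(ratings), "ratings generated")
```

Because such ratings carry no real preference signal, recommendations trained on them demonstrate the pipeline rather than recommendation quality.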

## Notebooks

Start with [Introduction](./notebooks/Introduction.ipynb) to read more about this project.

You can import these notebooks into IBM Data Science Experience. I have occasionally experienced issues when trying to load from a URL. If that happens to you, try cloning or downloading this repo and importing the notebooks as files.

## Technologies

The overall architecture looks like this:


[Figure: Overall Architecture]

The technologies used in this demo are:

**Core components (Web Application)**

- Python Flask application
- IBM Bluemix for hosting the web application and services
- IBM Cloudant NoSQL for storing movies, ratings, user accounts and recommendations
- IBM Data Science Experience (DSX) and Spark as a Service

**Optional components (Hadoop Warehouse)**

The core demo can run without these components.

- IBM Compose Redis for maintaining an atomic increment counter for the ID fields of user accounts. Use this if you want integer user account IDs rather than the GUIDs generated by Cloudant.
- IBM Message Hub for the web application to send a stream of ratings as they are entered by the user.
- IBM BigInsights on Cloud, using Spark Streaming to ingest data from Message Hub and expose it via Hive.
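
The atomic counter idea from the Redis item above can be sketched without a Redis server. The class below is an in-process stand-in, assuming the application only needs "increment and return the new value" semantics, which is what the Redis `INCR` command provides. The names `AtomicCounter` and `allocate_ids` are hypothetical and do not come from this repository.

```python
import threading
from typing import List


class AtomicCounter:
    """In-process stand-in for a Redis key that is incremented with INCR."""

    def __init__(self, start: int = 0) -> None:
        self._value = start
        self._lock = threading.Lock()

    def incr(self) -> int:
        # The lock makes read-increment-return a single atomic step,
        # so two concurrent callers can never receive the same id.
        with self._lock:
            self._value += 1
            return self._value


def allocate_ids(counter: AtomicCounter, count: int) -> List[int]:
    """Allocate `count` consecutive-looking ids from the shared counter."""
    return [counter.incr() for _ in range(count)]


# Simulate several web workers registering users at the same time.
counter = AtomicCounter()
results: List[List[int]] = []
workers = [
    threading.Thread(target=lambda: results.append(allocate_ids(counter, 100)))
    for _ in range(8)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
all_ids = [i for batch in results for i in batch]
print("unique ids:", len(set(all_ids)) == len(all_ids))
```

With a real Redis instance the counter lives on the server, so every application instance shares it; an in-process lock like this one only protects threads within a single process.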

## Setting up your own demo web application instance on Bluemix

### Quick deploy

Click the button below, then follow the instructions. Note that this step can take a long time (around 30 minutes).

[![Deploy to Bluemix](https://bluemix.net/deploy/button.png)](https://bluemix.net/deploy?repository=https://github.com/snowch/movie-recommender-demo.git)

- **CAUTION:** A Python Flask application instance with 128MB memory and an instance of Cloudant 'Lite' will be deployed, and you may be charged for these services. Please check the charges before deploying. Note that Redis, Message Hub and BigInsights are not deployed by default. If you wish to deploy the solution with these optional components, follow the instructions [here](./MANUAL_INSTALLATION.md).

After deploying to Bluemix, you will need to create a new [DSX](http://datascience.ibm.com) project and import the notebooks. The notebook [Step 07](./notebooks/Step%2007%20-%20Cloudant%20Datastore%20Recommender.ipynb) is responsible for creating recommendations and saving them to Cloudant. You will not get recommendations until you have set up this notebook with your Cloudant credentials and run it from DSX.
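
As a rough picture of the "create recommendations and save them" step, the sketch below picks the highest-scoring unrated movies for a user and shapes them into a JSON document. It does not reproduce the Step 07 notebook: the function names, the field names, and the `_id` scheme are all assumptions for illustration, and the actual write to Cloudant is left out.

```python
import heapq
import json
from typing import Dict, List, Set


def top_n_recommendations(
    predicted: Dict[str, float], already_rated: Set[str], n: int = 10
) -> List[dict]:
    """Keep the n best-scoring movies the user has not rated yet."""
    candidates = (
        (score, movie_id)
        for movie_id, score in predicted.items()
        if movie_id not in already_rated
    )
    best = heapq.nlargest(n, candidates)  # sorted from highest score down
    return [{"movie_id": m, "predicted_rating": round(s, 2)} for s, m in best]


def to_document(user_id: str, recommendations: List[dict]) -> dict:
    """Build a document body; the schema here is hypothetical."""
    return {
        "_id": f"recommendation_{user_id}",  # assumed id scheme, one doc per user
        "type": "recommendation",
        "user_id": user_id,
        "recommendations": recommendations,
    }


# Example with made-up predicted scores for one user.
scores = {"m1": 4.5, "m2": 3.0, "m3": 4.9, "m4": 2.0}
doc = to_document("42", top_n_recommendations(scores, already_rated={"m3"}, n=2))
print(json.dumps(doc, indent=2))
```

Storing one document per user keeps the web application's lookup simple: it only has to fetch a single document by id to render that user's recommendations.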

## Web application screenshots

### Rating a movie

The screenshot below shows some movies being rated by a user.

![Screenshot of rating a movie](./docs/screenshot_ratings.png)

### Movie recommendations

The screenshot below shows movie recommendations provided by Spark machine learning.

![Screenshot of movie recommendations](./docs/screenshot_recommendations.png)