https://github.com/noopur-phadkar/neo4j-product-recommendation-system

This repository features a Real-Time Product Recommender System built using Neo4j, aimed at demonstrating the application of graph databases. It focuses on utilizing Neo4j for managing complex data relationships in recommendation engines, providing insights into effective data handling and analytics.

cypher-query-language graphdatabase neo4j recommendation-system


# Real-Time Product Recommendation System

## Introduction
In the age of digital commerce, providing personalized shopping experiences has become pivotal for customer engagement and retention. This Product Recommendation System harnesses the power of graph databases and big data analytics. It's designed to deliver highly personalized product recommendations by analyzing intricate patterns in user purchase behaviors and preferences.

At the core of this system is the Neo4j graph database, chosen for its ability to model and traverse complex relationships between data entities. A graph structure suits the interconnected nature of users, products, and categories, which are the elements the system relies on to generate accurate and relevant recommendations.

## Features
- **Database Integration**: The system connects with the Neo4j database for storing and manipulating user, product, and category data.
- **Data Fetching from BigQuery**: Uses the 'theLook eCommerce' dataset from the BigQuery Public Datasets as the initial data source.
- **Data Importing and Processing**: Efficiently imports and processes user, product, and category data into the Neo4j database.
- **Collaborative Filtering**: Implements a collaborative filtering algorithm to generate personalized product recommendations.

## System Components
- `fetch_from_bigquery.py`: Fetches orders, user details, product IDs, and product details from the 'theLook eCommerce' BigQuery Public Dataset.
- `database_setup.py`: Sets up the Neo4j database, establishes a connection, and creates necessary constraints.
- `set_up_database_using_bigquery.py`: Imports data from BigQuery and populates the Neo4j database.
- `insert_into_database.py`: Handles the insertion of data and creation of relationships in the Neo4j database.
- `collaborative_filtering.py`: Contains the logic for generating user-based and category-based product recommendations.
- `test_database.py` and `test_recommendations.py`: Contain unit tests to ensure the correctness of the database setup and the recommendation algorithm.

## Data Model
- **User**: Represents customers, with attributes such as ID, first name, and last name.
- **Product**: Represents products, with details such as ID, title, and category.
- **Category**: Indicates product categories.
- **Relationships**: 'Purchased' between users and products, 'Belongs To' between products and categories.
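
The model above could be expressed in Cypher roughly as follows, held here as Python strings. This is a minimal sketch, not the repository's code: the labels `User`, `Product`, and `Category`, the relationship types `PURCHASED` and `BELONGS_TO`, the property names, and the two helper functions are all assumptions made for illustration. The constraint statements use the Neo4j 5 syntax.

```python
# Hypothetical schema statements; labels and property names are assumptions.
CONSTRAINTS = [
    "CREATE CONSTRAINT user_id IF NOT EXISTS "
    "FOR (u:User) REQUIRE u.id IS UNIQUE",
    "CREATE CONSTRAINT product_id IF NOT EXISTS "
    "FOR (p:Product) REQUIRE p.id IS UNIQUE",
    "CREATE CONSTRAINT category_name IF NOT EXISTS "
    "FOR (c:Category) REQUIRE c.name IS UNIQUE",
]

# MERGE creates a node or relationship only if it does not exist yet.
MERGE_PURCHASE = (
    "MERGE (u:User {id: $user_id}) "
    "MERGE (p:Product {id: $product_id}) "
    "MERGE (u)-[:PURCHASED]->(p)"
)

MERGE_BELONGS_TO = (
    "MERGE (p:Product {id: $product_id}) "
    "MERGE (c:Category {name: $category}) "
    "MERGE (p)-[:BELONGS_TO]->(c)"
)


def purchase_statement(user_id, product_id):
    """Return a (query, parameters) pair for one 'Purchased' relationship."""
    return MERGE_PURCHASE, {"user_id": user_id, "product_id": product_id}


def belongs_to_statement(product_id, category):
    """Return a (query, parameters) pair for one 'Belongs To' relationship."""
    return MERGE_BELONGS_TO, {"product_id": product_id, "category": category}
```

With the official `neo4j` Python driver, each pair could be executed as `session.run(query, parameters)`. Using `MERGE` rather than `CREATE` keeps a repeated import from duplicating nodes or relationships.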

## Recommendation Types
1. **User-Based Recommendations**: These recommendations are generated by analyzing the purchase patterns of similar users. If other users have purchased many of the same items as the target user, the system suggests products those users bought that the target user has not yet purchased. This method relies on a 'wisdom of the crowd' approach: products popular among like-minded shoppers are likely to be relevant.

2. **Category-Based Recommendations**: This method focuses on the user's most frequently purchased product categories. The system identifies the category/categories that a particular user buys from most often and recommends new products from these categories. This approach assumes that users are likely to be interested in new products within their favorite categories.
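
Both recommendation types can be illustrated without a database. The sketch below works on small in-memory dictionaries; the data, the function names, and the definition of "similar" (sharing at least one purchase) are assumptions for illustration, not the repository's implementation.

```python
from collections import Counter

# Toy purchase history: user ID -> set of purchased product IDs (made-up data).
PURCHASES = {
    "alice": {"p1", "p2"},
    "bob": {"p1", "p2", "p3"},
    "carol": {"p2", "p3", "p4"},
    "dave": {"p9"},
}

# Toy catalog: product ID -> category (made-up data).
CATEGORY_OF = {
    "p1": "Jeans",
    "p2": "Jeans",
    "p3": "Jeans",
    "p4": "Socks",
    "p5": "Jeans",
    "p9": "Socks",
}


def user_based(target, purchases, limit=5):
    """Rank products bought by users who share a purchase with `target`."""
    owned = purchases[target]
    counts = Counter()
    for user, items in purchases.items():
        if user == target or not (items & owned):
            continue  # skip the target and users with no overlapping purchase
        counts.update(items - owned)  # count only products the target lacks
    # Most frequent first; ties broken by product ID for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [product for product, _ in ranked[:limit]]


def category_based(target, purchases, category_of, top_n=1, limit=5):
    """Suggest unpurchased products from the target's most-bought categories."""
    owned = purchases[target]
    favourites = Counter(category_of[p] for p in owned)
    ranked = sorted(favourites.items(), key=lambda kv: (-kv[1], kv[0]))
    top = {category for category, _ in ranked[:top_n]}
    candidates = sorted(
        p for p, c in category_of.items() if c in top and p not in owned
    )
    return candidates[:limit]


print(user_based("alice", PURCHASES))                   # ['p3', 'p4']
print(category_based("alice", PURCHASES, CATEGORY_OF))  # ['p3', 'p5']
```

Here `p3` outranks `p4` for `alice` because two overlapping users bought it. The category-based sketch orders candidates alphabetically as a placeholder; a real ranking would need a popularity signal.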

## Algorithm Details
- The **User-Based Recommendation Algorithm** works by finding users with similar purchase histories to the target user. It identifies products purchased by these similar users that haven't been purchased by the target user yet. It ranks these products based on the frequency of their occurrence in the purchase history of similar users.

- The **Category-Based Recommendation Algorithm** first identifies the top categories in which the user has made purchases. It then finds products in these categories that the user hasn't purchased yet. The system recommends these products, prioritizing them based on their popularity within the category.
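
The two algorithms described above might translate into Cypher along these lines. This is a sketch under assumptions: the labels, relationship types, property names, and the `recommend` helper are hypothetical, "similar" is taken to mean "shares at least one purchase", and popularity is taken to mean the number of distinct buyers.

```python
# Cypher sketches; labels, relationship types and properties are assumptions.
USER_BASED_QUERY = """
MATCH (target:User {id: $user_id})-[:PURCHASED]->(:Product)
      <-[:PURCHASED]-(peer:User)
WHERE peer <> target
WITH DISTINCT target, peer
MATCH (peer)-[:PURCHASED]->(rec:Product)
WHERE NOT (target)-[:PURCHASED]->(rec)
RETURN rec.id AS product_id, count(DISTINCT peer) AS score
ORDER BY score DESC, product_id
LIMIT $limit
"""

CATEGORY_BASED_QUERY = """
MATCH (u:User {id: $user_id})-[:PURCHASED]->(:Product)
      -[:BELONGS_TO]->(c:Category)
WITH u, c, count(*) AS bought
ORDER BY bought DESC
LIMIT $top_categories
MATCH (rec:Product)-[:BELONGS_TO]->(c)
WHERE NOT (u)-[:PURCHASED]->(rec)
OPTIONAL MATCH (buyer:User)-[:PURCHASED]->(rec)
RETURN rec.id AS product_id, c.name AS category,
       count(DISTINCT buyer) AS popularity
ORDER BY popularity DESC, product_id
LIMIT $limit
"""


def recommend(session, query, **params):
    """Run a query on a session-like object and return the rows as dicts.

    `session` is expected to expose `run(query, **params)` and to yield
    records that have a `data()` method, as the official `neo4j` driver's
    `Session.run` does.
    """
    return [record.data() for record in session.run(query, **params)]
```

A call such as `recommend(session, USER_BASED_QUERY, user_id=1, limit=10)` would then return one dictionary per recommended product. Passing the user ID and limits as query parameters, rather than formatting them into the string, lets Neo4j reuse the query plan and avoids injection issues.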

## Data Sources

- The [theLook eCommerce](https://console.cloud.google.com/bigquery?p=bigquery-public-data&d=thelook_ecommerce&page=dataset&project=recommendation-system-418420&ws=!1m4!1m3!3m2!1sbigquery-public-data!2sthelook_ecommerce) dataset, a BigQuery Public Dataset, is used as the data source for the system.
- The data includes comprehensive user profiles, detailed product catalogs, and extensive order histories, providing a solid foundation for the recommendation algorithms.
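
A query against this dataset might look like the sketch below. The table names (`order_items`, `users`, `products`) and column names are assumptions based on the public `thelook_ecommerce` schema and should be checked against the live dataset; the `purchases_query` helper is hypothetical.

```python
# Table and column names are assumptions; verify them against the dataset.
DATASET = "bigquery-public-data.thelook_ecommerce"


def purchases_query(limit=1000):
    """Build a SQL string joining order items to users and products."""
    limit = int(limit)  # ensure only an integer reaches the SQL text
    return f"""
        SELECT
          oi.user_id,
          u.first_name,
          u.last_name,
          oi.product_id,
          p.name AS product_name,
          p.category
        FROM `{DATASET}.order_items` AS oi
        JOIN `{DATASET}.users` AS u ON u.id = oi.user_id
        JOIN `{DATASET}.products` AS p ON p.id = oi.product_id
        LIMIT {limit}
    """
```

With the `google-cloud-bigquery` library, the string could be run as `bigquery.Client(project=PROJECT_ID).query(purchases_query()).result()`, where `PROJECT_ID` is your own Google Cloud project.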

### How the Data Is Used
- **Orders**: The system fetches order data to understand user purchase patterns. Each order includes a user ID and the products purchased, which aids in mapping the 'Purchased' relationships between users and products.
- **Users**: User details such as names and IDs are extracted. These details help in creating user nodes in the Neo4j database, establishing a foundation for personalized recommendation algorithms.
- **Products**: Product details, including names, categories, and IDs, are utilized to create product nodes. Understanding what products are available and their categories is crucial for both user-based and category-based recommendations.
- **Product-Category Relationships**: The dataset includes information about which category each product belongs to, allowing the system to create 'Belongs To' relationships between products and categories in the database.
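
The mapping from fetched rows to graph elements can be sketched as a small transformation step. The row layout (one dictionary per purchased item) and the field names are assumptions for illustration; the repository's import scripts may structure this differently.

```python
def to_graph(rows):
    """Split flat purchase rows into unique nodes and relationships."""
    users, products, categories = {}, {}, set()
    purchased, belongs_to = set(), set()
    for row in rows:
        # Dictionaries keyed by ID de-duplicate users and products.
        users[row["user_id"]] = {
            "first_name": row["first_name"],
            "last_name": row["last_name"],
        }
        products[row["product_id"]] = {"title": row["product_name"]}
        categories.add(row["category"])
        # Sets of pairs de-duplicate the two relationship types.
        purchased.add((row["user_id"], row["product_id"]))
        belongs_to.add((row["product_id"], row["category"]))
    return {
        "users": users,
        "products": products,
        "categories": categories,
        "purchased": purchased,
        "belongs_to": belongs_to,
    }


# Made-up sample rows.
rows = [
    {"user_id": 1, "first_name": "Ada", "last_name": "Lee",
     "product_id": 10, "product_name": "Slim Jeans", "category": "Jeans"},
    {"user_id": 1, "first_name": "Ada", "last_name": "Lee",
     "product_id": 11, "product_name": "Wide Jeans", "category": "Jeans"},
    {"user_id": 2, "first_name": "Bo", "last_name": "Kim",
     "product_id": 10, "product_name": "Slim Jeans", "category": "Jeans"},
]
graph = to_graph(rows)
print(len(graph["users"]), len(graph["products"]), len(graph["purchased"]))
```

De-duplicating before insertion keeps the number of database writes proportional to the number of distinct nodes and relationships rather than the number of order rows.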

## System Requirements
- Neo4j
- Google Cloud Platform (for BigQuery access)
- Python 3.x
- Required Python libraries: neo4j, google-cloud-bigquery

## Configuration
Before running the system, configure the following:
- Neo4j database URI, user, and password in `database_setup.py`.
- Google Cloud Project ID in `set_up_database_using_bigquery.py`.
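
As an alternative to editing values directly in the scripts, the same settings could be read from environment variables, which keeps credentials out of source control. This is a hedged sketch, not how the repository is configured: the variable names and the `Settings` and `load_settings` helpers are hypothetical. The defaults shown are Neo4j's standard local Bolt address and default username.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    neo4j_uri: str
    neo4j_user: str
    neo4j_password: str
    gcp_project_id: str


def load_settings(env=None):
    """Read settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    required = ("NEO4J_PASSWORD", "GCP_PROJECT_ID")
    missing = [key for key in required if key not in env]
    if missing:
        raise KeyError(f"missing required settings: {', '.join(missing)}")
    return Settings(
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        neo4j_user=env.get("NEO4J_USER", "neo4j"),
        neo4j_password=env["NEO4J_PASSWORD"],
        gcp_project_id=env["GCP_PROJECT_ID"],
    )
```

Accepting an optional mapping makes the loader easy to exercise in tests without touching the real environment.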

## License
This project is licensed under the [MIT License](src/docs/LICENSE).