https://github.com/sabaudian/amd_market_basket_analysis

Algorithms for Massive Datasets (AMD) -- Market-basket analysis project

frequent-itemsets mapreduce market-basket-analysis massive-datasets pyspark python python-3 spark

# Algorithms for Massive Datasets - Project 2: Market-basket analysis

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Sabaudian/AMD_Market_Basket_Analysis/blob/main/AMD_project.ipynb)

## Summary
The task is to implement, from scratch, a system that finds frequent itemsets (market-basket analysis), treating each movie as a basket and its actors as items.
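
As a minimal illustration of this basket model, the sketch below maps each movie to the set of its actors and counts the support of an itemset, that is, the number of baskets containing all of its items. The movie and actor names, the `support` helper, and the threshold are hypothetical and chosen only for this example; they are not taken from the project's dataset or code.

```python
# Hypothetical toy data: each movie (basket) maps to its set of actors (items).
baskets = {
    "movie_1": {"actor_a", "actor_b", "actor_c"},
    "movie_2": {"actor_a", "actor_b"},
    "movie_3": {"actor_b", "actor_c"},
    "movie_4": {"actor_a", "actor_b", "actor_d"},
}


def support(itemset, baskets):
    """Count the baskets that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for actors in baskets.values() if items <= actors)


min_support = 3  # assumed threshold, chosen only for this example
pair = ("actor_a", "actor_b")

# An itemset is frequent when its support reaches the threshold.
print(support(pair, baskets), support(pair, baskets) >= min_support)
```

Here the pair appears together in three of the four movies, so it counts as frequent under the assumed threshold of 3.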

## Introduction

Market-basket analysis was originally employed by retailers to uncover relationships between items across customer transactions, with the main goal of revealing products that are often bought together, optimizing product placement, and proposing targeted offers to clients.
Today, this technique is employed in a variety of applications, such as fraud detection, understanding customer behavior under different conditions, and healthcare, where it is used to identify relationships between diseases and symptoms. In general terms, the market-basket model represents a many-to-many association between two kinds of entities: items and baskets.
This study focuses on finding frequent itemsets by working on a dataset that collects various information about movies, treating movies as baskets and actors as items. To achieve the intended goal, two algorithms were implemented from scratch: the A-priori algorithm and the algorithm of Park, Chen, and Yu (PCY).
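
To make the difference between the two algorithms concrete, here is a minimal plain-Python sketch restricted to frequent pairs. It is an illustration under stated assumptions, not the project's implementation: the function names `apriori_pairs` and `pcy_pairs`, the `n_buckets` parameter, the toy baskets, and the threshold are all hypothetical, and a PySpark version would distribute the counting passes rather than loop over a list.

```python
from collections import Counter
from itertools import combinations


def apriori_pairs(baskets, min_support):
    """A-priori for pairs: count items first, then only candidate pairs."""
    # Pass 1: count how many baskets contain each single item.
    item_counts = Counter(item for basket in baskets for item in set(basket))
    frequent_items = {i for i, c in item_counts.items() if c >= min_support}

    # Pass 2: count only pairs made of two frequent items (monotonicity).
    pair_counts = Counter()
    for basket in baskets:
        kept = sorted(set(basket) & frequent_items)
        pair_counts.update(combinations(kept, 2))
    return {p: c for p, c in pair_counts.items() if c >= min_support}


def pcy_pairs(baskets, min_support, n_buckets=101):
    """PCY for pairs: pass 1 also hashes every pair into a bucket counter."""
    item_counts = Counter()
    bucket_counts = [0] * n_buckets
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[hash(pair) % n_buckets] += 1
    frequent_items = {i for i, c in item_counts.items() if c >= min_support}

    # Between passes: summarize the buckets as a bitmap of frequent buckets.
    bitmap = [count >= min_support for count in bucket_counts]

    # Pass 2: count a pair only if both items are frequent AND its bucket is.
    pair_counts = Counter()
    for basket in baskets:
        kept = sorted(set(basket) & frequent_items)
        for pair in combinations(kept, 2):
            if bitmap[hash(pair) % n_buckets]:
                pair_counts[pair] += 1
    return {p: c for p, c in pair_counts.items() if c >= min_support}


# Hypothetical toy baskets: each set holds the actors of one movie.
toy_baskets = [
    {"actor_a", "actor_b", "actor_c"},
    {"actor_a", "actor_b"},
    {"actor_b", "actor_c"},
    {"actor_a", "actor_b", "actor_d"},
]

print(apriori_pairs(toy_baskets, min_support=2))
print(pcy_pairs(toy_baskets, min_support=2) == apriori_pairs(toy_baskets, min_support=2))
```

Both functions return the same frequent pairs. The bucket bitmap in PCY only prunes candidates in the second pass, which is where the memory saving over A-priori comes from on large datasets.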