Algorithms for Massive Datasets (AMD) -- Market-baskets analysis project
- Host: GitHub
- URL: https://github.com/sabaudian/amd_market_basket_analysis
- Owner: Sabaudian
- License: mit
- Created: 2024-09-03T14:43:24.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-02-14T15:43:15.000Z (3 months ago)
- Last Synced: 2025-02-14T16:34:29.346Z (3 months ago)
- Topics: frequent-itemsets, mapreduce, market-basket-analysis, massive-datasets, pyspark, python, python-3, spark
- Language: Jupyter Notebook
- Homepage:
- Size: 2.16 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Algorithms for Massive Datasets - Project 2: Market-basket analysis
[Open in Colab](https://colab.research.google.com/github/Sabaudian/AMD_Market_Basket_Analysis/blob/main/AMD_project.ipynb)
## Summary
The task is to implement from scratch a system that finds frequent itemsets (i.e., performs market-basket analysis), treating each movie as a basket and its actors as items.

## Introduction
Market-basket analysis was originally employed by retailers to discover relationships among the items in customers' transactions, with the main goal of revealing products that are often bought together, in order to optimize product placement and propose targeted offers to customers.
Today, the technique is applied far more broadly: in fraud detection, in understanding customer behavior under different conditions, and in healthcare, where it helps identify relationships between diseases and symptoms. In general terms, it captures a many-to-many association between two kinds of entities.
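To make the basket/item framing concrete, here is a minimal toy example (the data and the support threshold are invented for illustration): each movie is a set of actors, and the support of an actor pair is the number of movies whose cast contains both.

```python
from itertools import combinations
from collections import Counter

# Hypothetical toy data: each movie (basket) maps to its cast (items).
baskets = [
    {"Actor A", "Actor B", "Actor C"},
    {"Actor A", "Actor B"},
    {"Actor B", "Actor C"},
    {"Actor A", "Actor B", "Actor D"},
]

# Support of a pair = number of baskets containing both items.
pair_support = Counter()
for cast in baskets:
    for pair in combinations(sorted(cast), 2):
        pair_support[pair] += 1

min_support = 2  # arbitrary threshold for this toy example
frequent_pairs = {p: c for p, c in pair_support.items() if c >= min_support}
print(frequent_pairs)
# {('Actor A', 'Actor B'): 3, ('Actor A', 'Actor C'): 1, ...} filtered to counts >= 2
```

Counting every pair directly, as above, is exactly what becomes infeasible at scale: with n distinct items there are n(n-1)/2 candidate pairs, which motivates the two algorithms below.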
This study focuses on finding frequent itemsets by working on a dataset that collects various information about movies, treating movies as baskets and actors as items. To achieve the intended goal, two algorithms were implemented from scratch: the A-priori algorithm and the algorithm of Park, Chen, and Yu (PCY).
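The two-pass ideas behind both algorithms can be sketched in plain Python. This is a minimal illustrative version, not the repository's implementation; function names, the bucket count, and the restriction to pairs are assumptions made for the sketch:

```python
from itertools import combinations
from collections import Counter

def apriori_pairs(baskets, min_support):
    """Two-pass A-priori, restricted to frequent pairs for brevity."""
    # Pass 1: count single items and keep the frequent ones.
    item_counts = Counter(item for basket in baskets for item in basket)
    frequent_items = {i for i, c in item_counts.items() if c >= min_support}
    # Pass 2: count only pairs whose two members are both frequent
    # (monotonicity: a pair cannot be frequent if either item is not).
    pair_counts = Counter()
    for basket in baskets:
        candidates = sorted(i for i in basket if i in frequent_items)
        for pair in combinations(candidates, 2):
            pair_counts[pair] += 1
    return {p: c for p, c in pair_counts.items() if c >= min_support}

def pcy_pairs(baskets, min_support, num_buckets=1009):
    """PCY: pass 1 additionally hashes every pair into a bucket counter."""
    item_counts = Counter()
    bucket_counts = [0] * num_buckets
    for basket in baskets:
        items = sorted(basket)
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[hash(pair) % num_buckets] += 1
    frequent_items = {i for i, c in item_counts.items() if c >= min_support}
    # The bucket counts are summarized as a bitmap of frequent buckets.
    bitmap = [c >= min_support for c in bucket_counts]
    # Pass 2: a pair is counted only if both items are frequent AND it
    # hashes to a frequent bucket.
    pair_counts = Counter()
    for basket in baskets:
        candidates = sorted(i for i in basket if i in frequent_items)
        for pair in combinations(candidates, 2):
            if bitmap[hash(pair) % num_buckets]:
                pair_counts[pair] += 1
    return {p: c for p, c in pair_counts.items() if c >= min_support}
```

The PCY filter is safe: a truly frequent pair always hashes to a bucket whose count is at least the pair's own support, so the bitmap can only discard infrequent candidates. Both functions therefore return the same frequent pairs, but PCY carries less pass-2 state when many buckets fall below the threshold.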