Algorithms for Massive Datasets (AMD) -- Market-baskets analysis project
- Host: GitHub
- URL: https://github.com/sabaudian/amd_market_basket_analysis
- Owner: Sabaudian
- License: mit
- Created: 2024-09-03T14:43:24.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-02-14T15:43:15.000Z (3 months ago)
- Last Synced: 2025-02-14T16:34:29.346Z (3 months ago)
- Topics: frequent-itemsets, mapreduce, market-basket-analysis, massive-datasets, pyspark, python, python-3, spark
- Language: Jupyter Notebook
- Homepage:
- Size: 2.16 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Algorithms for Massive Datasets - Project 2: Market-basket analysis
[Open in Colab](https://colab.research.google.com/github/Sabaudian/AMD_Market_Basket_Analysis/blob/main/AMD_project.ipynb)
## Summary
The task is to implement from scratch a system that finds frequent itemsets (i.e., performs market-basket analysis), treating each movie as a basket and its actors as items.

## Introduction
Market-basket analysis was originally employed by retailers to discover relationships among the items in customers' transactions, with the main goal of revealing products that are often bought together, in order to optimize product placement and propose targeted offers to customers.
Today, the technique is applied far more broadly: in fraud detection, in understanding customer behavior under different conditions, and in healthcare, where it helps identify relationships between diseases and symptoms. In general terms, it captures a many-to-many association between two kinds of entities.
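To make the basket/item framing concrete, here is a minimal toy example (the data and the support threshold are invented for illustration): each movie is a set of actors, and the support of an actor pair is the number of movies whose cast contains both.

```python
from itertools import combinations
from collections import Counter

# Hypothetical toy data: each movie (basket) maps to its cast (items).
baskets = [
    {"Actor A", "Actor B", "Actor C"},
    {"Actor A", "Actor B"},
    {"Actor B", "Actor C"},
    {"Actor A", "Actor B", "Actor D"},
]

# Support of a pair = number of baskets containing both items.
pair_support = Counter()
for cast in baskets:
    for pair in combinations(sorted(cast), 2):
        pair_support[pair] += 1

min_support = 2  # arbitrary threshold for this toy example
frequent_pairs = {p: c for p, c in pair_support.items() if c >= min_support}
print(frequent_pairs)
# {('Actor A', 'Actor B'): 3, ('Actor A', 'Actor C'): 1, ...} filtered to counts >= 2
```

Counting every pair directly, as above, is exactly what becomes infeasible at scale: with n distinct items there are n(n-1)/2 candidate pairs, which motivates the two algorithms below.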
This study focuses on finding frequent itemsets by working on a dataset that collects various information about movies, treating movies as baskets and actors as items. To achieve the intended goal, two algorithms were implemented from scratch: the A-priori algorithm and the algorithm of Park, Chen, and Yu (PCY).
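The two-pass ideas behind both algorithms can be sketched in plain Python. This is a minimal illustrative version, not the repository's implementation; function names, the bucket count, and the restriction to pairs are assumptions made for the sketch:

```python
from itertools import combinations
from collections import Counter

def apriori_pairs(baskets, min_support):
    """Two-pass A-priori, restricted to frequent pairs for brevity."""
    # Pass 1: count single items and keep the frequent ones.
    item_counts = Counter(item for basket in baskets for item in basket)
    frequent_items = {i for i, c in item_counts.items() if c >= min_support}
    # Pass 2: count only pairs whose two members are both frequent
    # (monotonicity: a pair cannot be frequent if either item is not).
    pair_counts = Counter()
    for basket in baskets:
        candidates = sorted(i for i in basket if i in frequent_items)
        for pair in combinations(candidates, 2):
            pair_counts[pair] += 1
    return {p: c for p, c in pair_counts.items() if c >= min_support}

def pcy_pairs(baskets, min_support, num_buckets=1009):
    """PCY: pass 1 additionally hashes every pair into a bucket counter."""
    item_counts = Counter()
    bucket_counts = [0] * num_buckets
    for basket in baskets:
        items = sorted(basket)
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[hash(pair) % num_buckets] += 1
    frequent_items = {i for i, c in item_counts.items() if c >= min_support}
    # The bucket counts are summarized as a bitmap of frequent buckets.
    bitmap = [c >= min_support for c in bucket_counts]
    # Pass 2: a pair is counted only if both items are frequent AND it
    # hashes to a frequent bucket.
    pair_counts = Counter()
    for basket in baskets:
        candidates = sorted(i for i in basket if i in frequent_items)
        for pair in combinations(candidates, 2):
            if bitmap[hash(pair) % num_buckets]:
                pair_counts[pair] += 1
    return {p: c for p, c in pair_counts.items() if c >= min_support}
```

The PCY filter is safe: a truly frequent pair always hashes to a bucket whose count is at least the pair's own support, so the bitmap can only discard infrequent candidates. Both functions therefore return the same frequent pairs, but PCY carries less pass-2 state when many buckets fall below the threshold.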