https://github.com/szapp/candyanalysis

Case study: Analyze the candy power ranking to identify and recommend popular candy characteristics

data-analysis data-visualization feature-selection interaction-terms

Host: GitHub
URL: https://github.com/szapp/candyanalysis
Owner: szapp
License: mit
Created: 2024-07-19T12:59:50.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2024-07-22T06:08:56.000Z (almost 2 years ago)
Last Synced: 2025-03-14T21:39:46.044Z (over 1 year ago)
Topics: data-analysis, data-visualization, feature-selection, interaction-terms
Language: Jupyter Notebook
Homepage:
Size: 969 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Candy Analysis

Case study: Analyze the [candy power ranking](https://github.com/fivethirtyeight/data/tree/master/candy-power-ranking) to identify and recommend popular candy characteristics.

The dataset by FiveThirtyEight is distributed under the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/).

## Project

The production of a new candy is planned.
Within the project team, there is no consensus on the candy's characteristics.

Based on a market-research dataset, the task is to give a clear recommendation on which characteristics the new product should have.

## Deliverable

The results are compiled in a [presentation](Presentation.pdf) with a clear recommendation.
The presentation is in German, but the numbers speak for themselves.

## Libraries used

- SciPy
- Scikit-learn
- Seaborn

*See [requirements.txt](requirements.txt).*

## Challenges

- Small dataset (86 rows/samples)
- Data is aggregated over brands (e.g. win percentage)
- Study design might not be fair (not blind)

## Approach

Treating the problem as a classification rather than a regression, and applying statistical tests, makes it possible to identify features that are statistically associated with popular brands.
With interaction terms, combinations of successful characteristics can then be recommended.
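The following is a minimal sketch of this approach under stated assumptions. It is not the notebook's actual code. The continuous win percentage (the dataset's `winpercent` column) is split at its median into "popular" and "not popular". The median threshold is an assumption and may not match the split used in the analysis. Pairwise interaction terms are formed as products of the 0/1 characteristics. Each feature is then screened with SciPy's chi-square test of independence (`scipy.stats.chi2_contingency`). The helper names `label_popular`, `add_interactions` and `chi2_screen` are hypothetical. The toy data is randomly generated and is not the candy dataset.

```python
import numpy as np
from scipy.stats import chi2_contingency


def label_popular(winpercent, threshold=None):
    """Turn win percentages into a binary target: 1 = popular, 0 = not popular.

    Assumption: "popular" means strictly above the median unless a threshold is given.
    """
    winpercent = np.asarray(winpercent, dtype=float)
    if threshold is None:
        threshold = np.median(winpercent)
    return (winpercent > threshold).astype(int)


def add_interactions(X, names):
    """Append all pairwise products of the columns of X.

    For 0/1 features the product is a logical AND, e.g. "chocolate*bar" is 1
    only for candies that are both chocolate and a bar.
    """
    X = np.asarray(X, dtype=int)
    cols, out_names = [X], list(names)
    n_features = X.shape[1]
    for i in range(n_features):
        for j in range(i + 1, n_features):
            cols.append((X[:, i] * X[:, j])[:, None])
            out_names.append(f"{names[i]}*{names[j]}")
    return np.hstack(cols), out_names


def chi2_screen(X, y, names):
    """Chi-square test of independence between each binary feature and y.

    Returns (name, chi2, p) tuples sorted by ascending p-value. Features whose
    2x2 table has an empty row or column (e.g. constant columns) are skipped,
    because the test is undefined for them.
    """
    X = np.asarray(X, dtype=int)
    y = np.asarray(y, dtype=int)
    results = []
    for k, name in enumerate(names):
        # 2x2 table: rows = feature absent/present, columns = not popular/popular
        table = np.zeros((2, 2), dtype=int)
        np.add.at(table, (X[:, k], y), 1)
        if (table.sum(axis=0) == 0).any() or (table.sum(axis=1) == 0).any():
            continue
        chi2, p, _, _ = chi2_contingency(table)
        results.append((name, float(chi2), float(p)))
    return sorted(results, key=lambda r: r[2])


# Toy data (randomly generated, NOT the candy dataset): "chocolate" drives wins.
rng = np.random.default_rng(0)
names = ["chocolate", "fruity", "bar"]
X = rng.integers(0, 2, size=(40, 3))
win = 40 + 30 * X[:, 0] + rng.normal(0, 5, size=40)

y = label_popular(win)
X_int, names_int = add_interactions(X, names)
for name, chi2, p in chi2_screen(X_int, y, names_int):
    print(f"{name:16s} chi2={chi2:6.2f}  p={p:.3g}")
```

With the real data, `X` would hold the binary characteristic columns and `win` would hold the `winpercent` column. For 0/1 features, `sklearn.preprocessing.PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)` produces the same pairwise products. Each interaction term adds another test on only 86 samples. A multiple-comparison correction (e.g. Bonferroni) is therefore advisable before reading the p-values as evidence.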
