https://github.com/szapp/candyanalysis
Case study: Analyze the candy power ranking to identify and recommend popular candy characteristics
- Host: GitHub
- URL: https://github.com/szapp/candyanalysis
- Owner: szapp
- License: mit
- Created: 2024-07-19T12:59:50.000Z
- Default Branch: main
- Last Pushed: 2024-07-22T06:08:56.000Z
- Last Synced: 2025-03-14T21:39:46.044Z
- Topics: data-analysis, data-visualization, feature-selection, interaction-terms
- Language: Jupyter Notebook
- Homepage:
- Size: 969 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Candy Analysis
Case study: Analyze the [candy power ranking](https://github.com/fivethirtyeight/data/tree/master/candy-power-ranking) to identify and recommend popular candy characteristics.
The dataset by FiveThirtyEight is distributed under the [Creative Commons Attribution 4.0 International license](https://creativecommons.org/licenses/by/4.0/).
## Project
A new candy is planned for production.
The project team has not reached a consensus on the candy's characteristics.
Based on a dataset from market analysis, the task is to give a clear recommendation for which characteristics the new product should have.
## Deliverable
The results are compiled in a [presentation](Presentation.pdf) with a clear recommendation.
The presentation is in German, but the numbers speak for themselves.
## Libraries used
- SciPy
- scikit-learn
- seaborn
*See [requirements.txt](requirements.txt).*
## Challenges
- Small dataset (86 rows/samples)
- Data is aggregated over brands (e.g. win percentage)
- Study design might not be fair (not blind)
## Approach
Treating the problem as a classification rather than a regression and applying statistical analysis makes it possible to identify features that are statistically associated with popular brands.
Interaction terms then make it possible to recommend combinations of successful characteristics.
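
The approach can be illustrated with a minimal sketch. It is not the notebook's actual code, and several parts are assumptions. The rows are made-up toy values, not the real data, though the column names `chocolate`, `peanutyalmondy`, `fruity` and `winpercent` follow the FiveThirtyEight dataset. The "popular" label is assumed to be a median split on `winpercent`. Fisher's exact test is assumed as the statistical test and is written out with the standard library only (`scipy.stats.fisher_exact` provides the same test). The interaction term is the product of two binary features.

```python
from math import comb
from statistics import median

# Hypothetical toy rows (NOT the real data):
# (chocolate, peanutyalmondy, fruity, winpercent)
ROWS = [
    (1, 1, 0, 84.0), (1, 1, 0, 76.0), (1, 1, 0, 73.0),
    (1, 0, 0, 67.0), (1, 0, 0, 60.0), (1, 0, 0, 48.0),
    (0, 1, 0, 46.0), (0, 0, 1, 55.0), (0, 0, 1, 42.0),
    (0, 0, 1, 39.0), (0, 0, 1, 35.0), (0, 0, 0, 28.0),
]
FEATURES = {"chocolate": 0, "peanutyalmondy": 1, "fruity": 2}


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are at most as likely as the observed one.
    return sum(
        p for p in map(prob, range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


def association(flags, popular):
    """Cross-tabulate a binary feature against the popularity label."""
    a = sum(1 for f, p in zip(flags, popular) if f and p)          # feature, popular
    b = sum(1 for f, p in zip(flags, popular) if f and not p)      # feature, unpopular
    c = sum(1 for f, p in zip(flags, popular) if not f and p)      # no feature, popular
    d = sum(1 for f, p in zip(flags, popular) if not f and not p)  # no feature, unpopular
    return (a, b, c, d), fisher_exact_two_sided(a, b, c, d)


# Classification target: assumed median split on the win percentage.
threshold = median(row[3] for row in ROWS)
popular = [row[3] > threshold for row in ROWS]

# Test each single feature for association with popularity.
results = {
    name: association([row[idx] for row in ROWS], popular)
    for name, idx in FEATURES.items()
}

# Interaction term: 1 only when both traits are present.
interaction = [row[0] * row[1] for row in ROWS]
results["chocolate x peanutyalmondy"] = association(interaction, popular)

for name, (table, p_value) in results.items():
    print(f"{name:28s} table={table} p={p_value:.3f}")
```

With only 12 toy rows the p-values carry no meaning. On the real data, testing many features and interactions at once would also call for a multiple-comparison correction.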