https://github.com/alsult/wine_classification
This is a wine classification project based on 13 numerical features of wines grown in the same region in Italy but derived from three different cultivars.
logistic-regression machine-learning matplotlib multiclass-classification pandas python scikit-learn seaborn
- Host: GitHub
- URL: https://github.com/alsult/wine_classification
- Owner: AlSult
- License: apache-2.0
- Created: 2025-04-22T13:05:49.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-04-22T20:06:44.000Z (9 months ago)
- Last Synced: 2025-04-23T15:16:32.967Z (9 months ago)
- Topics: logistic-regression, machine-learning, matplotlib, multiclass-classification, pandas, python, scikit-learn, seaborn
- Language: Jupyter Notebook
- Homepage: https://archive.ics.uci.edu/ml/datasets/Wine
- Size: 460 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Wine Dataset Visualization & Classification
This project explores the classic [Wine dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_wine.html) from scikit-learn.
The goal is to visualize the data, understand feature relationships, and build a simple classification model to predict wine types based on chemical attributes.
---
## Dataset Overview
- **Number of Instances:** 178
- **Number of Features:** 13 numeric attributes + 1 target label
- **Classes:**
  - `class_0`
  - `class_1`
  - `class_2`
### Dataset Description
The dataset contains the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. Each wine is described by 13 numerical features (e.g., alcohol content, flavanoids, color intensity) and a class label indicating its cultivar.
A quick summary of the dataset:
- All features are continuous and numeric
- No missing values
- Class distribution is roughly balanced across the 3 types: 59 (`class_0`), 71 (`class_1`), and 48 (`class_2`) samples
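
The summary above can be checked directly against the copy of the dataset shipped with scikit-learn. The following is a minimal sketch, not the notebook's actual code. The `target_name` column is an assumption, added here to match the label used in the Visualizations section.

```python
from sklearn.datasets import load_wine

# Load the dataset as a DataFrame; `as_frame=True` includes a numeric `target` column.
wine = load_wine(as_frame=True)
df = wine.frame

# Assumed helper column: map 0/1/2 to class_0/class_1/class_2.
df["target_name"] = df["target"].map(dict(enumerate(wine.target_names)))

n_rows, n_cols = df.shape                # 13 features + target + target_name
n_missing = int(df.isna().sum().sum())   # total number of missing cells
class_counts = df["target_name"].value_counts().sort_index()

print(f"rows={n_rows}, columns={n_cols}, missing={n_missing}")
print(class_counts)
```

`value_counts()` reports the number of samples per class, which is a quick way to judge how balanced the classes are before training.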
---
## Features
- Alcohol
- Malic acid
- Ash
- Alcalinity of ash
- Magnesium
- Total phenols
- Flavanoids
- Nonflavanoid phenols
- Proanthocyanins
- Color intensity
- Hue
- OD280/OD315 of diluted wines
- Proline
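
In scikit-learn these features appear as lowercase, underscore-separated column names, which is what plotting and modeling code has to reference. A small sketch that lists them, assuming the data is loaded with `load_wine` as linked above:

```python
from sklearn.datasets import load_wine

wine = load_wine()

# Column names as exposed by scikit-learn, in the same order as the list above.
feature_names = list(wine.feature_names)
for index, name in enumerate(feature_names, start=1):
    print(f"{index:2d}. {name}")

# The feature matrix has one row per wine and one column per feature.
n_samples, n_features = wine.data.shape
```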
---
## Project Goals
- Perform exploratory data analysis (EDA)
- Visualize relationships between features and wine classes
- Train a classification model
- Evaluate model performance using precision, recall, F1 score, and confusion matrix
---
## Visualizations
### Scatter Plots (Seaborn)
- `Flavanoids vs Color Intensity`
- `Flavanoids vs Proline`
- `Alcohol vs Color Intensity`
All plots are colored by wine class (`target_name`) using `sns.scatterplot()`.
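
A sketch of how these three scatter plots might be produced. The axis assignment (first feature on x, second on y), the side-by-side layout, and the construction of the `target_name` column are assumptions; the notebook may differ.

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_wine

wine = load_wine(as_frame=True)
df = wine.frame
# Assumed helper column holding class_0 / class_1 / class_2.
df["target_name"] = df["target"].map(dict(enumerate(wine.target_names)))

# (x, y) feature pairs taken from the list above.
pairs = [
    ("flavanoids", "color_intensity"),
    ("flavanoids", "proline"),
    ("alcohol", "color_intensity"),
]

fig, axes = plt.subplots(1, len(pairs), figsize=(15, 4))
for ax, (x_col, y_col) in zip(axes, pairs):
    # `hue` colors each point by its wine class.
    sns.scatterplot(data=df, x=x_col, y=y_col, hue="target_name", ax=ax)
    ax.set_title(f"{x_col} vs {y_col}")
fig.tight_layout()
```

In a Jupyter notebook the figure renders automatically at the end of the cell; in a script, call `plt.show()` or `fig.savefig(...)`.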
### Seaborn Pairplot
- Visualizes multiple pairwise relationships at once (e.g., Alcohol, Flavanoids, Color Intensity, Proline)
- Helps reveal feature correlations and class separation in a grid-style view
---
## Model Training
- **Model Used:** Logistic Regression (multiclass classification)
- **Accuracy:** 94.44% (`model.score()` returned `0.9444`)
- **Evaluation Metrics:**
  - **Classification Report:** Precision, Recall, F1-score
  - **Confusion Matrix:** Visualized with `seaborn.heatmap`
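
The training and evaluation steps can be sketched as follows. This is an illustration under stated assumptions, not the notebook's code: the 80/20 stratified split, `random_state=42`, the `StandardScaler` step, and `max_iter=1000` are all choices made here, so the resulting accuracy will not necessarily match the 94.44% reported above.

```python
import seaborn as sns
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

wine = load_wine()
X, y = wine.data, wine.target
target_names = list(wine.target_names)

# Assumed split: hold out 20% of the samples, preserving class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Assumed preprocessing: standardize features so the solver converges reliably.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)  # mean accuracy on the held-out set
y_pred = model.predict(X_test)

# Per-class precision, recall, and F1-score.
report = classification_report(y_test, y_pred, target_names=target_names)
print(f"accuracy = {accuracy:.4f}")
print(report)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
ax = sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
                 xticklabels=target_names, yticklabels=target_names)
ax.set_xlabel("Predicted")
ax.set_ylabel("True")
```

Off-diagonal cells in the heatmap show which cultivars the model confuses with each other, which a single accuracy figure does not reveal.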
---
## How to Run
1. Clone the repository (`git clone https://github.com/alsult/wine_classification.git`) or download the notebook
2. Install dependencies:
```bash
pip install matplotlib seaborn scikit-learn pandas
```
3. Run the notebook:
```bash
jupyter notebook
```
4. Open `wine_classification.ipynb` and run the cells in order
---
## Files
- `wine_classification.ipynb` — Notebook with EDA, model training, and evaluation
- `README.md` — Project overview and setup instructions
---
## Credits
- Dataset: [UCI ML Wine Dataset](https://archive.ics.uci.edu/ml/datasets/Wine)
- Tools: Python, pandas, matplotlib, seaborn, scikit-learn