https://github.com/cyblx/clustering

This project explores clustering techniques and supervised learning applied to World Cup team performance analysis. The methodologies include K-Means, DBSCAN, K-Nearest Neighbors, Gaussian Mixture Models (GMM), and Agglomerative Clustering.


Host: GitHub
URL: https://github.com/cyblx/clustering
Owner: CybLX
License: mit
Created: 2024-10-18T17:34:09.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-10-18T18:02:33.000Z (over 1 year ago)
Last Synced: 2025-05-28T07:58:06.300Z (about 1 year ago)
Topics: clustering, data-analysis, dbscan, gmm, kmeans, supervised-learning, unsupervised-learning, world-cup
Language: Jupyter Notebook
Homepage:
Size: 1.6 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# Clustering Techniques and Supervised Learning

## Overview
This project applies clustering techniques and supervised learning to the analysis of team performance in the World Cup. The methods covered are K-Means, DBSCAN, K-Nearest Neighbors, Gaussian Mixture Models (GMM), and Agglomerative Clustering.

## Dataset Features
The dataset used in this project contains information such as:

- **Position**: Team's ranking position
- **Team**: Name of the team
- **Games Played**: Total number of games played
- **Win**: Total number of wins
- **Draw**: Total number of draws
- **Loss**: Total number of losses
- **Goals For**: Total goals scored by the team
- **Goals Against**: Total goals conceded by the team
- **Goal Difference**: Difference between goals scored and conceded
- **Points**: Total points accumulated
- **Year**: Year of the competition
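
Several of these fields are related arithmetically, so a quick consistency check is a reasonable first step before clustering. The sketch below is illustrative only: the column names mirror the feature list above but may not match the headers in the project's actual data file, the rows are made-up placeholders, and the `Points` values assume 3 points per win, which may not hold for every edition. `check_consistency` is a hypothetical helper, not a function from the repository.

```python
import pandas as pd


def check_consistency(df: pd.DataFrame) -> pd.DataFrame:
    """Return one boolean column per rule; True where a row satisfies it."""
    return pd.DataFrame({
        # Games played should equal wins + draws + losses.
        "games_ok": df["Games Played"] == df["Win"] + df["Draw"] + df["Loss"],
        # Goal difference should equal goals for minus goals against.
        "goal_diff_ok": df["Goal Difference"] == df["Goals For"] - df["Goals Against"],
    })


# Placeholder rows for illustration; NOT taken from the project's dataset.
rows = [
    {"Position": 1, "Team": "Team A", "Games Played": 7, "Win": 6, "Draw": 1, "Loss": 0,
     "Goals For": 15, "Goals Against": 4, "Goal Difference": 11, "Points": 19, "Year": 2018},
    {"Position": 2, "Team": "Team B", "Games Played": 7, "Win": 4, "Draw": 1, "Loss": 2,
     "Goals For": 10, "Goals Against": 7, "Goal Difference": 3, "Points": 13, "Year": 2018},
    {"Position": 3, "Team": "Team C", "Games Played": 3, "Win": 0, "Draw": 1, "Loss": 2,
     "Goals For": 2, "Goals Against": 6, "Goal Difference": -4, "Points": 1, "Year": 2018},
]
df = pd.DataFrame(rows)

checks = check_consistency(df)
# Rows that fail any rule are worth inspecting before further analysis.
print(df.loc[~checks.all(axis=1), "Team"].tolist())
```

Because `Goal Difference` is fully determined by `Goals For` and `Goals Against`, it adds no independent information; whether to keep it as a clustering feature is a modeling choice.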

## Project Goals
The main objective is to apply clustering techniques to understand the structure of the data and the relationships among its variables. The project aims to identify groups of similar teams, segment the data, and evaluate how machine learning algorithms perform in different scenarios, with an emphasis on teaching unsupervised learning techniques.
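
As a rough idea of what grouping similar teams can look like with scikit-learn, the sketch below fits four of the clustering methods named in the overview to the same standardized feature matrix and compares them with the silhouette score. It is not the notebook's code: the data is synthetic (two artificial groups standing in for stronger and weaker teams), and the feature choice, `n_clusters=2`, `eps`, and `min_samples` are assumptions chosen for the illustration. `cluster_and_score` is a hypothetical helper.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, DBSCAN, KMeans
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler


def cluster_and_score(X: np.ndarray) -> dict:
    """Fit several clusterers on X; return {name: (labels, silhouette or None)}."""
    models = {
        "kmeans": KMeans(n_clusters=2, n_init=10, random_state=0),
        "agglomerative": AgglomerativeClustering(n_clusters=2),
        "gmm": GaussianMixture(n_components=2, random_state=0),
        "dbscan": DBSCAN(eps=1.0, min_samples=5),  # eps is a guess for this toy data
    }
    results = {}
    for name, model in models.items():
        labels = model.fit_predict(X)
        n_labels = len(set(labels))
        # Silhouette is only defined for 2 <= n_labels <= n_samples - 1.
        # DBSCAN noise points (label -1) are treated here as one extra group.
        score = silhouette_score(X, labels) if 1 < n_labels < len(X) else None
        results[name] = (labels, score)
    return results


# Synthetic stand-in for per-team features: wins, losses, goals for, goals against.
rng = np.random.default_rng(0)
strong = rng.normal(loc=[5, 1, 12, 4], scale=0.5, size=(20, 4))
weak = rng.normal(loc=[1, 3, 3, 8], scale=0.5, size=(20, 4))

# Standardize so that no single feature dominates the distance computation.
X = StandardScaler().fit_transform(np.vstack([strong, weak]))

results = cluster_and_score(X)
for name, (labels, score) in results.items():
    print(name, "clusters:", len(set(labels)), "silhouette:", score)
```

K-Means, GMM, and Agglomerative Clustering need the number of groups up front, while DBSCAN infers it from density and may mark some teams as noise, so the methods are not guaranteed to agree on real data.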

## Tools Used
- Python
- Jupyter Notebook
- Libraries: Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn, among others.

## How to Use

1. Clone the repository to your local machine:
```bash
git clone https://github.com/cyblx/clustering.git
```

2. Change into the repository directory and install the required libraries:
```bash
cd clustering
pip install -r requirements.txt
```

3. Start Jupyter Notebook and open the analysis notebook:
```bash
jupyter notebook
```

4. Follow the instructions within the notebook to explore the dataset and view the analysis results.

## For More Information
For more information, code, tutorials, and other projects, use the contacts below:

- Email: alves_lucasoliveira@usp.br
- GitHub: [cyblx](https://github.com/cyblx)
- LinkedIn: [Cyblx](https://www.linkedin.com/in/cyblx)
