https://github.com/rohit1901/py-cluster
Classifier and Cluster Analysis in Data Science
classification clustering data-science k-means-clustering machine-learning pytest python python3 ruff scikit-learn
- Host: GitHub
- URL: https://github.com/rohit1901/py-cluster
- Owner: rohit1901
- License: mit
- Created: 2023-09-15T13:21:58.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2023-09-17T11:56:37.000Z (almost 3 years ago)
- Last Synced: 2025-01-17T12:55:26.884Z (over 1 year ago)
- Topics: classification, clustering, data-science, k-means-clustering, machine-learning, pytest, python, python3, ruff, scikit-learn
- Language: Python
- Homepage:
- Size: 123 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# K-Means Clustering Application

[License: MIT](https://opensource.org/licenses/MIT)

This Python application demonstrates K-Means clustering and nearest-neighbour classification on various datasets. The code is split into separate modules for loading data, clustering, and classifying unknown samples; see Code Structure for the full list.
## Getting Started
These instructions will help you set up and run the project on your local machine.
### Prerequisites
- Python 3.x
- NumPy
- scikit-learn
- seaborn
- matplotlib
- ruff
- pytest
You can install the required dependencies using pip:
```
pip install numpy scikit-learn seaborn matplotlib ruff pytest
```
### Installation
1. Clone the repository:
```
git clone https://github.com/rohit1901/py-cluster.git
cd py-cluster
```
2. Run the main scripts (`main_1.py` for classification, `main_2.py` for clustering):
```
python main_1.py
python main_2.py
```
## Code Structure
- `data_utils` module: Responsible for loading data and extracting dimensions and samples.
- `clustering` module: Implements K-Means clustering and related functions.
- `classify_unknown_samples` module: Implements a function to classify unknown samples using a trained model.
- `main_1` script: Demonstrates classification of unknown samples using nearest neighbour classification.
- `main_2` script: Demonstrates K-Means clustering on various datasets.
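The two techniques named in the list above can be sketched with scikit-learn directly. This is a minimal illustration, not the repository's own code: the helper `make_two_groups`, the synthetic data, and all variable names are assumptions made for the example, and the project's modules may expose different functions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier


def make_two_groups(seed=0):
    """Hypothetical helper: two well-separated 2-D groups of 20 samples each."""
    rng = np.random.default_rng(seed)
    group_a = rng.normal(loc=(0.0, 0.0), scale=0.3, size=(20, 2))
    group_b = rng.normal(loc=(5.0, 5.0), scale=0.3, size=(20, 2))
    samples = np.vstack([group_a, group_b])
    labels = np.array([0] * 20 + [1] * 20)
    return samples, labels


samples, labels = make_two_groups()

# K-Means is unsupervised: it sees only the samples and assigns cluster ids.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(samples)
cluster_ids = kmeans.labels_

# Nearest-neighbour classification is supervised: it needs the known labels
# and assigns each unknown sample the label of its closest training sample.
classifier = KNeighborsClassifier(n_neighbors=1).fit(samples, labels)
unknown = np.array([[0.2, -0.1], [4.8, 5.3]])
predicted = classifier.predict(unknown)

print("cluster ids:", cluster_ids)
print("predicted labels for unknown samples:", predicted)
```

Note that K-Means cluster ids are arbitrary: the group around the origin may come out as cluster 0 or cluster 1, whereas the classifier returns the labels it was trained with.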
## Testing
To run the unit tests for the application, use the following command:
```
pytest
```
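For readers new to pytest, a test is a function whose name starts with `test_`, placed in a file whose name starts with `test_`. The sketch below shows the general shape. It is a hypothetical example that calls scikit-learn directly, because this README does not list the function names in the project's modules; the repository's actual tests may look different.

```python
import numpy as np
from sklearn.cluster import KMeans


def test_kmeans_finds_two_groups():
    # Two tight, well-separated groups of three points each.
    samples = np.array([
        [0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
        [9.0, 9.0], [9.1, 9.2], [9.2, 9.1],
    ])
    model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(samples)
    first, second = model.labels_[:3], model.labels_[3:]

    # Points within a group share a cluster id; the two groups differ.
    assert len(set(first.tolist())) == 1
    assert len(set(second.tolist())) == 1
    assert first[0] != second[0]
```

Saved as, say, `test_example.py` (an assumed file name), this function is collected and run automatically by the `pytest` command.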
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- This project was inspired by the need to understand K-Means clustering and its implementation in Python.
- Thanks to the contributors and open-source libraries that made this project possible.