https://github.com/coby-sonn/kmeans-python-c
An efficient K-Means clustering implementation combining Python for preprocessing and a C extension for optimized computations, featuring K-Means++ initialization and linked-list memory management.
kmeans machine-learning memory-management python-c-extension
- Host: GitHub
- URL: https://github.com/coby-sonn/kmeans-python-c
- Owner: Coby-Sonn
- License: mit
- Created: 2025-02-18T09:35:20.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-02-18T09:41:15.000Z (11 months ago)
- Last Synced: 2025-02-18T10:37:38.979Z (11 months ago)
- Topics: kmeans, machine-learning, memory-management, python-c-extension
- Language: C
- Homepage:
- Size: 11.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# K-Means Clustering Implementation in Python and C
This repository contains an implementation of the K-Means clustering algorithm that combines Python (data preprocessing and initialization) with a C extension (the clustering computation). It was built as coursework for a university course on data analysis in C and Python.
## Overview
- **Python Implementation**: Handles data processing and initialization using K-Means++.
- **C Extension**: Optimized clustering computation using linked lists for efficient memory management.
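As a rough illustration of the K-Means++ step mentioned above, the sketch below picks initial centroid indices in pure Python. It assumes the standard formulation, in which each new centroid is sampled with probability proportional to the squared distance to the nearest centroid already chosen. The function name `kmeans_pp_init`, the `seed` parameter, and the use of the standard library rather than NumPy are illustrative assumptions, not taken from `kmeans_pp.py`.

```python
import math
import random


def kmeans_pp_init(points, k, seed=0):
    """Return k distinct point indices chosen by K-Means++ seeding.

    Hypothetical helper for illustration; assumes the points are not all
    identical, so at least one sampling weight is positive.
    """
    rng = random.Random(seed)
    # First centroid: chosen uniformly at random.
    chosen = [rng.randrange(len(points))]
    # Squared distance from every point to its nearest chosen centroid.
    d2 = [math.dist(p, points[chosen[0]]) ** 2 for p in points]
    while len(chosen) < k:
        # Sample the next centroid with probability proportional to d2.
        idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        chosen.append(idx)
        # Refresh the nearest-centroid distances with the new centroid.
        d2 = [min(old, math.dist(p, points[idx]) ** 2)
              for old, p in zip(d2, points)]
    return chosen
```

Points that are already centroids have weight zero, so they are never drawn twice; far-away points are the most likely to be picked next.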
## Files
- `kmeans_pp.py` - Python implementation, including K-Means++ initialization.
- `kmeansmodule.c` - C extension implementing the core clustering logic.
- `setup.py` - Build script for compiling the C extension into a Python module.
## Installation
### Prerequisites
- Python 3.x installed.
- A C compiler such as `gcc`.
### Building the C Extension
Run the following command to compile the C module:
```sh
python setup.py build_ext --inplace
```
This generates a shared library (`mykmeanssp.*.so` on Linux/macOS, or `mykmeanssp.*.pyd` on Windows) that can be imported into Python.
## How to Run
### Running the Python Implementation
#### Command Syntax:
```sh
python kmeans_pp.py <k> [<iter>] <eps> <file1> <file2>
```
- `<k>`: Number of clusters (integer > 1 and < N, where N is the number of points).
- `<iter>`: (Optional) Maximum number of iterations (default: 300, max: 1000).
- `<eps>`: Convergence threshold (float >= 0).
- `<file1>`, `<file2>`: CSV files containing the input data.
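The constraints above could be checked along the following lines. This is a minimal sketch, not the script's actual parsing code: `parse_args`, its return shape, and the error messages are hypothetical, and the lower bound of 1 on the iteration count and the inclusive upper bound of 1000 are assumptions, since the README only states a default and a maximum.

```python
DEFAULT_ITER = 300   # default stated in the README
MAX_ITER = 1000      # maximum stated in the README (treated as inclusive here)


def parse_args(argv, n_points):
    """Validate the positional arguments (argv excludes the script name).

    n_points is N, the number of data points, known once the CSVs are read.
    """
    if len(argv) == 5:
        k_raw, iter_raw, eps_raw, file1, file2 = argv
    elif len(argv) == 4:
        # Iteration count omitted: fall back to the default.
        k_raw, eps_raw, file1, file2 = argv
        iter_raw = str(DEFAULT_ITER)
    else:
        raise ValueError("expected 4 or 5 arguments")

    k, max_iter, eps = int(k_raw), int(iter_raw), float(eps_raw)
    if not 1 < k < n_points:
        raise ValueError("number of clusters must satisfy 1 < k < N")
    if not 1 <= max_iter <= MAX_ITER:  # lower bound is an assumption
        raise ValueError("iteration count out of range")
    if eps < 0:
        raise ValueError("convergence threshold must be >= 0")
    return k, max_iter, eps, file1, file2
```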
#### Example:
```sh
python kmeans_pp.py 3 300 0.001 data1.csv data2.csv
```
## Output Format
- The program prints the initial centroids' indices.
- The final cluster centroids are printed, with each centroid on a new line formatted to 4 decimal places.
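A sketch of how such output could be assembled is shown below. The 4-decimal formatting follows the description above; the comma separator and the helper name `format_output` are assumptions, as the README does not say how values on a line are delimited.

```python
def format_output(initial_indices, centroids):
    """Build the printed report: indices first, then one centroid per line."""
    # Line 1: indices of the initially chosen centroids (separator assumed).
    lines = [",".join(str(i) for i in initial_indices)]
    # One line per final centroid, each coordinate with 4 decimal places.
    for centroid in centroids:
        lines.append(",".join(f"{x:.4f}" for x in centroid))
    return "\n".join(lines)
```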
## Notes
- Input files must be in CSV format with numerical values.
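- The implementation uses K-Means++ initialization, which spreads the starting centroids apart and typically needs fewer iterations to converge than uniform random initialization.
- The Python script passes the data and initial centroids to the compiled C extension, which runs the clustering computation.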
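For readers unfamiliar with the iterative step that the C extension is responsible for, here is a pure-Python sketch of the usual K-Means (Lloyd) loop. It is for illustration only: the repository implements this part in C with linked lists, and the function name `lloyd`, its parameters, and the stopping rule (stop once every centroid moves less than the threshold) are assumptions rather than a description of `kmeansmodule.c`.

```python
import math


def lloyd(points, centroids, max_iter=300, eps=0.001):
    """Refine centroids until none moves by eps or more, or max_iter is hit."""
    centroids = [list(c) for c in centroids]
    dim = len(centroids[0])
    for _ in range(max_iter):
        sums = [[0.0] * dim for _ in centroids]
        counts = [0] * len(centroids)
        # Assignment step: add each point to its nearest centroid's cluster.
        for p in points:
            j = min(range(len(centroids)),
                    key=lambda c: math.dist(p, centroids[c]))
            counts[j] += 1
            for d in range(dim):
                sums[j][d] += p[d]
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new = [[s / counts[j] for s in sums[j]] if counts[j] else centroids[j]
               for j in range(len(centroids))]
        # Largest distance any centroid moved in this iteration.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift < eps:
            break
    return centroids
```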
## License
This project is released under the MIT License.