https://github.com/ayush272002/cpp-ml-toolkit
Collection of machine learning algorithms implemented in C++ with Python bindings via pybind11.
cpp eigen pybind11 spdlog
Last synced: 4 months ago
- Host: GitHub
- URL: https://github.com/ayush272002/cpp-ml-toolkit
- Owner: Ayush272002
- License: mit
- Created: 2025-08-17T10:53:34.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2025-08-24T18:56:35.000Z (5 months ago)
- Last Synced: 2025-08-24T22:54:04.735Z (5 months ago)
- Topics: cpp, eigen, pybind11, spdlog
- Language: C++
- Homepage:
- Size: 64.7 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# CPP-ML-Toolkit
This project is a collection of machine learning algorithms implemented in C++ with Python bindings via pybind11. It is designed for both educational and practical use, allowing you to experiment with and extend classic ML models efficiently.
## C++ Libraries Used
- **Eigen** — for linear algebra and matrix operations
- **spdlog** — for fast, modern logging and debugging
## Features
- **High-performance C++ implementations** with optimized matrix operations
- **Python bindings** for easy experimentation and visualization
- **Comprehensive logging** with debug and info levels for traceability
- **Visual comparisons** with scikit-learn implementations
- **Modern CMake build system** with automatic dependency management (Eigen, spdlog, pybind11)
- **Organized test structure** with performance metrics and plots
## Currently Implemented Algorithms
- **Linear Regression** — Full-batch and mini-batch gradient descent with regularization
- **Logistic Regression** — Binary classification with L1/L2 regularization support
- **K-Nearest Neighbors (KNN)** — Supports both classification and regression, multiple distance metrics (Euclidean, Manhattan, Minkowski)
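The three distance metrics listed for KNN are closely related: Manhattan and Euclidean distance are the Minkowski distance with order `p = 1` and `p = 2`. The snippet below is a minimal pure-Python sketch of that relationship. The function names are hypothetical and chosen for illustration; they are not the toolkit's C++ or Python API.

```python
def minkowski_distance(a, b, p=2.0):
    """Minkowski distance of order p between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if p < 1:
        raise ValueError("p must be >= 1 for the result to be a metric")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def manhattan_distance(a, b):
    # Special case p = 1: sum of absolute coordinate differences.
    return minkowski_distance(a, b, p=1.0)


def euclidean_distance(a, b):
    # Special case p = 2: straight-line distance.
    return minkowski_distance(a, b, p=2.0)


if __name__ == "__main__":
    u, v = [0.0, 0.0], [3.0, 4.0]
    print(manhattan_distance(u, v))  # 7.0
    print(euclidean_distance(u, v))  # 5.0
```

Larger values of `p` give more weight to the coordinate with the largest difference, which is one reason a KNN implementation may expose the order as a tunable parameter.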
## Directory Structure
```
cpp-ml-toolkit/
├── src/ # C++ source code
├── include/ # C++ headers
├── test/ # Python test and demo
├── images/ # Output images (plots etc.)
├── data/ # Datasets
├── CMakeLists.txt # Build configuration
├── .gitignore
└── README.md
```
## Performance Comparisons
The test scripts generate detailed performance comparisons with scikit-learn:
### Linear Regression
- **Dataset**: Boston Housing (normalized features)
- **Metrics**: Mean Squared Error (MSE)
- **Visualizations**: Loss curve during training, Predictions vs Actual scatter plot
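
To make the loss curve and the MSE metric concrete, here is a minimal pure-Python sketch of mini-batch gradient descent for linear regression with an optional L2 penalty. It is illustrative only: the function names, default hyperparameters, and the L2 form of the penalty are assumptions, and the toolkit's Eigen-based implementation may differ.

```python
import random


def predict(weights, bias, row):
    return sum(w * x for w, x in zip(weights, row)) + bias


def mse(weights, bias, X, y):
    return sum((predict(weights, bias, r) - t) ** 2 for r, t in zip(X, y)) / len(y)


def fit_linear_regression(X, y, lr=0.2, epochs=300, batch_size=4, l2=0.0, seed=0):
    """Mini-batch gradient descent on MSE plus l2 * sum(w_j ** 2).

    Setting batch_size=len(X) gives full-batch gradient descent.
    Returns (weights, bias, loss_history), one MSE value per epoch.
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    indices = list(range(len(X)))
    history = []
    for _ in range(epochs):
        rng.shuffle(indices)  # new batch composition every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_w, grad_b = [0.0] * n_features, 0.0
            for i in batch:
                err = predict(weights, bias, X[i]) - y[i]
                for j in range(n_features):
                    grad_w[j] += 2.0 * err * X[i][j]
                grad_b += 2.0 * err
            m = len(batch)
            for j in range(n_features):
                # Batch-mean MSE gradient plus the gradient of the L2 term.
                weights[j] -= lr * (grad_w[j] / m + 2.0 * l2 * weights[j])
            bias -= lr * grad_b / m  # the bias is not regularized
        history.append(mse(weights, bias, X, y))
    return weights, bias, history


if __name__ == "__main__":
    # Noise-free toy data: y = 2x + 1 on x in [0, 1].
    X = [[i / 10.0] for i in range(11)]
    y = [2.0 * row[0] + 1.0 for row in X]
    w, b, history = fit_linear_regression(X, y)
    print(w, b, history[-1])
```

Plotting `history` against the epoch index gives a loss curve of the kind described above. Features are assumed to be scaled to a similar range, since unnormalized inputs usually require a much smaller learning rate.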
### Logistic Regression
- **Dataset**: Breast Cancer (class-imbalanced)
- **Metrics**: Accuracy and AUC (Area Under ROC Curve)
- **Visualizations**: Loss curve during training, Predicted probabilities comparison
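
Accuracy and AUC answer different questions, which matters when the classes are imbalanced. The sketch below computes both, along with the binary cross-entropy typically plotted as a logistic regression loss curve. It is a pure-Python illustration with hypothetical function names, not the toolkit's API; the AUC here is the pairwise-ranking formulation, which is quadratic in the number of samples and only suitable for small examples.

```python
import math


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded probability matches the label."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(p == t) for p, t in zip(preds, y_true)) / len(y_true)


def roc_auc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [p for p, t in zip(y_prob, y_true) if t == 1]
    neg = [p for p, t in zip(y_prob, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def log_loss(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(y_true)


if __name__ == "__main__":
    # A model that gives every sample the same low score on 9:1 imbalanced labels.
    labels = [0] * 9 + [1]
    scores = [0.1] * 10
    print(accuracy(labels, scores))  # 0.9
    print(roc_auc(labels, scores))   # 0.5
```

In the usage example, the constant predictor looks strong by accuracy but is no better than chance by AUC, which is why reporting both is useful for this kind of dataset.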
### K-Nearest Neighbors (KNN)
- **Dataset**: Wine Quality
- **Metrics**: Accuracy (classification), Mean Squared Error (MSE, regression)
- **Visualizations**: Accuracy vs. k plot, Predictions vs Actual scatter plot
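
The accuracy vs. k comparison can be sketched in a few lines of pure Python. This is a brute-force illustration with hypothetical function names and toy data, using Euclidean distance only; it does not reflect how the toolkit's C++ KNN is structured.

```python
import math
from collections import Counter


def k_nearest(train_X, query, k):
    """Indices of the k training points closest to query (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    return order[:k]


def knn_classify(train_X, train_y, query, k):
    # Majority vote among the k nearest labels.
    votes = Counter(train_y[i] for i in k_nearest(train_X, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(train_X, train_y, query, k):
    # Mean target value of the k nearest neighbours.
    idx = k_nearest(train_X, query, k)
    return sum(train_y[i] for i in idx) / len(idx)


def accuracy_vs_k(train_X, train_y, test_X, test_y, k_values):
    """Map each k to the classification accuracy on the test set."""
    results = {}
    for k in k_values:
        hits = sum(
            knn_classify(train_X, train_y, q, k) == t
            for q, t in zip(test_X, test_y)
        )
        results[k] = hits / len(test_y)
    return results


if __name__ == "__main__":
    # Toy data: two points of class 0 near the origin, four of class 1 far away.
    train_X = [[0, 0], [0, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
    train_y = [0, 0, 1, 1, 1, 1]
    test_X, test_y = [[0.5, 0.5], [5.5, 5.5]], [0, 1]
    print(accuracy_vs_k(train_X, train_y, test_X, test_y, [1, 3, 5]))
    # {1: 1.0, 3: 1.0, 5: 0.5}
```

With `k = 5` the minority-class query is outvoted by the larger class, so accuracy drops. That is the kind of effect an accuracy vs. k plot makes visible.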
## Getting Started
1. **Clone the repository:**
```bash
git clone https://github.com/Ayush272002/CPP-ML-Toolkit.git
cd CPP-ML-Toolkit
```
2. **Install Python dependencies** (inside an activated virtual environment):
```bash
pip install -r requirements.txt
```
3. **Install system dependencies** (for Ubuntu/Debian):
```bash
sudo apt-get install libeigen3-dev python3-dev
```
4. **Build the project:**
```bash
mkdir build && cd build
cmake ..
make
```
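5. **Run Python demos/tests** (from the repository root, not `build/`):
```bash
cd ..  # back to the repository root if you are still in build/
python3 test/LinearRegression.py
```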
The tests will generate performance metrics and save comparison plots in the `images/` directory.
## Contributing
Contributions of new algorithms, bug fixes, and improvements are welcome. Please open an issue or pull request.
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.