https://github.com/mgorshkov/sklearn
ML methods from scikit-learn library
cplusplus cpp machine-learning machinelearning mathematics ml sklearn
- Host: GitHub
- URL: https://github.com/mgorshkov/sklearn
- Owner: mgorshkov
- License: mit
- Created: 2022-10-06T05:29:59.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2026-03-26T09:43:14.000Z (11 days ago)
- Last Synced: 2026-03-27T03:36:38.666Z (10 days ago)
- Topics: cplusplus, cpp, machine-learning, machinelearning, mathematics, ml, sklearn
- Language: C++
- Homepage:
- Size: 85.9 KB
- Stars: 5
- Watchers: 1
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
[Build status](https://ci.appveyor.com/project/mgorshkov/sklearn/branch/main)
# About
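ML methods from the scikit-learn library, implemented in C++.
# Description
Implements a subset of the machine learning methods from the scikit-learn library in C++20. The samples below use `load_iris`, `train_test_split`, `StandardScaler`, `KNeighborsClassifier`, and the `accuracy_score`, `confusion_matrix` and `f1_score` metrics.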
# Requirements
Any C++20-compatible compiler:
* gcc 10 or higher
* clang 6 or higher
* Visual Studio 2019 or higher
# Repo
```
git clone https://github.com/mgorshkov/sklearn.git
```
# Build unit tests and sample
## Linux/MacOS
```
mkdir build && cd build
cmake ..
cmake --build .
```
## Windows
```
mkdir build && cd build
cmake ..
cmake --build . --config Release
```
# Build docs
```
cmake --build . --target doc
```
Open sklearn/build/doc/html/index.html in your browser.
# Install
```
cmake .. -DCMAKE_INSTALL_PREFIX:PATH=~/sklearn_install
cmake --build . --target install
```
# Usage example (samples/neighbors/iris)
```
#include
#include
#include
#include
#include
#include
int main(int, char **) {
    using namespace sklearn::metrics;
    using namespace sklearn::datasets;
    using namespace sklearn::model_selection;
    using namespace sklearn::neighbors;
    using namespace sklearn::preprocessing;

    auto iris = load_iris();
    auto data = iris.data();
    auto target = iris.target();

    auto [X_train, X_test, y_train, y_test] =
            train_test_split({.X = data, .y = target, .test_size = 0.2, .random_state = 42});

    auto sc_X = StandardScaler{};
    X_train = sc_X.fit_transform(X_train);
    X_test = sc_X.transform(X_test);

    auto kn = KNeighborsClassifier{{.n_neighbors = 13,
                                    .p = 2,
                                    .metric = sklearn::metrics::DistanceMetricType::kEuclidean}};
    kn.fit(X_train, y_train);
    auto y_pred = kn.predict(X_test);

    std::cout << "Prediction: " << y_pred << std::endl;
    std::cout << "Target: " << y_test << std::endl;

    auto score = accuracy_score(y_test, y_pred);
    std::cout << "Score: " << score << std::endl;
    return 0;
}
```
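The sample fits `StandardScaler` on the training split only and then reuses the fitted statistics on the test split. The standalone sketch below illustrates that pattern with plain `std::vector`. `SimpleStandardScaler` and `Matrix` are hypothetical names introduced for illustration, not the library's classes, and the sketch assumes the population standard deviation (divide by N) that scikit-learn's `StandardScaler` uses.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // rows x columns

// Hypothetical stand-in for a standard scaler; NOT the library's class.
struct SimpleStandardScaler {
    std::vector<double> mean;   // per-column mean learned by fit()
    std::vector<double> scale;  // per-column standard deviation learned by fit()

    // Learns the per-column mean and population standard deviation.
    void fit(const Matrix &X) {
        const std::size_t rows = X.size();
        const std::size_t cols = rows ? X[0].size() : 0;
        mean.assign(cols, 0.0);
        scale.assign(cols, 1.0);
        if (rows == 0) return;
        for (const auto &row : X)
            for (std::size_t j = 0; j < cols; ++j) mean[j] += row[j];
        for (auto &m : mean) m /= static_cast<double>(rows);
        for (std::size_t j = 0; j < cols; ++j) {
            double var = 0.0;
            for (const auto &row : X) {
                const double d = row[j] - mean[j];
                var += d * d;
            }
            const double sd = std::sqrt(var / static_cast<double>(rows));
            scale[j] = sd > 0.0 ? sd : 1.0;  // constant columns are only centred
        }
    }

    // Applies the statistics learned by fit(); X must have the same column count.
    Matrix transform(const Matrix &X) const {
        Matrix out = X;
        for (auto &row : out)
            for (std::size_t j = 0; j < row.size(); ++j)
                row[j] = (row[j] - mean[j]) / scale[j];
        return out;
    }

    Matrix fit_transform(const Matrix &X) {
        fit(X);
        return transform(X);
    }
};
```

Calling `fit_transform` on the training rows and only `transform` on the test rows keeps test-set statistics out of the model, which is why the sample never calls `fit` on `X_test`.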
# How to build the sample
1. Clone the repo
```
git clone https://github.com/mgorshkov/sklearn.git
```
2. cd samples/neighbors/iris
```
cd samples/neighbors/iris
```
3. Make build dir
```
mkdir -p build-release && cd build-release
```
4. Configure cmake
```
cmake ..
```
5. Build
## Linux/MacOS
```
cmake --build .
```
## Windows
```
cmake --build . --config Release
```
6. Run the app
```
$ ./neighbors_iris
Prediction: [1 2 1 0 2 0 2 0 0 2 0 1 0 2 1 1 0 0 0 2 0 2 2 2 0 1 2 1 2 1]
Target: [1 2 1 0 2 0 2 0 0 2 0 1 0 2 1 1 0 0 0 2 0 2 2 2 0 1 2 1 2 1]
Score: 1
```
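The predictions above come from a k-nearest-neighbours vote with the Euclidean metric (`p = 2`). As a rough illustration of that technique, here is a minimal brute-force sketch in plain C++. It is not the library's implementation: `Sample`, `euclidean` and `knn_predict` are hypothetical names, and the tie-breaking rule (smallest label wins) is an assumption made for the example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using Sample = std::vector<double>;

// Euclidean (p = 2) distance between two samples of equal length.
double euclidean(const Sample &a, const Sample &b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Predicts the label of `query` by majority vote among its k nearest
// training samples. Returns -1 if the training set is empty.
int knn_predict(const std::vector<Sample> &X_train,
                const std::vector<int> &y_train,
                const Sample &query, std::size_t k) {
    std::vector<std::pair<double, int>> neighbours;  // (distance, label)
    neighbours.reserve(X_train.size());
    for (std::size_t i = 0; i < X_train.size(); ++i) {
        neighbours.emplace_back(euclidean(X_train[i], query), y_train[i]);
    }
    k = std::min(k, neighbours.size());
    // Sort just enough that the k closest samples come first.
    std::partial_sort(neighbours.begin(),
                      neighbours.begin() + static_cast<std::ptrdiff_t>(k),
                      neighbours.end());

    // Count the labels of the k closest samples.
    std::map<int, std::size_t> votes;
    for (std::size_t i = 0; i < k; ++i) {
        ++votes[neighbours[i].second];
    }
    int best_label = -1;
    std::size_t best_count = 0;
    for (const auto &[label, count] : votes) {
        if (count > best_count) {  // strict '>' keeps the smallest label on ties
            best_label = label;
            best_count = count;
        }
    }
    return best_label;
}
```

The brute-force scan costs one distance computation per training sample for every query, which is fine for a dataset the size of iris.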
# Usage example (samples/neighbors/diabetes)
```
#include
#include
#include
#include
#include
#include
#include
#include
#include
int main(int, char **) {
    using namespace pd;
    using namespace sklearn::model_selection;
    using namespace sklearn::neighbors;
    using namespace sklearn::preprocessing;
    using namespace sklearn::metrics;

    auto data = read_csv("https://raw.githubusercontent.com/adityakumar529/Coursera_Capstone/master/diabetes.csv");

    const char *non_zero[] = {"Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"};
    for (const auto &column: non_zero) {
        data[column] = data[column].replace(0L, np::NaN);
        auto mean = data[column].mean(true);
        data[column] = data[column].replace(np::NaN, mean);
    }

    auto X = data.iloc(":", "0:8");
    auto y = data.iloc(":", "8");

    auto [X_train, X_test, y_train, y_test] = train_test_split({.X = X, .y = y, .test_size = 0.2, .random_state = 42});

    auto sc_X = StandardScaler{};
    X_train = sc_X.fit_transform(X_train);
    X_test = sc_X.transform(X_test);

    auto classifier = KNeighborsClassifier{{.n_neighbors = 13,
                                            .p = 2,
                                            .metric = sklearn::metrics::DistanceMetricType::kEuclidean}};
    classifier.fit(X_train, y_train);
    auto y_pred = classifier.predict(X_test);
    std::cout << "Prediction: " << y_pred << std::endl;

    auto cm = confusion_matrix({.y_true = y_test, .y_pred = y_pred});
    std::cout << cm << std::endl;
    std::cout << f1_score({.y_true = y_test, .y_pred = y_pred}) << std::endl;
    std::cout << accuracy_score(y_test, y_pred) << std::endl;
    return 0;
}
```
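The loop over `non_zero` treats a zero in those columns as a missing measurement: zeros become NaN, the column mean is taken, and the NaNs are filled with that mean. The sketch below shows the same three steps on a plain `std::vector<double>`. It assumes that `mean(true)` skips NaN values in the way the pandas `skipna` flag does; `impute_zeros_with_mean` is a hypothetical helper, not part of the `pd` or `np` libraries.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Replaces every zero (and any pre-existing NaN) in `column` with the mean
// of the remaining values. Leaves the NaNs in place if no value remains.
void impute_zeros_with_mean(std::vector<double> &column) {
    const double nan = std::numeric_limits<double>::quiet_NaN();

    // Step 1: mark zeros as missing.
    for (auto &v : column) {
        if (v == 0.0) v = nan;
    }

    // Step 2: mean over the values that are present (NaN skipped).
    double sum = 0.0;
    std::size_t count = 0;
    for (const double v : column) {
        if (!std::isnan(v)) {
            sum += v;
            ++count;
        }
    }
    if (count == 0) return;  // nothing to average
    const double mean = sum / static_cast<double>(count);

    // Step 3: fill the missing entries with that mean.
    for (auto &v : column) {
        if (std::isnan(v)) v = mean;
    }
}
```

For a column `{0, 2, 4, 0}` the two non-zero values average to 3, so the result is `{3, 2, 4, 3}`.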
# How to build the sample
1. Clone the repo
```
git clone https://github.com/mgorshkov/sklearn.git
```
2. cd samples/neighbors/diabetes
```
cd samples/neighbors/diabetes
```
3. Make build dir
```
mkdir -p build-release && cd build-release
```
4. Configure cmake
```
cmake ..
```
5. Build
## Linux/MacOS
```
cmake --build .
```
## Windows
```
cmake --build . --config Release
```
6. Run the app
```
$ ./neighbors_diabetes
Prediction: 0
0 0
1 0
2 0
3 0
4 1
...
149 1
150 0
151 0
152 0
153 1
154 rows x 1 columns
[[85 15]
[19 35]]
0.673077
0.779221
```
# Links
* C++ numpy-like template-based array implementation: https://github.com/mgorshkov/np
* Methods from pandas library on top of NP library: https://github.com/mgorshkov/pd
* Scientific methods on top of NP library: https://github.com/mgorshkov/scipy