{"id":48105798,"url":"https://github.com/mgorshkov/sklearn","last_synced_at":"2026-04-04T15:55:38.927Z","repository":{"id":157556569,"uuid":"546457483","full_name":"mgorshkov/sklearn","owner":"mgorshkov","description":"ML methods from scikit-learn library","archived":false,"fork":false,"pushed_at":"2026-03-26T09:43:14.000Z","size":88,"stargazers_count":5,"open_issues_count":1,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-03-27T03:36:38.666Z","etag":null,"topics":["cplusplus","cpp","machine-learning","machinelearning","mathematics","ml","sklearn"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mgorshkov.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2022-10-06T05:29:59.000Z","updated_at":"2025-03-14T06:53:44.000Z","dependencies_parsed_at":null,"dependency_job_id":"88e7258f-4066-4f7c-becf-cacbb2bb4bc3","html_url":"https://github.com/mgorshkov/sklearn","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/mgorshkov/sklearn","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mgorshkov%2Fsklearn","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mgorshkov%2Fsklearn/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mgorshkov%2Fsklearn/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mgorshkov%2Fsklearn/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mgorshkov","download_ur
l":"https://codeload.github.com/mgorshkov/sklearn/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mgorshkov%2Fsklearn/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31404692,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-04T10:20:44.708Z","status":"ssl_error","status_checked_at":"2026-04-04T10:20:06.846Z","response_time":60,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cplusplus","cpp","machine-learning","machinelearning","mathematics","ml","sklearn"],"created_at":"2026-04-04T15:55:38.307Z","updated_at":"2026-04-04T15:55:38.914Z","avatar_url":"https://github.com/mgorshkov.png","language":"C++","readme":"[![Build status](https://ci.appveyor.com/api/projects/status/2pl7od2nosslyqay/branch/main?svg=true)](https://ci.appveyor.com/project/mgorshkov/sklearn/branch/main)\n\n# About\nML Methods from scikit-learn library.\n\n# Description\nImplements some ML Methods from scikit-learn library.\n\n# Requirements\nAny C++20-compatible compiler:\n* gcc 10 or higher\n* clang 6 or higher\n* Visual Studio 2019 or higher\n\n# Repo\n```\ngit clone https://github.com/mgorshkov/sklearn.git\n```\n\n# Build unit tests and sample\n## Linux/MacOS\n```\nmkdir build \u0026\u0026 cd build\ncmake ..\ncmake --build .\n```\n## Windows\n```\nmkdir build \u0026\u0026 cd build\ncmake ..\ncmake --build . 
--config Release\n```\n\n# Build docs\n```\ncmake --build . --target doc\n```\n\nOpen sklearn/build/doc/html/index.html in your browser.\n\n# Install\n```\ncmake .. -DCMAKE_INSTALL_PREFIX:PATH=~/sklearn_install\ncmake --build . --target install\n```\n\n# Usage example (samples/neighbors/iris)\n```\n#include \u003ciostream\u003e\n\n#include \u003csklearn/datasets/datasets.hpp\u003e\n#include \u003csklearn/metrics/accuracy_score.hpp\u003e\n#include \u003csklearn/model_selection/train_test_split.hpp\u003e\n#include \u003csklearn/neighbors/KNeighborsClassifier.hpp\u003e\n#include \u003csklearn/preprocessing/StandardScaler.hpp\u003e\n\nint main(int, char **) {\n    using namespace sklearn::metrics;\n    using namespace sklearn::datasets;\n    using namespace sklearn::model_selection;\n    using namespace sklearn::neighbors;\n    using namespace sklearn::preprocessing;\n\n    auto iris = load_iris();\n    auto data = iris.data();\n    auto target = iris.target();\n\n    auto [X_train, X_test, y_train, y_test] =\n            train_test_split\u003cnp::float_, np::int_, 600, 150\u003e({.X = data, .y = target, .test_size = 0.2, .random_state = 42});\n    auto sc_X = StandardScaler{};\n    X_train = sc_X.fit_transform(X_train);\n    X_test = sc_X.transform(X_test);\n\n    auto kn = KNeighborsClassifier\u003cnp::float_, np::int_\u003e{{.n_neighbors = 13,\n                                                          .p = 2,\n                                                          .metric = sklearn::metrics::DistanceMetricType::kEuclidean}};\n    kn.fit(X_train, y_train);\n\n    auto y_pred = kn.predict(X_test);\n    std::cout \u003c\u003c \"Prediction: \" \u003c\u003c y_pred \u003c\u003c std::endl;\n    std::cout \u003c\u003c \"Target: \" \u003c\u003c y_test \u003c\u003c std::endl;\n\n    auto score = accuracy_score\u003cnp::int_\u003e(y_test, y_pred);\n    std::cout \u003c\u003c \"Score: \" \u003c\u003c score \u003c\u003c std::endl;\n\n    return 0;\n}\n```\n# How to build the 
sample\n\n1. Clone the repo\n```\ngit clone https://github.com/mgorshkov/sklearn.git\n```\n2. cd samples/neighbors/iris\n```\ncd samples/neighbors/iris\n```\n3. Make build dir\n```\nmkdir -p build-release \u0026\u0026 cd build-release\n```\n4. Configure cmake\n```\ncmake ..\n```\n5. Build\n## Linux/MacOS\n```\ncmake --build .\n```\n## Windows\n```\ncmake --build . --config Release\n```\n6. Run the app\n```\n$ ./neighbors_iris\nPrediction: [1 2 1 0 2 0 2 0 0 2 0 1 0 2 1 1 0 0 0 2 0 2 2 2 0 1 2 1 2 1]\nTarget: [1 2 1 0 2 0 2 0 0 2 0 1 0 2 1 1 0 0 0 2 0 2 2 2 0 1 2 1 2 1]\nScore: 1\n```\n\n# Usage example (samples/neighbors/diabetes)\n```\n#include \u003ciostream\u003e\n\n#include \u003cpd/core/frame/DataFrame/DataFrameStreamIo.hpp\u003e\n#include \u003cpd/read_csv.hpp\u003e\n\n#include \u003csklearn/metrics/accuracy_score.hpp\u003e\n#include \u003csklearn/metrics/confusion_matrix.hpp\u003e\n#include \u003csklearn/metrics/f1_score.hpp\u003e\n#include \u003csklearn/model_selection/train_test_split.hpp\u003e\n#include \u003csklearn/neighbors/KNeighborsClassifier.hpp\u003e\n#include \u003csklearn/preprocessing/StandardScaler.hpp\u003e\n\nint main(int, char **) {\n    using namespace pd;\n    using namespace sklearn::model_selection;\n    using namespace sklearn::neighbors;\n    using namespace sklearn::preprocessing;\n    using namespace sklearn::metrics;\n\n    auto data = read_csv(\"https://raw.githubusercontent.com/adityakumar529/Coursera_Capstone/master/diabetes.csv\");\n    const char *non_zero[] = {\"Glucose\", \"BloodPressure\", \"SkinThickness\", \"Insulin\", \"BMI\"};\n    for (const auto \u0026column: non_zero) {\n        data[column] = data[column].replace(0L, np::NaN);\n        auto mean = data[column].mean(true);\n        data[column] = data[column].replace(np::NaN, mean);\n    }\n\n    auto X = data.iloc(\":\", \"0:8\");\n    auto y = data.iloc(\":\", \"8\");\n    auto [X_train, X_test, y_train, y_test] = train_test_split({.X = X, .y = y, .test_size = 0.2, 
.random_state = 42});\n\n    auto sc_X = StandardScaler{};\n    X_train = sc_X.fit_transform(X_train);\n    X_test = sc_X.transform(X_test);\n\n    auto classifier = KNeighborsClassifier\u003cpd::DataFrame\u003e{{.n_neighbors = 13,\n                                                           .p = 2,\n                                                           .metric = sklearn::metrics::DistanceMetricType::kEuclidean}};\n    classifier.fit(X_train, y_train);\n    auto y_pred = classifier.predict(X_test);\n    std::cout \u003c\u003c \"Prediction: \" \u003c\u003c y_pred \u003c\u003c std::endl;\n    auto cm = confusion_matrix({.y_true = y_test, .y_pred = y_pred});\n    std::cout \u003c\u003c cm \u003c\u003c std::endl;\n    std::cout \u003c\u003c f1_score({.y_true = y_test, .y_pred = y_pred}) \u003c\u003c std::endl;\n    std::cout \u003c\u003c accuracy_score(y_test, y_pred) \u003c\u003c std::endl;\n\n    return 0;\n}\n```\n\n# How to build the sample\n\n1. Clone the repo\n```\ngit clone https://github.com/mgorshkov/sklearn.git\n```\n2. cd samples/neighbors/diabetes\n```\ncd samples/neighbors/diabetes\n```\n3. Make build dir\n```\nmkdir -p build-release \u0026\u0026 cd build-release\n```\n4. Configure cmake\n```\ncmake ..\n```\n5. Build\n## Linux/MacOS\n```\ncmake --build .\n```\n## Windows\n```\ncmake --build . --config Release\n```\n6. 
Run the app\n```\n$ ./neighbors_diabetes\nPrediction: \t0\n0\t0\n1\t0\n2\t0\n3\t0\n4\t1\n...\n149\t1\n150\t0\n151\t0\n152\t0\n153\t1\n154 rows x 1 columns\n\n[[85 15]\n [19 35]]\n0.673077\n0.779221\n```\n\n# Links\n* C++ numpy-like template-based array implementation: https://github.com/mgorshkov/np\n* Methods from pandas library on top of NP library: https://github.com/mgorshkov/pd\n* Scientific methods on top of NP library: https://github.com/mgorshkov/scipy\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmgorshkov%2Fsklearn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmgorshkov%2Fsklearn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmgorshkov%2Fsklearn/lists"}