{"id":16966538,"url":"https://github.com/visweswaran1998/sklearn","last_synced_at":"2025-10-31T10:49:25.958Z","repository":{"id":130198055,"uuid":"204445510","full_name":"VISWESWARAN1998/sklearn","owner":"VISWESWARAN1998","description":"Trying to implement Scikit Learn for Python in C++ (Single Headers and No dependencies)","archived":false,"fork":false,"pushed_at":"2020-06-09T13:23:13.000Z","size":348,"stargazers_count":48,"open_issues_count":1,"forks_count":19,"subscribers_count":10,"default_branch":"master","last_synced_at":"2025-10-14T05:17:01.095Z","etag":null,"topics":["machine-learning"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/VISWESWARAN1998.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":null,"patreon":null,"open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"custom":"https://www.paypal.me/VisweswaranN"}},"created_at":"2019-08-26T09:47:44.000Z","updated_at":"2024-12-19T08:59:34.000Z","dependencies_parsed_at":null,"dependency_job_id":"465954aa-1b1a-40ae-ba8c-1bc01a824e2c","html_url":"https://github.com/VISWESWARAN1998/sklearn","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/VISWESWARAN1998/sklearn","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VISWESWARAN1998%2Fsklearn","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VISWESWARAN1998%2Fsklearn/tags
","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VISWESWARAN1998%2Fsklearn/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VISWESWARAN1998%2Fsklearn/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/VISWESWARAN1998","download_url":"https://codeload.github.com/VISWESWARAN1998/sklearn/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VISWESWARAN1998%2Fsklearn/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":281976632,"owners_count":26592972,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-31T02:00:07.401Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["machine-learning"],"created_at":"2024-10-14T00:06:08.100Z","updated_at":"2025-10-31T10:49:25.897Z","avatar_url":"https://github.com/VISWESWARAN1998.png","language":"C++","funding_links":["https://www.paypal.me/VisweswaranN"],"categories":[],"sub_categories":[],"readme":"# sklearn\nTrying to implement Scikit Learn for Python in C++\n\n#### PREPROCESSING:\n1. [Standardization](https://github.com/VISWESWARAN1998/sklearn#standardization)\n2. [Normalization](https://github.com/VISWESWARAN1998/sklearn#normalization)\n3. [Label Encoding](https://github.com/VISWESWARAN1998/sklearn#label-encoding)\n4. 
[Label Binarization](https://github.com/VISWESWARAN1998/sklearn#label-binarization)\n\n\n#### REGRESSION:\n1. [Least Squares Regression](https://github.com/VISWESWARAN1998/sklearn#least-squares-regressionsimple-linear-regression)\n2. [Multiple Linear Regression](https://github.com/VISWESWARAN1998/sklearn#multiple-linear-regression)\n\n#### CLASSIFICATION:\n1. [Gaussian Naive Bayes](https://github.com/VISWESWARAN1998/sklearn#classification---gaussian-naive-bayes)\n2. [Logistic Regression](https://github.com/VISWESWARAN1998/sklearn#logistic-regression)\n\n\n#### STANDARDIZATION\n\n**SOURCE NEEDED:** preprocessing.h, preprocessing.cpp and statx.h \u003cbr/\u003e\n\nStandardScaler standardizes features by removing the mean and scaling to unit variance. _ref:_ [Scikit Learn docs](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html)\n\n```c++\n// SWAMI KARUPPASWAMI THUNNAI\n\n#include \u003ciostream\u003e\n#include \u003cvector\u003e\n#include \"preprocessing.h\"\n\nint main()\n{\n\tStandardScaler scaler({0, 0, 1, 1});\n\tstd::vector\u003cdouble\u003e scaled = scaler.scale();\n\t// Print each scaled value next to its inverse-scaled (original) value\n\tfor (double i : scaled)\n\t{\n\t\tstd::cout \u003c\u003c i \u003c\u003c \" \" \u003c\u003c scaler.inverse_scale(i) \u003c\u003c \"\\n\";\n\t}\n}\n```\n\n#### NORMALIZATION:\n\n**SOURCE NEEDED:** preprocessing.h, preprocessing.cpp and statx.h\n\n```c++\n// SWAMI KARUPPASWAMI THUNNAI\n\n#include \u003ciostream\u003e\n#include \u003cvector\u003e\n#include \"preprocessing.h\"\n\nint main()\n{\n\tstd::vector\u003cdouble\u003e normalized_vec = preprocessing::normalize({ 800, 10, 12, 78, 56, 49, 7, 1200, 1500 });\n\tfor (double i : normalized_vec) std::cout \u003c\u003c i \u003c\u003c \" \";\n}\n```\n\n#### LABEL ENCODING:\n\n**SOURCE NEEDED:** preprocessing.h and preprocessing.cpp\n\nLabel encoding is the process of encoding the categorical data into numerical data. 
For example if a column in the dataset contains country values like GERMANY, FRANCE, ITALY then label encoder will convert this categorical data into numerical data like this\n\ncountry - categorical |country - numerical\n-------------------|-------------------\nGERMANY | 1\nFRANCE | 0\nITALY | 2\n\n_Example code:_\n\n```c++\n// SWAMI KARUPPASWAMI THUNNAI\n\n#include \u003ciostream\u003e\n#include \u003cstring\u003e\n#include \"preprocessing.h\"\n\nint main()\n{\n\tstd::vector\u003cstd::string\u003e categorical_data = { \"GERMANY\", \"FRANCE\", \"ITALY\" };\n\tLabelEncoder\u003cstd::string\u003e encoder(categorical_data);\n\tstd::vector\u003cunsigned long int\u003e numerical_data = encoder.fit_transorm();\n\tfor (int i = 0; i \u003c categorical_data.size(); i++)\n\t{\n\t\tstd::cout \u003c\u003c categorical_data[i] \u003c\u003c \" - \" \u003c\u003c numerical_data[i] \u003c\u003c \"\\n\";\n\t}\n}\n```\n\n#### Label Binarization:\n\n```c++\n// SWAMI KARUPPASWAMI THUNNAI\n\n#include \u003ciostream\u003e\n#include \u003cstring\u003e\n#include \"preprocessing.h\"\n\nint main()\n{\n    std::vector\u003cstd::string\u003e ip_addresses = { \"A\", \"B\", \"A\", \"B\", \"C\" };\n    LabelBinarizer\u003cstd::string\u003e binarize(ip_addresses);\n    std::vector\u003cstd::vector\u003cunsigned long int\u003e\u003e result = binarize.fit();\n    for (std::vector\u003cunsigned long int\u003e i : result)\n    {\n        for (unsigned long int j : i) std::cout \u003c\u003c j \u003c\u003c \" \";\n        std::cout \u003c\u003c \"\\n\";\n    }\n    // Predict\n    std::cout \u003c\u003c \"Prediction:\\n-------------\\n\";\n    std::string test = \"D\";\n    std::vector\u003cunsigned long int\u003e prediction = binarize.predict(test);\n    for (unsigned long int i : prediction) std::cout \u003c\u003c i \u003c\u003c \" \";\n}\n```\n\n\n#### LEAST SQUARES REGRESSION(SIMPLE LINEAR REGRESSION)\n\n**HEADERS NEEDED:** lsr.h and lsr.cpp\n\n_Creating new model and saving 
it:_\u003cbr/\u003e\n\n**DATASET:**\n\nX|y\n-|--\n2|4\n3|5\n5|7\n7|10\n9|15\n\n```c++\n// SWAMI KARUPPASWAMI THUNNAI\n\n#include \u003ciostream\u003e\n#include \"lsr.h\"\n\nint main()\n{\n\t// X, y, print_debug messages\n\tsimple_linear_regression slr({2, 3, 5, 7, 9}, {4, 5, 7, 10, 15}, DEBUG);\n\tslr.fit();\n\tstd::cout \u003c\u003c slr.predict(8);\n\tslr.save_model(\"model.txt\");\n}\n```\n\n\nLoading an existing model\n\n```c++\n// SWAMI KARUPPASWAMI THUNNAI\n\n#include \u003ciostream\u003e\n#include \"lsr.h\"\n\nint main()\n{\n\t// Load the previously saved model from file; no fit() call is needed\n\tsimple_linear_regression slr(\"model.txt\");\n\tstd::cout \u003c\u003c slr.predict(8);\n}\n```\n\n**SAMPLE PREDICTION PLOTTED:**\n![](static/slr.png)\n\n\n#### Multiple Linear Regression:\n\nTraining and saving the model\n\n```c++\n// SWAMI KARUPPASWAMI THUNNAI\n\n#include \u003ciostream\u003e\n#include \"mlr.h\"\n\nint main()\n{\n\tLinearRegression mlr({ {110, 40}, {120, 30}, {100, 20}, {90, 0}, {80, 10} }, {100, 90, 80, 70, 60}, NODEBUG);\n\tmlr.fit();\n\tstd::cout \u003c\u003c mlr.predict({ 110, 40 });\n\tmlr.save_model(\"model.json\");\n}\n```\n\nLoading the saved model\n\n```c++\n// SWAMI KARUPPASWAMI THUNNAI\n\n#include \u003ciostream\u003e\n#include \"mlr.h\"\n\nint main()\n{\n\t// Don't use fit method here\n\tLinearRegression mlr(\"model.json\");\n\tstd::cout \u003c\u003c mlr.predict({ 110, 40 });\n}\n```\n\n\n#### Classification - Gaussian Naive Bayes\n\nClassifying male or female using height, weight and foot size, and saving the model.\n\n**HEADERS / SOURCE NEEDED:** naive_bayes.h, naive_bayes.cpp, json.h\n\n```c++\n// SWAMI KARUPPASWAMI THUNNAI\n\n#include \u003ciostream\u003e\n#include \u003cmap\u003e\n#include \"naive_bayes.h\"\n\nint main()\n{\n\tgaussian_naive_bayes nb({ {6, 180, 12}, {5.92, 190, 11}, {5.58, 170, 12}, {5.92, 165, 10}, {5, 100, 6}, {5.5, 150, 8}, {5.42, 130, 7}, {5.75, 150, 9} }, { 0, 0, 0, 0, 1, 1, 1, 1 }, DEBUG);\n\tnb.fit();\n\tnb.save_model(\"model.json\");\n\tstd::map\u003cunsigned long int, double\u003e probabilities = nb.predict({ 6, 130, 8 });\n\tdouble male = 
probabilities[0];\n\tdouble female = probabilities[1];\n\tif (male \u003e female) std::cout \u003c\u003c \"MALE\";\n\telse std::cout \u003c\u003c \"FEMALE\";\n}\n```\n\n_Loading a saved model:_\n\n```c++\n// SWAMI KARUPPASWAMI THUNNAI\n\n#include \u003ciostream\u003e\n#include \u003cmap\u003e\n#include \"naive_bayes.h\"\n\nint main()\n{\n\tgaussian_naive_bayes nb(NODEBUG);\n\tnb.load_model(\"model.json\");\n\tstd::map\u003cunsigned long int, double\u003e probabilities = nb.predict({ 6, 130, 8 });\n\tdouble male = probabilities[0];\n\tdouble female = probabilities[1];\n\tif (male \u003e female) std::cout \u003c\u003c \"MALE\";\n\telse std::cout \u003c\u003c \"FEMALE\";\n}\n```\n\n#### Logistic Regression:\nDo not be misled by the word \"regression\" in logistic regression: it is generally used for classification problems. The heart of logistic regression is the sigmoid activation function. An activation function takes any input value and outputs a value within a certain range. In our case (sigmoid), the output lies between 0 and 1.\n\nIn the image, you can see the output (y) of the sigmoid activation function for -3 \u003c= x \u003c= 3\n\n![](https://upload.wikimedia.org/wikipedia/commons/thumb/2/2f/Error_Function.svg/1280px-Error_Function.svg.png)\n\nThe idea behind logistic regression is to take the output of linear regression, i.e., y = mx+c, and apply the logistic function 1/(1+e^-y), which outputs a value between 0 and 1. This makes it a binary classifier: for example, it can be used to predict whether a person is male or female from certain parameters.\n\nWith some modifications, logistic regression can also classify multi-class problems. Here, we use the one-vs-rest principle: one model is trained per class, so if the class count is 10, it trains 10 models, each time relabeling the class to be predicted as 1 and all the other classes as 0. 
If you don't understand, here is a detailed explanation: [https://prakhartechviz.blogspot.com/2019/02/multi-label-classification-python.html](https://prakhartechviz.blogspot.com/2019/02/multi-label-classification-python.html)\n\nWe are going to take a simple classification problem to classify whether it is a male or female.\n\nClassification male - female using height, weight, foot size and saving the model. Here is our dataset:\n\n![](https://www.codeproject.com/KB/recipes/5246467/1_BfEv3WPJtsijmYeJxR89Vg.png)\n\nAll we have to do is to predict whether the person is male or female using height, weight and foot size.\n\n```c++\n// SWAMI KARUPPASWAMI THUNNAI\n\n#include \u003ciostream\u003e\n#include \"logistic_regression.h\"\n\nint main()\n{\n    logistic_regression lg({ { 6, 180, 12 },{ 5.92, 190, 11 },{ 5.58, 170, 12 },\n        { 5.92, 165, 10 },{ 5, 100, 6 },{ 5.5, 150, 8 },{ 5.42, 130, 7 },{ 5.75, 150, 9 } },\n        { 0, 0, 0, 0, 1, 1, 1, 1 }, NODEBUG);\n    lg.fit();\n    // Save the model\n    lg.save_model(\"model.json\");\n    std::map\u003cunsigned long int, double\u003e probabilities = lg.predict({ 6, 130, 8 });\n    double male = probabilities[0];\n    double female = probabilities[1];\n    if (male \u003e female) std::cout \u003c\u003c \"MALE\";\n    else std::cout \u003c\u003c \"FEMALE\";\n}\n```\n\nand loading a saved model:\n\n```c++\n// SWAMI KARUPPASWAMI THUNNAI\n\n#include \u003ciostream\u003e\n#include \"logistic_regression.h\"\n\nint main()\n{\n    logistic_regression lg(\"model.json\");\n    std::map\u003cunsigned long int, double\u003e probabilities = lg.predict({ 6, 130, 8 });\n    double male = probabilities[0];\n    double female = probabilities[1];\n    if (male \u003e female) std::cout \u003c\u003c \"MALE\";\n    else std::cout \u003c\u003c 
\"FEMALE\";\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvisweswaran1998%2Fsklearn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvisweswaran1998%2Fsklearn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvisweswaran1998%2Fsklearn/lists"}