{"id":13713584,"url":"https://github.com/andygeiss/machine-learning-classification","last_synced_at":"2025-03-31T07:45:07.894Z","repository":{"id":144212740,"uuid":"269107070","full_name":"andygeiss/machine-learning-classification","owner":"andygeiss","description":"This repository provides a Golang implementation from scratch to solve a classification problem using the K-Nearest Neighbour algorithm.","archived":false,"fork":false,"pushed_at":"2020-06-05T12:44:35.000Z","size":34,"stargazers_count":3,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-11-30T14:53:08.216Z","etag":null,"topics":["go","golang","k-nearest-neighbours","machine-learning","machine-learning-algorithms","standard-project","template"],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/andygeiss.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-06-03T14:16:12.000Z","updated_at":"2023-04-16T16:03:28.000Z","dependencies_parsed_at":"2024-05-05T17:45:28.264Z","dependency_job_id":null,"html_url":"https://github.com/andygeiss/machine-learning-classification","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andygeiss%2Fmachine-learning-classification","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andygeiss%2Fmachine-learning-classification/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
repositories/andygeiss%2Fmachine-learning-classification/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andygeiss%2Fmachine-learning-classification/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/andygeiss","download_url":"https://codeload.github.com/andygeiss/machine-learning-classification/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":229510810,"owners_count":18084444,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["go","golang","k-nearest-neighbours","machine-learning","machine-learning-algorithms","standard-project","template"],"created_at":"2024-08-02T23:01:39.837Z","updated_at":"2024-12-13T08:07:10.619Z","avatar_url":"https://github.com/andygeiss.png","language":"Go","readme":"# Machine Learning with Golang (Classification)\n\nThis repository provides a Golang implementation from scratch to solve a classification problem using the K-Nearest Neighbour algorithm.\n\nIt can also be used as a standard project layout for machine learning projects, because it combines the\n[Standard Go Project Layout](https://github.com/golang-standards/project-layout) and \n[Cookiecutter's Data Science - Directory structure](https://drivendata.github.io/cookiecutter-data-science/#directory-structure).\n\n## Iris flower data set\n\nFirst we need to gather the raw data from the external, [official web archive](https://archive.ics.uci.edu/ml/machine-learning-databases/iris).\nNext we transform that CSV to an equal Protobuf format. 
Finally, we organize the data into our final, processed model format.\n\n    data\n    ├── external\n    │   └── iris.csv\n    ├── interim\n    │   └── iris_interim.pb\n    └── processed\n        └── iris_processed.pb\n\nBut wait! Why Protobuf? Well, with Protobuf we are able to use the model within Golang, Python or many other languages.\nWe do not need any additional data type conversions to use the model in different programming languages.\n\nStart the initial setup with the following command:\n\n    make\n\nThis installs the protobuf-compiler and the corresponding protobuf-gen-go plugin, generates the internal API, and compiles all commands.\n\n## Internal API\n\nWith the last step we generated our internal API by using the protobuf-compiler.\nOur internal directory structure looks as follows:\n\n    internal\n    ├── api\n    │   ├── iris_interim_pb2.py\n    │   ├── iris_interim.pb.go\n    │   ├── iris_interim.proto\n    │   ├── iris_processed_pb2.py\n    │   ├── iris_processed.pb.go\n    │   └── iris_processed.proto\n\n## Gather and Organize data\n\nWe treat [data as immutable](https://drivendata.github.io/cookiecutter-data-science/#data-is-immutable). 
\nThus, we use a pipeline to separate each step of data manipulation/transformation.\nNow we can build our first pipeline, which automatically gathers and organizes the data and prints some common statistics:\n\n    ./gather_and_organize_data.bin\n    \n    Statistics:\n       Column           Mean     Median   Mode     Minimum  Maximum  Range    Variance Std Dev \n       Petal length     5.84     5.80     5.00     4.30     7.90     3.60     0.68     0.83    \n       Petal width      3.05     3.00     3.00     2.00     4.40     2.40     0.19     0.43    \n       Sepal length     3.76     4.35     1.50     1.00     6.90     5.90     3.09     1.76    \n       Sepal width      1.20     1.30     0.20     1.00     6.90     5.90     3.09     1.76    \n\nThe corresponding source of the command can be found [here](cmd/gather_and_organize_data/main.go).\n\n## Evaluate the Model\n\nK-Nearest Neighbour is a lazy algorithm. It doesn't learn via training; instead, it \"memorizes\" the training dataset.\nThus, we don't call the following step model training. Instead, we evaluate the parameter k and the feature combinations in the Iris flower data set.\nIn addition, we use [Standardization](pkg/floats/scale.go) to scale the values into the range between 0 and 1, and\nzero values are replaced by the [Mean](pkg/floats/central_tendency.go). 
\n\n    ./evaluate_model.bin\n    \n    Statistics:\n       Column           Mean     Median   Mode     Minimum  Maximum  Range    Variance Std Dev \n       Petal length     0.43     0.42     0.19     0.00     1.00     1.00     0.05     0.23    \n       Petal width      0.44     0.42     0.42     0.00     1.00     1.00     0.03     0.18    \n       Sepal length     0.47     0.57     0.08     0.00     1.00     1.00     0.09     0.30    \n       Sepal width      0.46     0.50     0.04     0.00     1.00     1.00     0.09     0.30    \n    K-Nearest Neighbour with k = 3 :\n       Petal Length/Width Accuracy: 72.00\n       Sepal Length/Width Accuracy: 95.00\n    Evaluation time: 11.786393ms\n\nThe corresponding source of the command can be found [here](cmd/evaluate_model/main.go).\n\nFinally, we found that the combination of Sepal length and Sepal width reaches a very high accuracy of 95.00%.\n\n## Predict\n\nThe final model will be stored at \u003ccode\u003emodels/iris_knn.pb\u003c/code\u003e.\nTo predict the class for a single feature combination of Sepal length (x) and Sepal width (y) with k = 3, use the following command:\n\n    ./predict.bin -x 3 -y 4 -k 3\n    \n    K-Nearest Neighbour - K: 3, Given: [3 4], Predicted: Iris-setosa\n    Prediction time: 152.338µs\n\nThe corresponding source of the command can be found [here](cmd/predict/main.go).\n\n**UPDATE**: Prediction time was reduced by ~20% by using the [Manhattan distance calculation](pkg/floats/knn.go).\n\n    ./predict.bin -x 3 -y 4 -k 3\n   \n    K-Nearest Neighbour - K: 3, Given: [3 4], Predicted: Iris-setosa\n    Prediction time: 127.856µs\n","funding_links":[],"categories":["Repositories"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandygeiss%2Fmachine-learning-classification","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fandygeiss%2Fmachine-learning-classification","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandygeiss%2Fmachine-learning-classification/lists"}