{"id":18153259,"url":"https://github.com/30mb1/ml-linear-algorithms","last_synced_at":"2026-05-01T19:35:05.042Z","repository":{"id":94105523,"uuid":"88423402","full_name":"30mb1/ML-Linear-Algorithms","owner":"30mb1","description":"Using linear models for classification.","archived":false,"fork":false,"pushed_at":"2017-09-15T09:33:43.000Z","size":898,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-07T00:52:30.053Z","etag":null,"topics":["classification","linear-algorithms","linear-models","machine-learning","machine-learning-algorithms","matplotlib","perceptron","quality","scikit-learn","scikitlearn-machine-learning","svm","svm-classifier"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/30mb1.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-04-16T15:40:55.000Z","updated_at":"2017-04-24T18:26:15.000Z","dependencies_parsed_at":"2023-04-03T14:04:18.208Z","dependency_job_id":null,"html_url":"https://github.com/30mb1/ML-Linear-Algorithms","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/30mb1/ML-Linear-Algorithms","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/30mb1%2FML-Linear-Algorithms","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/30mb1%2FML-Linear-Algorithms/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/30mb1%2FML-Linear-Algori
thms/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/30mb1%2FML-Linear-Algorithms/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/30mb1","download_url":"https://codeload.github.com/30mb1/ML-Linear-Algorithms/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/30mb1%2FML-Linear-Algorithms/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32510809,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-30T13:12:12.517Z","status":"online","status_checked_at":"2026-05-01T02:00:05.856Z","response_time":64,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["classification","linear-algorithms","linear-models","machine-learning","machine-learning-algorithms","matplotlib","perceptron","quality","scikit-learn","scikitlearn-machine-learning","svm","svm-classifier"],"created_at":"2024-11-02T03:06:09.294Z","updated_at":"2026-05-01T19:35:05.020Z","avatar_url":"https://github.com/30mb1.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"Linear algorithms\n===================\n\nLinear algorithms are a common class of models distinguished by their simplicity and speed. They can be trained in a reasonable time on very large amounts of data, and they can work with any type of features. 
Here, I review and compare the behavior of several linear algorithms.\n\n\nImplementation in scikit-learn\n----------\nLet's start with the [Perceptron](https://en.wikipedia.org/wiki/Perceptron). I will use the implementation from the [scikit-learn](http://scikit-learn.org/stable/index.html) library. \nIt is located in the [sklearn.linear_model](http://scikit-learn.org/stable/modules/classes.html#module-sklearn.linear_model) package. As a metric, I will use the proportion of correct answers, [sklearn.metrics.accuracy_score](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html).\n\n\n```python\nimport pandas as pd\nfrom sklearn.linear_model import Perceptron\nfrom sklearn.metrics import accuracy_score\n\n# the first column holds the class label, the other two hold the features\ntr_data = pd.read_csv(\"train.csv\", names=[1,2,3])\nte_data = pd.read_csv(\"test.csv\", names=[1,2,3])\n\n# as_matrix() was removed in pandas 1.0; to_numpy() is its replacement\ntr_data = tr_data.to_numpy()\n\ntrain_x = [[x[1], x[2]] for x in tr_data]\ntrain_y = [x[0] for x in tr_data]\n\n\nte_data = te_data.to_numpy()\n\ntest_x = [[x[1], x[2]] for x in te_data]\ntest_y = [x[0] for x in te_data]\n\nclf_b = Perceptron(random_state=241)\nclf_b.fit(train_x, train_y)\npredicted_classes = clf_b.predict(test_x)\nbefore_scale = accuracy_score(test_y, predicted_classes) #0.654\n```\n\n\nAs with metric methods, the quality of linear algorithms depends on some properties of the data; for example, the features should be normalized. 
Otherwise, quality may drop, because features with a larger scale contribute more to the result.\n\n\nThis is the result of running the algorithm without scaling the features:\n\n\n![before.png](https://github.com/AlievMagomed/ML-Perceptron-/blob/master/before.png?raw=true)\n\n\nTo scale features, it is convenient to use the class [sklearn.preprocessing.StandardScaler](http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html).\n\n\n```python\nfrom sklearn.preprocessing import StandardScaler\n\nscaler = StandardScaler()\n\n# fit the scaler on the training set only, then reuse it for the test set\nX_train_scaled = scaler.fit_transform(train_x)\nX_test_scaled = scaler.transform(test_x)\n\nclf_a = Perceptron(random_state=241)\nclf_a.fit(X_train_scaled, train_y)\npredicted_classes = clf_a.predict(X_test_scaled)\nafter_scale = accuracy_score(test_y, predicted_classes) #0.854\n```\n\n\n![after.png](https://github.com/AlievMagomed/ML-Perceptron-/blob/master/after.png?raw=true)\n\n## Non-linear datasets\n\nThe Perceptron copes with binary classification fairly well, but it is clearly not suitable for datasets that are not linearly separable. In that case, it is better to use an [SVM](https://en.wikipedia.org/wiki/Support_vector_machine). In addition to performing linear classification, SVMs can efficiently perform non-linear classification using what is called the [kernel trick](https://en.wikipedia.org/wiki/Kernel_method), implicitly mapping their inputs into high-dimensional feature spaces.\n\nAgain, I will use scikit-learn. 
The [SVM](http://scikit-learn.org/stable/modules/svm.html) classifier is located in [sklearn.svm](http://scikit-learn.org/stable/modules/classes.html#module-sklearn.svm). Many useful tools can be found in [sklearn.model_selection](http://scikit-learn.org/stable/modules/classes.html#module-sklearn.model_selection): [train_test_split](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html#sklearn.model_selection.train_test_split) splits arrays or matrices into random train and test subsets, [StratifiedShuffleSplit](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedShuffleSplit.html#sklearn.model_selection.StratifiedShuffleSplit) provides train/test indices to split data into train/test sets, and [GridSearchCV](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html) searches over specified parameter values for an estimator. This time I will use a custom dataset created with [make_circles](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_circles.html) from the [sklearn.datasets](http://scikit-learn.org/stable/modules/classes.html#module-sklearn.datasets) module.\n\nSVM has many parameters we can tune. It is important to configure the classifier correctly. 
Let's see how different settings affect the algorithm's behavior.\n\n```python\nfrom sklearn.svm import SVC\nfrom sklearn.model_selection import GridSearchCV\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.datasets import make_circles\nfrom sklearn.model_selection import StratifiedShuffleSplit\n\n\n#creating a non-linear dataset and splitting it into training and testing parts\n#(scaler and accuracy_score come from the previous code blocks)\nX, y = make_circles(n_samples=300, noise=0.2, factor=0.5, random_state=241)\nX = scaler.fit_transform(X)\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)\n\n#here I will consider only a small set of parameters for visualization\nC_range = [10, 100, 1000]\ngamma_range = [0.001, 0.1, 10]\n\nfor C in C_range:\n    for gamma in gamma_range:\n        #setting up SVM with current settings\n        clf = SVC(kernel='rbf', C=C, gamma=gamma)\n        clf.fit(X_train, y_train)\n\n        predicted = clf.predict(X_test)\n        acc = accuracy_score(y_test, predicted)\n\n```\n\n![rbf_params](https://github.com/AlievMagomed/ML-Perceptron-/blob/master/RBF%20params.png?raw=true)\n\nOf course, searching for the optimal combination of parameters by hand can take a long time. 
In this case, GridSearchCV helps simplify the process.\n\n````python\nimport numpy as np\n\n#find best params using GridSearch with rbf kernel\ncv = StratifiedShuffleSplit(n_splits=5, test_size=0.25, random_state=241)\nC_range = np.logspace(-5, 7, num=12)\ngamma_range = np.logspace(-8, 3, num=11)\nparameters = dict(kernel=['rbf'], gamma=gamma_range, C=C_range)\ngrid = GridSearchCV(SVC(), param_grid=parameters, cv=cv)\ngrid.fit(X_train, y_train)\n\nprint(\"The best parameters are %s with a score of %.2f\"\n      % (grid.best_params_, grid.best_score_))\n\n#predict now uses the estimator refit with the best parameters found\npredicted = grid.predict(X_test)\nacc = accuracy_score(y_test, predicted)\n\nprint(\"Accuracy of best-fitted estimator is %.2f\" % acc)\n````\n\n```\nThe best parameters are {'kernel': 'rbf', 'C': 432.87612810830529, 'gamma': 0.039810717055349776} with a score of 0.88\nAccuracy of best-fitted estimator is 0.88\n```\n\nNow let's compare the results of the SVM and the Perceptron to see the advantages of the SVM.\n\n![compare](https://github.com/AlievMagomed/ML-Perceptron-/blob/master/rbf_perc_compare.png?raw=true)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F30mb1%2Fml-linear-algorithms","html_url":"https://awesome.ecosyste.ms/projects/github.com%2F30mb1%2Fml-linear-algorithms","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F30mb1%2Fml-linear-algorithms/lists"}