{"id":21011572,"url":"https://github.com/zhuzilin/np_ml","last_synced_at":"2025-05-08T21:17:52.718Z","repository":{"id":38355613,"uuid":"125611479","full_name":"zhuzilin/NP_ML","owner":"zhuzilin","description":"A tool library of classical machine learning algorithms with only numpy.","archived":false,"fork":false,"pushed_at":"2021-03-03T01:46:11.000Z","size":1297,"stargazers_count":223,"open_issues_count":1,"forks_count":69,"subscribers_count":12,"default_branch":"master","last_synced_at":"2025-05-08T21:17:38.685Z","etag":null,"topics":["machine-learning","numpy","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zhuzilin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-03-17T08:45:10.000Z","updated_at":"2025-03-16T13:07:22.000Z","dependencies_parsed_at":"2022-08-09T03:01:53.534Z","dependency_job_id":null,"html_url":"https://github.com/zhuzilin/NP_ML","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zhuzilin%2FNP_ML","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zhuzilin%2FNP_ML/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zhuzilin%2FNP_ML/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zhuzilin%2FNP_ML/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zhuzilin","download_url":"https://codeload.github.com/zhuzilin/NP_ML/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"git
hub","repositories_count":253149622,"owners_count":21861740,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["machine-learning","numpy","python"],"created_at":"2024-11-19T09:29:34.790Z","updated_at":"2025-05-08T21:17:52.696Z","avatar_url":"https://github.com/zhuzilin.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# NP_ML\n## Introduction\nClassical machine learning algorithms implemented with pure numpy.\n\nThis repo aims to help you understand ML algorithms instead of blindly using APIs.\n\n## Directory\u003ca name=\"directory\"\u003e\u003c/a\u003e\n- [Introduction](#introduction)\n- [Directory](#directory)\n- [Algorithm list](#algorithm-list)\n  - [Classify](#classify)\n    - Perceptron\n    - K Nearest Neighbor (KNN)\n    - Naive Bayes\n    - Decision Tree\n    - Random Forest\n    - SVM\n    - AdaBoost\n    - HMM\n  - [Cluster](#cluster)\n    - KMeans\n    - Affinity Propagation\n  - [Manifold Learning](#manifold-learning)\n    - PCA\n    - Locally-linear-embedding (LLE)\n  - [NLP](#nlp)\n    - LDA\n  - [Time Series Analysis](#time-series-analysis)\n    - AR\n- [Usage](#usage)\n  - Installation\n  - Examples for *Statistical Learning Method*(《统计学习方法》)\n- [Reference](#reference)\n## Algorithm List\u003ca name=\"algorithm-list\"\u003e\u003c/a\u003e\n### Classify\u003ca name=\"classify\"\u003e\u003c/a\u003e\n- Perceptron\n\nFor the perceptron, the example used the [UCI/iris dataset](https://archive.ics.uci.edu/ml/datasets/iris). Since the basic perceptron is a binary classifier, the example used only the data for versicolor and virginica. 
Also, since the iris dataset is not linearly separable, the result may vary considerably.\n\u003cp align=\"center\"\u003e\n    \u003cimg src=\"https://upload.wikimedia.org/wikipedia/commons/thumb/4/41/Iris_versicolor_3.jpg/1024px-Iris_versicolor_3.jpg\" height=\"200\"\u003e\n    \u003cimg src=\"https://upload.wikimedia.org/wikipedia/commons/thumb/9/9f/Iris_virginica.jpg/1024px-Iris_virginica.jpg\" height=\"200\"\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n    Figure: versicolor and virginica. Hard to distinguish... Right?\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n    \u003cimg src=\"https://raw.githubusercontent.com/zhuzilin/NP_ML/master/examples/figures/perceptron.png\" width=\"480\"\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n    Perceptron result on the Iris dataset.\n\u003c/p\u003e\n\n- K Nearest Neighbor (KNN)\n\nFor KNN, the example also used the UCI/iris dataset.\n\n\u003cp align=\"center\"\u003e\n    \u003cimg src=\"https://raw.githubusercontent.com/zhuzilin/NP_ML/master/examples/figures/knn.png\" width=\"480\"\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n    KNN result on the Iris dataset.\n\u003c/p\u003e\n\n- Naive Bayes\n\nFor naive Bayes, the example used the [UCI/SMS Spam Collection Dataset](https://www.kaggle.com/uciml/sms-spam-collection-dataset) to do spam filtering.\n\nFor this example only, nltk is used for tokenizing. The result is listed below:\n\n```\npreprocessing data...\n100%|#####################################################################| 5572/5572 [00:00\u003c00:00, 8656.12it/s]\nfinish preprocessing data.\n\n100%|#####################################################################| 1115/1115 [00:00\u003c00:00, 55528.96it/s]\naccuracy:  0.9757847533632287\n```\n\nWe got 97.6% accuracy! That's nice!\n\nWe also try two examples, a typical ham and a typical spam. The results are shown below.\n\n```\nexample ham:\nPo de :-):):-):-):-). 
No need job aha.\npredict result:\nham\n\nexample spam:\nu r a winner U ave been specially selected 2 receive £1000 cash or a 4* holiday (flights inc) speak to a \nlive operator 2 claim 0871277810710p/min (18 )\npredict result:\nspam\n```\n\n- Decision Tree\n\nFor the decision tree, the example used the UCI/tic-tac-toe dataset. The input is the state of the 9 blocks and the result is whether x wins.\n\u003cp align=\"center\"\u003e\n    \u003cimg src=\"https://upload.wikimedia.org/wikipedia/commons/thumb/3/32/Tic_tac_toe.svg/2000px-Tic_tac_toe.svg.png\" width=\"200\"\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n    tic tac toe.\n\u003c/p\u003e\n\nHere, we use ID3 and CART to generate a one-layer tree.\n\nFor ID3, we have:\n```\nroot\n├── 4 == b : True\n├── 4 == o : False\n└── 4 == x : True\naccuracy = 0.385\n```\nAnd for CART, we have: \n```\nroot\n├── 4 == o : False\n└── 4 != o : True\naccuracy = 0.718\n```\nIn both trees, feature_4 is the state of the center block. We can see that **the center block matters!!!** In ID3, the tree has to give a result for 'b' (blank), which causes the low accuracy.\n\n- Random Forest\n- SVM\n- AdaBoost\n- HMM\n\n### Cluster\u003ca name=\"cluster\"\u003e\u003c/a\u003e\n- KMeans\n\nFor KMeans, we use the [make_blobs()](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_blobs.html#sklearn.datasets.make_blobs) function in sklearn to produce a toy dataset.\n\n\u003cp align=\"center\"\u003e\n    \u003cimg src=\"https://raw.githubusercontent.com/zhuzilin/NP_ML/master/examples/figures/kmeans.png\" width=\"480\"\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n    KMeans result on the blob dataset.\n\u003c/p\u003e\n\n- Affinity Propagation\n\nYou can think of affinity propagation as a clustering algorithm that determines the number of clusters automatically.\n\n\u003cp align=\"center\"\u003e\n    \u003cimg src=\"https://raw.githubusercontent.com/zhuzilin/NP_ML/master/examples/figures/affinity_propagation.png\" 
width=\"480\"\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n    Affinity propagation result on the blob dataset.\n\u003c/p\u003e\n\n### Manifold Learning\u003ca name=\"manifold-learning\"\u003e\u003c/a\u003e\nFor manifold learning, all examples use the simple curve-S data to show the differences between algorithms.\n\n\u003cp align=\"center\"\u003e\n    \u003cimg src=\"https://raw.githubusercontent.com/zhuzilin/NP_ML/master/examples/figures/curve_s.png\" width=\"640\"\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n    Curve S data.\n\u003c/p\u003e\n\n- PCA\n\nThe most popular way to reduce dimensionality.\n\n\u003cp align=\"center\"\u003e\n    \u003cimg src=\"https://raw.githubusercontent.com/zhuzilin/NP_ML/master/examples/figures/pca.png\" width=\"480\"\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n    PCA visualization.\n\u003c/p\u003e\n\n- LLE\n\nA manifold learning method using only local information.\n\n\u003cp align=\"center\"\u003e\n    \u003cimg src=\"https://raw.githubusercontent.com/zhuzilin/NP_ML/master/examples/figures/lle.png\" width=\"480\"\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n    LLE visualization.\n\u003c/p\u003e\n\n### NLP\u003ca name=\"nlp\"\u003e\u003c/a\u003e\n- LDA\n### Time Series Analysis\u003ca name=\"time-series-analysis\"\u003e\u003c/a\u003e\n- AR\n\n## Usage\u003ca name=\"usage\"\u003e\u003c/a\u003e\n- Installation\n\nIf you want to run the visual examples, please install the package by:\n```\n  $ git clone https://github.com/zhuzilin/NP_ML\n  $ cd NP_ML\n  $ python setup.py install\n```\n\n- Examples in section \"Algorithm List\"\n\nRun the scripts in NP_ML/example/. For example:\n\n```\n  $ cd example/\n  $ python affinity_propagation.py\n```\n\n(Mac/Linux users may face some issues with the data directory. 
Please change the paths in the corresponding script).\n\n- Examples for *Statistical Learning Method*(《统计学习方法》)\n\nRun the scripts in NP_ML/example/StatisticalLearningMethod/. For example: \n\n```\n  $ cd example/StatisticalLearningMethod\n  $ python adaboost.py\n```\n## Reference\u003ca name=\"reference\"\u003e\u003c/a\u003e\nClassical ML algorithms were validated using simple examples in [*Statistical Learning Method*(《统计学习方法》)](https://www.amazon.com/%E7%BB%9F%E8%AE%A1%E5%AD%A6%E4%B9%A0%E6%96%B9%E6%B3%95%EF%BC%88%E7%BB%9F%E8%AE%A1%E5%AD%A6%E4%B9%A0%E6%96%B9%E6%B3%95-%E7%BB%9F%E8%AE%A1%E8%87%AA%E7%84%B6%E8%AF%AD%E8%A8%80%E5%A4%84%E7%90%86-%E7%AC%AC2%E7%89%88-%E5%85%B12%E6%9C%AC%E5%A5%97%E8%A3%85%EF%BC%89-Chinese-ebook/dp/B01M8KB8FF/ref=sr_1_1?ie=UTF8\u0026qid=1521303280\u0026sr=8-1\u0026keywords=%E7%BB%9F%E8%AE%A1%E5%AD%A6%E4%B9%A0%E6%96%B9%E6%B3%95)\n\nTime series models were validated using examples in [Bus 41202](http://faculty.chicagobooth.edu/ruey.tsay/teaching/bs41202/sp2017/)\n\n## Something Else\nCurrently, this repo only implements algorithms that do not need gradient descent. Algorithms that do need it will go in a separate repo, where I will implement them using a framework like PyTorch. Coming soon :)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzhuzilin%2Fnp_ml","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzhuzilin%2Fnp_ml","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzhuzilin%2Fnp_ml/lists"}