{"id":18419471,"url":"https://github.com/je-suis-tm/machine-learning","last_synced_at":"2025-04-09T23:18:35.150Z","repository":{"id":113363160,"uuid":"128180674","full_name":"je-suis-tm/machine-learning","owner":"je-suis-tm","description":"Python machine learning applications in image processing, recommender system, matrix completion, netflix problem and algorithm implementations including Co-clustering, Funk SVD, SVD++, Non-negative Matrix Factorization, Koren Neighborhood Model, Koren Integrated Model, Dawid-Skene, Platt-Burges, Expectation Maximization, Factor Analysis, ISTA, FISTA, ADMM, Gaussian Mixture Model, OPTICS, DBSCAN, Random Forest, Decision Tree, Support Vector Machine, Independent Component Analysis, Latent Semantic Indexing, Principal Component Analysis, Singular Value Decomposition, K Nearest Neighbors, K Means, Naïve Bayes Mixture Model, Gaussian Discriminant Analysis, Newton Method, Coordinate Descent, Gradient Descent, Elastic Net Regression, Ridge Regression, Lasso Regression, Least Squares, Logistic Regression, Linear Regression","archived":false,"fork":false,"pushed_at":"2022-12-16T16:54:46.000Z","size":8217,"stargazers_count":234,"open_issues_count":0,"forks_count":51,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-04-09T23:18:28.423Z","etag":null,"topics":["batch-gradient-descent","dbscan","em-algorithm","expectation-maximization","expectation-maximization-algorithm","factor-analysis","independent-component-analysis","k-nearest-neighbors","lasso-regression","latent-semantic-analysis","linear-discriminant-analysis","low-rank-approximation","multinomial-naive-bayes","naive-bayes","newton-method","ridge-regression","sequential-minimal-optimization","singular-value-decomposition","stochastic-gradient-descent","support-vector-machine"],"latest_commit_sha":null,"homepage":"https://je-suis-tm.github.io/machine-learning","language":"Jupyter 
Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/je-suis-tm.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2018-04-05T08:46:03.000Z","updated_at":"2025-03-04T00:44:56.000Z","dependencies_parsed_at":null,"dependency_job_id":"b50468e6-8ac7-40ab-bace-13b99806ac47","html_url":"https://github.com/je-suis-tm/machine-learning","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/je-suis-tm%2Fmachine-learning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/je-suis-tm%2Fmachine-learning/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/je-suis-tm%2Fmachine-learning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/je-suis-tm%2Fmachine-learning/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/je-suis-tm","download_url":"https://codeload.github.com/je-suis-tm/machine-learning/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248125592,"owners_count":21051771,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ba
tch-gradient-descent","dbscan","em-algorithm","expectation-maximization","expectation-maximization-algorithm","factor-analysis","independent-component-analysis","k-nearest-neighbors","lasso-regression","latent-semantic-analysis","linear-discriminant-analysis","low-rank-approximation","multinomial-naive-bayes","naive-bayes","newton-method","ridge-regression","sequential-minimal-optimization","singular-value-decomposition","stochastic-gradient-descent","support-vector-machine"],"created_at":"2024-11-06T04:17:09.841Z","updated_at":"2025-04-09T23:18:35.111Z","avatar_url":"https://github.com/je-suis-tm.png","language":"Jupyter Notebook","funding_links":[],"categories":["Alpha Research"],"sub_categories":[],"readme":"# Machine Learning\n\n## Intro\n\nMachine learning is so chic that every programmer, and even non-programmer, has started to learn it. After several months of online courses, everyone becomes a self-proclaimed data scientist. Managers hold high hopes and deploy data scientists to machine-learn this or that. In no time, people run into a cul-de-sac: things don't work so well outside the realm of the iris dataset! If you have been to my other repositories like \u003ca href=https://github.com/je-suis-tm/quant-trading\u003equant trading\u003c/a\u003e or \u003ca href=https://github.com/je-suis-tm/graph-theory\u003egraph theory\u003c/a\u003e, you must have seen me bashing reckless applications of machine learning. Stop selling AI snake oil! Don't get me wrong. I ain't no machine-learning-sceptic. I see great potential in machine learning, but I am cynical about the current overstatement of artificial intelligence, which is frankly nowhere in sight. \n\nThe most popular approach, supervised learning, has very rigid requirements for both data quality and data quantity. Reinforcement learning is a drain on existing hardware. By contrast, unsupervised learning is something I mess around with frequently. 
It greatly boosts my work efficiency through dimension reduction, although from time to time I struggle to interpret what the clustering patterns actually mean. In short, machine learning is no panacea. Its strong suit is classification with discrete answers. When it comes to predicting tomorrow's stock price or computing yesterday's basic reproduction number, we still have to take the conventional path. \n\nThis repository is based upon the \u003ca href=http://cs229.stanford.edu/syllabus-fall2020.html\u003ecourse material\u003c/a\u003e by Stanford University. Professor Andrew Ng may not teach the most comprehensive lectures, but he has inspired millions to study data science. This repository attempts to replicate every algorithm mentioned in the course as well as the popular ones outside of it. Experienced coders urge us not to reinvent the wheel, but I firmly believe we never truly understand how a wheel works until we reinvent it. If you only learn OPTICS from some articles on towardsdatascience.com, you would've skipped DBSCAN since OPTICS does not require the key input ε. Well, by reinventing the wheel, you would come to realize that this is purely quid pro quo. The newly introduced input ξ is crucial in determining the clustering. Yet, few people talk about it. In that sense, data modelling is not really scientific and never will be. Machine learning is an art where you fine-tune the parameters to create discrete answers to real-life problems. 
I sincerely hope this repository can help you see that.\n\n\u003chr\u003e\n\n## Algorithms\n\n### Supervised\n\n* Approximate Bayesian Computation\n\n* Coordinate Descent (\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/coordinate%20descent%20for%20elastic%20net.ipynb\u003eLasso Regression\u003c/a\u003e/\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/coordinate%20descent%20for%20elastic%20net.ipynb\u003eElastic Net Regression\u003c/a\u003e)\n\n* Generative Learning (\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/gaussian%20discriminant%20analysis.ipynb\u003eGaussian Discriminant Analysis\u003c/a\u003e)\n\n* Gradient Descent (\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/gradient%20descent.ipynb\u003eBatch\u003c/a\u003e/\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/gradient%20descent.ipynb\u003eStochastic\u003c/a\u003e/\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/gradient%20descent.ipynb\u003eMini-batch\u003c/a\u003e/\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/recommender%20system.ipynb\u003eMultiplicative Update\u003c/a\u003e)\n\n* Least Squares (\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/coordinate%20descent%20for%20elastic%20net.ipynb\u003eLinear Regression\u003c/a\u003e/\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/coordinate%20descent%20for%20elastic%20net.ipynb\u003eRidge Regression\u003c/a\u003e)\n\n* Naïve Bayes (\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/naive%20bayes.ipynb\u003eMultivariate\u003c/a\u003e/\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/naive%20bayes.ipynb\u003eMultinomial\u003c/a\u003e/\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/naive%20bayes.ipynb\u003eTF-IDF\u003c/a\u003e/\u003ca 
href=https://github.com/je-suis-tm/machine-learning/blob/master/naive%20bayes.ipynb\u003eKL Divergence\u003c/a\u003e)\n\n* Newton Method (\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/newton%20method%20for%20logistic%20regression.ipynb\u003eLogistic Regression\u003c/a\u003e)\n\n* Support Vector Machine (\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/binary%20support%20vector%20machine.ipynb\u003eBinary\u003c/a\u003e/\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/multiclass%20support%20vector%20machine.ipynb\u003eMulticlass\u003c/a\u003e/\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/multiclass%20support%20vector%20machine.ipynb\u003eDAG\u003c/a\u003e/\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/sequential%20minimal%20optimization.ipynb\u003eSMO\u003c/a\u003e)\n\n* Tree-based Learning (\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/decision%20tree%20and%20random%20forest.ipynb\u003eDecision Tree\u003c/a\u003e/\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/decision%20tree%20and%20random%20forest.ipynb\u003eRandom Forest\u003c/a\u003e)\n\n* Instance-based Learning (\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/k%20nearest%20neighbors.ipynb\u003eK Nearest Neighbors\u003c/a\u003e)\n\n### Unsupervised\n\n* Centroid-based Model (\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/k%20means.ipynb\u003eK Means\u003c/a\u003e)\n\n* Density-based Model (\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/dbscan.ipynb\u003eDBSCAN\u003c/a\u003e/\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/optics.ipynb\u003eOPTICS\u003c/a\u003e)\n\n* Distribution-based Model (\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/gaussian%20mixture%20model.ipynb\u003eGaussian Mixture Model\u003c/a\u003e/\u003ca 
href=https://github.com/je-suis-tm/machine-learning/blob/master/naive%20bayes%20mixture%20model.ipynb\u003eNaïve Bayes Mixture Model\u003c/a\u003e)\n\n* Expectation Maximization (\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/factor%20analysis.ipynb\u003eFactor Analysis\u003c/a\u003e/\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/Wisdom%20of%20Crowds%20project/dawid%20skene.ipynb\u003eDawid-Skene Model\u003c/a\u003e/\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/Wisdom%20of%20Crowds%20project/platt%20burges.ipynb\u003ePlatt-Burges Model\u003c/a\u003e)\n\n* Matrix Completion (\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/matrix%20completion.ipynb\u003eISTA\u003c/a\u003e/\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/matrix%20completion.ipynb\u003eFISTA\u003c/a\u003e/\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/matrix%20completion.ipynb\u003eADMM\u003c/a\u003e)\n\n* Principal Component Analysis (\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/principal%20component%20analysis.ipynb\u003eSingular Value Decomposition\u003c/a\u003e/\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/principal%20component%20analysis.ipynb\u003eLow Rank Approximation\u003c/a\u003e/\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/latent%20semantic%20indexing.ipynb\u003eLatent Semantic Indexing\u003c/a\u003e)\n\n* Recommender System (\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/recommender%20system.ipynb\u003eAlternating Least Squares\u003c/a\u003e/\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/recommender%20system.ipynb\u003eFunk SVD\u003c/a\u003e/\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/recommender%20system.ipynb\u003eSVD++\u003c/a\u003e/\u003ca 
href=https://github.com/je-suis-tm/machine-learning/blob/master/recommender%20system.ipynb\u003eNon-negative Matrix Factorization\u003c/a\u003e/\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/recommender%20system.ipynb\u003eSlope One\u003c/a\u003e/\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/recommender%20system.ipynb\u003eKNN with Baseline\u003c/a\u003e/\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/recommender%20system.ipynb\u003eKoren Neighborhood Model\u003c/a\u003e/\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/recommender%20system.ipynb\u003eKoren Integrated Model\u003c/a\u003e/\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/recommender%20system.ipynb\u003eCo-clustering\u003c/a\u003e)\n\n* Signal Processing (\u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/independent%20component%20analysis.ipynb\u003eIndependent Component Analysis\u003c/a\u003e)\n\n\n## Applications\n\n### 1. Reverse Engineering project\n\nCreating a visualization from data is easy. In Tableau, it's only one click. What happens if you want to extract data from a visualization? A simple Google search yields a few reverse engineering tools, yet they share the same malaise – they only work with a single curve and require a lot of clicks. This project addresses these issues by incorporating unsupervised learning into image processing. Multiple curves are separated by color channel using clustering techniques. Data can then be extracted by computing the coordinates of each pixel. A simple conversion from resolution scale to axis scale maps those coordinates back to approximately the values in the original spreadsheet. 
Voilà, no more ridiculous subscription to Statista :astonished:\n\n![alt text](https://github.com/je-suis-tm/machine-learning/blob/master/Reverse%20Engineering%20project/preview/color%20channels%20bar%20elbow%20method.png)\n\nFor more details, please refer to the \u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/Reverse%20Engineering%20project/README.md\u003eread me page\u003c/a\u003e of a separate directory or \u003ca href=https://je-suis-tm.github.io/machine-learning/reverse-engineering\u003emachine learning\u003c/a\u003e section on my personal blog.\n  \n### 2. Wisdom of Crowds project\n\nEvery now and then, we read that some bulge bracket bank has hit the headlines: “XXX will reach 99999€ in 20YY”. Some forecasts hit the bull’s eye, but most projections are as accurate as astrology. Price prediction can easily be influenced by cognitive bias. In the financial market, there is merit to the idea that the \u003ca href=https://www.investopedia.com/terms/c/consensusestimate.asp\u003econsensus estimate\u003c/a\u003e is the best oracle. By harnessing the power of ensemble learning, we are about to leverage the \u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/Wisdom%20of%20Crowds%20project/dawid%20skene.ipynb\u003eDawid-Skene model\u003c/a\u003e and the \u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/Wisdom%20of%20Crowds%20project/platt%20burges.ipynb\u003ePlatt-Burges model\u003c/a\u003e to eliminate the idiosyncratic noise associated with each individual judgement. The end game is to reveal the underlying intrinsic value generated by the collective knowledge of research analysts from different investment banks. Is the wisdom of crowds a crystal ball for trading? 
\n\n![alt text](https://github.com/je-suis-tm/machine-learning/blob/master/Wisdom%20of%20Crowds%20project/preview/y1%20forecast%20bias.png)\n\nFor more details, please refer to the \u003ca href=https://github.com/je-suis-tm/machine-learning/blob/master/Wisdom%20of%20Crowds%20project/README.md\u003eread me page\u003c/a\u003e of a separate directory or \u003ca href=https://je-suis-tm.github.io/machine-learning/wisdom-of-crowds\u003emachine learning\u003c/a\u003e section on my personal blog.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fje-suis-tm%2Fmachine-learning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fje-suis-tm%2Fmachine-learning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fje-suis-tm%2Fmachine-learning/lists"}