{"id":21293321,"url":"https://github.com/ejw-data/ml-playground","last_synced_at":"2026-05-06T11:31:28.003Z","repository":{"id":106650212,"uuid":"529482621","full_name":"ejw-data/ml-playground","owner":"ejw-data","description":"Testing the limitations, inabilities, and strengths of models with synthetic data","archived":false,"fork":false,"pushed_at":"2023-09-14T17:52:01.000Z","size":8996,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-15T16:44:29.256Z","etag":null,"topics":["machine-learning","python","scikit-learn"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ejw-data.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-08-27T04:39:01.000Z","updated_at":"2022-11-07T17:42:08.000Z","dependencies_parsed_at":null,"dependency_job_id":"fa8f0d2a-3479-4a27-b54f-e4513e787cac","html_url":"https://github.com/ejw-data/ml-playground","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ejw-data/ml-playground","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ejw-data%2Fml-playground","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ejw-data%2Fml-playground/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ejw-data%2Fml-playground/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ejw-data%2Fml-playground/manifests","own
er_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ejw-data","download_url":"https://codeload.github.com/ejw-data/ml-playground/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ejw-data%2Fml-playground/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262419701,"owners_count":23308095,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["machine-learning","python","scikit-learn"],"created_at":"2024-11-21T13:54:19.162Z","updated_at":"2026-05-06T11:31:22.970Z","avatar_url":"https://github.com/ejw-data.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ml-playground  \n\nAuthor:  Erin James Wills, ejw.data@gmail.com  \n\n![Machine Learning Playground](./images/ml-playground.png)  \n\u003ccite\u003ePhoto by \u003ca href=\"https://unsplash.com/@jontyson?utm_source=unsplash\u0026utm_medium=referral\u0026utm_content=creditCopyText\"\u003eJon Tyson\u003c/a\u003e on \u003ca href=\"https://unsplash.com/s/photos/playground?utm_source=unsplash\u0026utm_medium=referral\u0026utm_content=creditCopyText\"\u003eUnsplash\u003c/a\u003e\u003c/cite\u003e\n\u003cbr\u003e\n\n## Overview  \n\u003chr\u003e  \nTesting the limitations, inabilities, and strengths of models with synthetic data  \n\n\u003cbr\u003e\n\n## Content  \n* Classification\n    * Gaussian Naive Bayes (continuous data)\n    * Naive Bayes (text)\n    * Gradient Boosting\n    * Linear Discriminant Analysis\n    * Quadratic Discriminant Analysis\n    * Logistic Regression\n    * K-Nearest Neighbors (KNN)\n    * Decision 
Trees\n    * Random Forest\n    * Isolation Forest - flags points that are isolated by short (early-terminating) branches as outliers\n    * Support Vector Classifier  \n\u003cbr\u003e\n\n* Clustering\n    * Centroid/Distance\n        * KMeans\n    * Density\n        * DBSCAN\n    * Hierarchical\n        * Agglomerative Hierarchical\n        * Divisive Hierarchical\n    * Gaussian Mixture Models (GMM)\n    * Mean Shift\n    * Affinity Propagation\n    * Fuzzy C-Means (*in-progress)\n    * Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH)\n    * Self-Organizing Maps (SOM)\n    * Ordering Points to Identify the Clustering Structure (OPTICS)\n    * Spectral  \n\n\u003cbr\u003e\n\n* Computer Vision (future)\n\n\u003cbr\u003e\n\n* Natural Language Processing \n    * Keyword Extraction\n    * Lemmatization\n    * Named Entity Recognition\n    * Named Recognition\n    * Sentiment\n    * Stemming\n    * String Matching\n    * Summarization\n    * Text Classification\n    * Topic Modelling  \n\u003cbr\u003e\n\n* Neural Networks\n    * Classification Binary\n    * Classification KMeans\n    * Classification Binary Backpropagation (*in-progress)\n    * Classification Multiclass\n    * Clustering Self-Organizing Maps (*in-progress)\n    * Genetic Neural Networks (*in-progress)\n    * Neural Structured Learning\n    * Multiregression (linear)\n    * NeuroEvolution of Augmenting Topologies (NEAT) (*in-progress)\n    * Regression (linear)\n    * Regression (non-linear)  \n\n\u003cbr\u003e\n\n* Recommendation Engines - [Link](https://medium.com/mlearning-ai/recommendation-systems-arl-association-rule-learning-bed1a07b5d9a#:~:text=Simple%20Recommender%20Systems%3A%20Makes%20general,based%20on%20similarities%20of%20products.)\n    * Simple Recommendation Systems - uses business knowledge/heuristics or simple sequencing\n        * Utility-based - rules based on simple filters\n        * Knowledge-based - rules based on user input that is used as a filter\n    * Association Rule Learning - uses rules learned 
through recurrences in a dataset  \n        * Basket of Goods Random Forest - uses random forests to group products together\n        * Basket of Goods Association Rule - uses the Apriori algorithm and association rules to generate relations between products\n    * Content-Based Filtering - uses item similarities to group items and assumes similar items lead to the same outcome\n        * Text Similarity (CountVectorizer) - converts a text corpus to count vectors so that items with similar vectors are treated as related\n        * Content-based - uses product similarity to suggest similar options\n        * Content-based2 - same approach as Content-based\n    * Collaborative Filtering (User, Product, or Model-based) - uses common relationships between users and products\n        * Demographic-based - uses trends in demographics to create a rule\n        * Collaborative Filtering (*in-progress)\n    * Ensemble - uses multiple techniques\n        * Hybrid (*in-progress) - uses content-based and collaborative filtering  \n\n\u003cbr\u003e\n\n* Regression\n    * Feature Transformation (*in-progress) \n    * Time Lag of Feature - incorporates a time-dependent (lagged) feature into the model\n    * Change Over Time - incorporates the difference between consecutive data points\n    * Lowess and Loess Smoothing - non-parametric, locally weighted regression smoothing\n    * Ordinary Least Squares - non-regularized method that minimizes the sum of the squared errors\n    * Lasso - regularization method that can shrink coefficients all the way to zero, effectively removing features (L1 regularization)\n    * Ridge - regularization method that shrinks coefficients to reduce model complexity and the effects of multicollinearity, but does not shrink them to exactly zero (L2 regularization)\n    * ElasticNet - regularization method that combines L1 and L2 regularization\n    * Polynomial - transforms features into polynomial terms, then applies ordinary least squares regression\n    * Support Vector Regression\n    * Quantile Regression \n\n\u003cbr\u003e\n\n* Reinforcement Learning (future)  \n\n\u003cbr\u003e\n\n* Time Series\n    * Survival Analysis - models the expected duration of time until an 
event occurs\n    * Time Series Decomposition - separates a series into trend, seasonal (cyclic), and residual components\n\n\u003cbr\u003e\n\n## Technologies    \n*  Python\n\n\u003cbr\u003e\n\n## Libraries\n* lifelines\n* statsmodels.tsa\n* neat\n* yake\n* nltk\n* spacy\n* difflib\n* transformers\n* scikit-learn\n* tensorflow\n* keras\n* minisom\n* deap\n* neural_structured_learning\n* mlxtend\n* scikit-surprise\n* gym\n\n\u003cbr\u003e\n\n## Data Source  \nDatasets were generated with scikit-learn's data generators or simple custom algorithms\n\n\u003cbr\u003e\n\n## Setup and Installation  \n1. Create an environment with the following:  \n    *  Python 3.6+   \n    *  scikit-learn\n    *  the other libraries listed above, as required by each notebook\n1. Activate your environment\n1. Clone the repo to your local machine\n1. Navigate the terminal to the repo folder\n1. Launch Jupyter from the terminal (e.g. `jupyter notebook`) and run any of the notebooks  \n\n\u003cbr\u003e\n\n## References\n- https://www.machinelearningplus.com/time-series/time-series-analysis-python/\n- https://www.machinelearningplus.com/time-series/arima-model-time-series-forecasting-python/\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fejw-data%2Fml-playground","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fejw-data%2Fml-playground","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fejw-data%2Fml-playground/lists"}