{"id":20472970,"url":"https://github.com/adityajn105/mlfromscratch","last_synced_at":"2025-07-11T20:32:41.160Z","repository":{"id":41202767,"uuid":"225105989","full_name":"adityajn105/MLfromScratch","owner":"adityajn105","description":"Library for machine learning where all algorithms are implemented from scratch. Used only numpy.","archived":false,"fork":false,"pushed_at":"2024-10-05T19:37:02.000Z","size":130,"stargazers_count":23,"open_issues_count":3,"forks_count":8,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-13T11:35:54.887Z","etag":null,"topics":["clustering-algorithm","decision-trees","ensemble-learning","evaluation-metrics","from-scratch","hacktoberfest","hacktoberfest-accepted","implementation-of-algorithms","linear-models","machine-learning","machine-learning-algorithms","mlfromscratch","naive-bayes"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/adityajn105.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-12-01T04:23:45.000Z","updated_at":"2025-01-08T04:55:51.000Z","dependencies_parsed_at":"2025-04-14T13:15:43.691Z","dependency_job_id":null,"html_url":"https://github.com/adityajn105/MLfromScratch","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/adityajn105/MLfromScratch","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adityajn105%2FMLfromScratch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ad
ityajn105%2FMLfromScratch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adityajn105%2FMLfromScratch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adityajn105%2FMLfromScratch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/adityajn105","download_url":"https://codeload.github.com/adityajn105/MLfromScratch/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/adityajn105%2FMLfromScratch/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":264892771,"owners_count":23679361,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clustering-algorithm","decision-trees","ensemble-learning","evaluation-metrics","from-scratch","hacktoberfest","hacktoberfest-accepted","implementation-of-algorithms","linear-models","machine-learning","machine-learning-algorithms","mlfromscratch","naive-bayes"],"created_at":"2024-11-15T14:22:52.082Z","updated_at":"2025-07-11T20:32:41.152Z","avatar_url":"https://github.com/adityajn105.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n\u003ch1\u003e🧠 
MLfromScratch\u003c/h1\u003e\n\n![Python](https://img.shields.io/badge/Python-3.8+-blue?style=flat-square\u0026logo=python\u0026logoColor=white)\n![NumPy](https://img.shields.io/badge/NumPy-Library-green?style=flat-square\u0026logo=numpy\u0026logoColor=white)\n![License](https://img.shields.io/github/license/adityajn105/MLfromScratch?style=flat-square)\n![GitHub contributors](https://img.shields.io/github/contributors/adityajn105/MLfromScratch?style=flat-square)\n\n\u003c/div\u003e\n\n**MLfromScratch** is a library designed to help you learn and understand machine learning algorithms by building them from scratch using only `NumPy`! No black-box libraries, no hidden magic—just pure Python and math. It's perfect for beginners who want to see what's happening behind the scenes of popular machine learning models.\n\n🔗 **[Explore the Documentation](https://github.com/adityajn105/MLfromScratch/wiki)**\n\n---\n\n## 📦 Package Structure\n\nOur package structure is designed to look like `scikit-learn`, so if you're familiar with that, you'll feel right at home!\n\n### 🔧 Modules and Algorithms (Explained for Beginners) \u003cbr\u003e\u003c/br\u003e\n#### 📈 **1. Linear Models (`linear_model`)**\n\n- **LinearRegression** ![Linear Regression](https://img.shields.io/badge/Linear%20Regression-blue?style=flat-square\u0026logo=mathworks): Imagine drawing a straight line through a set of points to predict future values. Linear Regression helps in predicting something like house prices based on size.\n  \n- **SGDRegressor** ![SGD](https://img.shields.io/badge/SGD-Fast-blue?style=flat-square\u0026logo=rocket): A fast way to do Linear Regression using Stochastic Gradient Descent. Perfect for large datasets.\n\n- **SGDClassifier** ![Classifier](https://img.shields.io/badge/SGD-Classifier-yellow?style=flat-square\u0026logo=target): A classification algorithm predicting categories like \"spam\" or \"not spam.\" \u003cbr\u003e\u003c/br\u003e\n\n\n#### 🌳 **2. 
Decision Trees (`tree`)**\n\n- **DecisionTreeClassifier** ![Tree](https://img.shields.io/badge/Tree-Classifier-brightgreen?style=flat-square\u0026logo=leaf): Think of this as playing 20 questions to guess something. A decision tree asks yes/no questions to classify data.\n  \n- **DecisionTreeRegressor** ![Regressor](https://img.shields.io/badge/Tree-Regressor-yellowgreen?style=flat-square\u0026logo=mathworks): Predicts a continuous number (like temperature tomorrow) based on input features. \u003cbr\u003e\u003c/br\u003e\n\n\n#### 👥 **3. K-Nearest Neighbors (`neighbors`)**\n\n- **KNeighborsClassifier** ![KNN](https://img.shields.io/badge/KNN-Classifier-9cf?style=flat-square\u0026logo=people-arrows): Classifies data by looking at the 'k' nearest neighbors to the new point.\n\n- **KNeighborsRegressor** ![KNN](https://img.shields.io/badge/KNN-Regressor-lightblue?style=flat-square\u0026logo=chart-bar): Instead of classifying, it predicts a number based on nearby data points. \u003cbr\u003e\u003c/br\u003e\n\n\n#### 🧮 **4. Naive Bayes (`naive_bayes`)**\n\n- **GaussianNB** ![Gaussian](https://img.shields.io/badge/GaussianNB-fast-brightgreen?style=flat-square\u0026logo=matrix): Works great for data that follows a normal distribution (bell-shaped curve).\n\n- **MultinomialNB** ![Multinomial](https://img.shields.io/badge/MultinomialNB-text-ff69b4?style=flat-square\u0026logo=alphabetical-order): Ideal for text classification tasks like spam detection. \u003cbr\u003e\u003c/br\u003e\n\n\n#### 📊 **5. 
Clustering (`cluster`)**\n\n- **KMeans** ![KMeans](https://img.shields.io/badge/KMeans-Clustering-ff69b4?style=flat-square\u0026logo=group): Groups data into 'k' clusters based on similarity.\n  \n- **AgglomerativeClustering** ![Agglomerative](https://img.shields.io/badge/Agglomerative-Hierarchical-blueviolet?style=flat-square\u0026logo=chart-bar): Clusters by merging similar points until a single large cluster is formed.\n\n- **DBSCAN** ![DBSCAN](https://img.shields.io/badge/DBSCAN-Noise%20Filtering-blue?style=flat-square\u0026logo=waves): Groups points close to each other and filters out noise. No need to specify the number of clusters!\n\n- **MeanShift** ![MeanShift](https://img.shields.io/badge/MeanShift-Clustering-yellowgreen?style=flat-square\u0026logo=sort-amount-up): Shifts data points toward areas of high density to find clusters. \u003cbr\u003e\u003c/br\u003e\n\n\n#### 🌲 **6. Ensemble Methods (`ensemble`)**\n\n- **RandomForestClassifier** ![RandomForest](https://img.shields.io/badge/Random%20Forest-Classifier-brightgreen?style=flat-square\u0026logo=forest): Combines multiple decision trees to make stronger decisions.\n  \n- **RandomForestRegressor** ![RandomForest](https://img.shields.io/badge/Random%20Forest-Regressor-lightblue?style=flat-square\u0026logo=tree): Predicts continuous values using an ensemble of decision trees.\n\n- **GradientBoostingClassifier** ![GradientBoosting](https://img.shields.io/badge/Gradient%20Boosting-Classifier-9cf?style=flat-square\u0026logo=chart-line): Builds trees sequentially, each correcting errors made by the last.\n\n- **VotingClassifier** ![Voting](https://img.shields.io/badge/Voting-Classifier-orange?style=flat-square\u0026logo=thumbs-up): Combines the results of multiple models to make a final prediction. \u003cbr\u003e\u003c/br\u003e\n\n\n#### 📐 **7. 
Metrics (`metrics`)**\n\nMeasure your model’s performance:\n\n- **accuracy_score** ![Accuracy](https://img.shields.io/badge/Accuracy-High-brightgreen?style=flat-square\u0026logo=bar-chart): Measures the fraction of predictions your model got right.\n\n- **f1_score** ![F1 Score](https://img.shields.io/badge/F1_Score-Balance-lightgreen?style=flat-square\u0026logo=scales): Balances precision and recall into a single score.\n\n- **roc_curve** ![ROC](https://img.shields.io/badge/ROC-Curve-orange?style=flat-square\u0026logo=wave): Shows the trade-off between the true positive rate and the false positive rate as the decision threshold changes. \u003cbr\u003e\u003c/br\u003e\n\n\n#### ⚙️ **8. Model Selection (`model_selection`)**\n\n- **train_test_split** ![TrainTestSplit](https://img.shields.io/badge/Train_Test_Split-blueviolet?style=flat-square\u0026logo=arrows): Splits your data into training and test sets.\n\n- **KFold** ![KFold](https://img.shields.io/badge/KFold-CrossValidation-blue?style=flat-square\u0026logo=matrix): Splits your data into 'k' folds so the model can be trained and validated 'k' times, each time holding out a different fold. \u003cbr\u003e\u003c/br\u003e\n\n\n#### 🔍 **9. Preprocessing (`preprocessing`)**\n\n- **StandardScaler** ![StandardScaler](https://img.shields.io/badge/StandardScaler-Normalization-ff69b4?style=flat-square\u0026logo=arrows-v): Standardizes your data so each feature has a mean of 0 and a standard deviation of 1.\n\n- **LabelEncoder** ![LabelEncoder](https://img.shields.io/badge/LabelEncoder-Classification-yellow?style=flat-square\u0026logo=code): Converts text labels into numerical labels (e.g., \"cat\" becomes 0 and \"dog\" becomes 1).\u003cbr\u003e\u003c/br\u003e\n\n\n#### 🧩 **10. Dimensionality Reduction (`decomposition`)**\n\nDimensionality Reduction helps in simplifying data while retaining most of its valuable information. 
By reducing the number of features (dimensions) in a dataset, it makes data easier to visualize and speeds up machine learning algorithms.\n\n- **PCA (Principal Component Analysis)** ![PCA](https://img.shields.io/badge/PCA-PrincipalComponentAnalysis-orange?style=flat-square\u0026logo=chart-line): PCA reduces the number of dimensions by finding new uncorrelated variables called principal components. It projects your data onto a lower-dimensional space while retaining as much variance as possible. \u003cbr\u003e\u003c/br\u003e\n  - **How It Works**: PCA finds the axes (principal components) that maximize the variance in your data. The first principal component captures the most variance, and each subsequent component captures progressively less.\n  - **Use Case**: Use PCA when you have many features, and you want to simplify your dataset for better visualization or faster computation. It is particularly useful when features are highly correlated.\n\n---\n\n## 🎯 Why Use This Library?\n\n- **Learning-First Approach**: If you're a beginner and want to *understand* machine learning, this is the library for you. No hidden complexity, just code.\n- **No Hidden Magic**: Everything is written from scratch, so you can see exactly how each algorithm works.\n- **Lightweight**: Uses only `NumPy`, making it fast and easy to run. \u003cbr\u003e\u003c/br\u003e\n\n## 🚀 Getting Started\n\n```bash\n# Clone the repository\ngit clone https://github.com/adityajn105/MLfromScratch.git\n\n# Navigate to the project directory\ncd MLfromScratch\n\n# Install the required dependencies\npip install -r requirements.txt\n```\n\u003cbr\u003e\u003c/br\u003e\n\n## 👨‍💻 Author\nThis project is maintained by [Aditya Jain](https://adityajain.me/)\u003cbr\u003e\u003c/br\u003e\n\n## 🧑‍💻 Contributors\nContributor: [Subrahmanya Gaonkar](https://github.com/negativenagesh)\n\nWe welcome contributions from everyone, especially beginners! 
If you're new to open-source, don’t worry—feel free to ask questions, open issues, or submit a pull request. \u003cbr\u003e\u003c/br\u003e\n\n## 🤝 How to Contribute\n1. Fork the repository.\n2. Create a new branch (git checkout -b feature-branch).\n3. Make your changes and commit (git commit -m \"Added new feature\").\n4. Push the changes (git push origin feature-branch).\n5. Submit a pull request and explain your changes. \u003cbr\u003e\u003c/br\u003e\n\n## 📄 License\nThis project is licensed under the [MIT License](https://github.com/adityajn105/MLfromScratch/blob/master/LICENSE) - see the LICENSE file for details.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadityajn105%2Fmlfromscratch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fadityajn105%2Fmlfromscratch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadityajn105%2Fmlfromscratch/lists"}