https://github.com/adityajn105/mlfromscratch
A library for machine learning where all algorithms are implemented from scratch, using only NumPy.
- Host: GitHub
- URL: https://github.com/adityajn105/mlfromscratch
- Owner: adityajn105
- License: mit
- Created: 2019-12-01T04:23:45.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2024-10-05T19:37:02.000Z (about 1 year ago)
- Last Synced: 2025-04-13T11:35:54.887Z (7 months ago)
- Topics: clustering-algorithm, decision-trees, ensemble-learning, evaluation-metrics, from-scratch, hacktoberfest, hacktoberfest-accepted, implementation-of-algorithms, linear-models, machine-learning, machine-learning-algorithms, mlfromscratch, naive-bayes
- Language: Python
- Homepage:
- Size: 127 KB
- Stars: 23
- Watchers: 1
- Forks: 8
- Open Issues: 3
- Metadata Files:
- Readme: README.md
- License: LICENSE
# MLfromScratch




**MLfromScratch** is a library designed to help you learn and understand machine learning algorithms by building them from scratch using only `NumPy`! No black-box libraries, no hidden magic: just pure Python and math. It's perfect for beginners who want to see what's happening behind the scenes of popular machine learning models.
**[Explore the Documentation](https://github.com/adityajn105/MLfromScratch/wiki)**
---
## Package Structure
Our package structure is designed to look like `scikit-learn`, so if you're familiar with that, you'll feel right at home!
### Modules and Algorithms (Explained for Beginners)
#### **1. Linear Models (`linear_model`)**
- **LinearRegression** : Imagine drawing a straight line through a set of points to predict future values. Linear Regression helps in predicting something like house prices based on size.
- **SGDRegressor** : A fast way to do Linear Regression using Stochastic Gradient Descent. Perfect for large datasets.
- **SGDClassifier** : A classification algorithm predicting categories like "spam" or "not spam."
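To make the idea concrete, here is a minimal, illustrative sketch of linear regression with nothing but NumPy. It is not the library's actual code; it solves ordinary least squares directly (via `np.linalg.lstsq`) rather than by gradient descent.

```python
import numpy as np

def fit_linear_regression(X, y):
    # Add a bias column of ones, then solve the least-squares problem.
    Xb = np.c_[np.ones(len(X)), X]
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w  # w[0] is the intercept, w[1:] are the coefficients

def predict(w, X):
    return np.c_[np.ones(len(X)), X] @ w

# Points that lie exactly on y = 2x + 1.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])
w = fit_linear_regression(X, y)
```

With this data, `w` recovers the intercept 1 and slope 2, and `predict(w, [[5.0]])` returns 11.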
#### **2. Decision Trees (`tree`)**
- **DecisionTreeClassifier** : Think of this as playing 20 questions to guess something. A decision tree asks yes/no questions to classify data.
- **DecisionTreeRegressor** : Predicts a continuous number (like temperature tomorrow) based on input features.
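The "yes/no question" a tree asks at each node is a threshold on one feature, chosen to make the resulting groups as pure as possible. A hedged sketch of that single step, using Gini impurity (function names here are illustrative, not the library's API):

```python
import numpy as np

def gini(y):
    # Gini impurity: 1 minus the sum of squared class proportions.
    _, counts = np.unique(y, return_counts=True)
    p = counts / len(y)
    return 1.0 - np.sum(p ** 2)

def best_split(x, y):
    # Try every observed value as a threshold; keep the one with the
    # lowest weighted impurity of the two resulting groups.
    best_t, best_score = None, np.inf
    for t in np.unique(x):
        left, right = y[x <= t], y[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_t, best_score = t, score
    return best_t
```

A full tree simply applies this search recursively to each resulting group.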
#### **3. K-Nearest Neighbors (`neighbors`)**
- **KNeighborsClassifier** : Classifies data by looking at the 'k' nearest neighbors to the new point.
- **KNeighborsRegressor** : Instead of classifying, it predicts a number based on nearby data points.
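The whole idea fits in a few lines of NumPy. This is an illustrative sketch of the classifier (a majority vote among the k closest points), not the library's actual implementation:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # Euclidean distance from x to every training point.
    dists = np.linalg.norm(X_train - x, axis=1)
    # Labels of the k closest points, then a majority vote.
    nearest = y_train[np.argsort(dists)[:k]]
    return Counter(nearest.tolist()).most_common(1)[0][0]

X_train = np.array([[0, 0], [0, 1], [5, 5], [6, 5]])
y_train = np.array([0, 0, 1, 1])
```

A query near the origin is voted into class 0; one near (5, 5) into class 1. The regressor replaces the vote with the mean of the neighbors' values.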
#### **4. Naive Bayes (`naive_bayes`)**
- **GaussianNB** : Works great for data that follows a normal distribution (bell-shaped curve).
- **MultinomialNB** : Ideal for text classification tasks like spam detection.
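A minimal sketch of the Gaussian variant, assuming independent normally distributed features (the function names are illustrative): fit a mean and variance per feature per class, then pick the class with the highest log-probability.

```python
import numpy as np

def gaussian_nb_fit(X, y):
    # Per class: feature means, feature variances, and the class prior.
    stats = {}
    for c in np.unique(y):
        Xc = X[y == c]
        stats[c] = (Xc.mean(axis=0), Xc.var(axis=0) + 1e-9, len(Xc) / len(X))
    return stats

def gaussian_nb_predict(stats, x):
    best, best_lp = None, -np.inf
    for c, (mu, var, prior) in stats.items():
        # log prior + sum of per-feature Gaussian log-densities.
        lp = np.log(prior) - 0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        if lp > best_lp:
            best, best_lp = c, lp
    return best

X = np.array([[0.0], [0.5], [5.0], [5.5]])
y = np.array([0, 0, 1, 1])
stats = gaussian_nb_fit(X, y)
```

The small constant added to the variance guards against division by zero for constant features.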
#### **5. Clustering (`cluster`)**
- **KMeans** : Groups data into 'k' clusters based on similarity.
- **AgglomerativeClustering** : Clusters by merging similar points until a single large cluster is formed.
- **DBSCAN** : Groups points close to each other and filters out noise. No need to specify the number of clusters!
- **MeanShift** : Shifts data points toward areas of high density to find clusters.
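KMeans is the simplest of these to sketch. Below is a minimal version of Lloyd's algorithm, the loop at its heart; for simplicity it seeds the centers with the first k points, whereas real implementations pick random (or k-means++) starting centers.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    centers = X[:k].astype(float).copy()
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        # Move each center to the mean of the points assigned to it.
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break  # converged
        centers = new_centers
    return centers, labels

# Two well-separated blobs.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centers, labels = kmeans(X, k=2)
```

On this data the two blobs end up in separate clusters after a couple of iterations.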
#### **6. Ensemble Methods (`ensemble`)**
- **RandomForestClassifier** : Combines multiple decision trees to make stronger decisions.
- **RandomForestRegressor** : Predicts continuous values using an ensemble of decision trees.
- **GradientBoostingClassifier** : Builds trees sequentially, each correcting errors made by the last.
- **VotingClassifier** : Combines the results of multiple models to make a final prediction.
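The combining step these methods share can be sketched in a few lines. Here is an illustrative hard-voting function (the most common label per sample wins), not the library's actual `VotingClassifier`:

```python
import numpy as np

def hard_vote(predictions):
    # predictions: (n_models, n_samples) array of predicted class labels.
    preds = np.asarray(predictions)
    voted = []
    for i in range(preds.shape[1]):
        vals, counts = np.unique(preds[:, i], return_counts=True)
        voted.append(vals[np.argmax(counts)])  # majority label for sample i
    return np.array(voted)

# Three models' predictions for three samples.
voted = hard_vote([[0, 1, 1],
                   [0, 1, 0],
                   [1, 1, 0]])
```

Random forests use exactly this vote over many decision trees, each trained on a random subset of the data and features.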
#### **7. Metrics (`metrics`)**
Measure your model's performance:
- **accuracy_score** : Measures how many predictions your model got right.
- **f1_score** : Balances precision and recall into a single score.
- **roc_curve** : Shows the trade-off between true positives and false positives.
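The first two metrics fall straight out of their definitions. A sketch for binary labels (1 = positive, 0 = negative), mirroring but not copying the library's functions:

```python
import numpy as np

def accuracy_score(y_true, y_pred):
    # Fraction of predictions that match the true labels.
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))

def f1_score(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp)  # of the predicted positives, how many were right
    recall = tp / (tp + fn)     # of the actual positives, how many were found
    return 2 * precision * recall / (precision + recall)  # harmonic mean
```

For `y_true = [1, 1, 0, 0]` and `y_pred = [1, 0, 1, 0]`, both accuracy and F1 come out to 0.5.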
#### **8. Model Selection (`model_selection`)**
- **train_test_split** : Splits your data into training and test sets.
- **KFold** : Splits the data into 'k' folds; each fold takes one turn as the validation set while the remaining folds train the model, giving a more reliable performance estimate.
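A hedged sketch of the splitting idea, assuming a shuffled index-based split (signature and seed handling are illustrative, not the library's exact API):

```python
import numpy as np

def train_test_split(X, y, test_size=0.25, seed=0):
    # Shuffle the indices, then carve off the first chunk as the test set.
    idx = np.random.default_rng(seed).permutation(len(X))
    n_test = int(len(X) * test_size)
    test, train = idx[:n_test], idx[n_test:]
    return X[train], X[test], y[train], y[test]

X = np.arange(8).reshape(-1, 1)
y = np.arange(8)
X_tr, X_te, y_tr, y_te = train_test_split(X, y)
```

KFold does the same shuffle once, then rotates which chunk of indices plays the test role.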
#### **9. Preprocessing (`preprocessing`)**
- **StandardScaler** : Standardizes your data so it has a mean of 0 and a standard deviation of 1.
- **LabelEncoder** : Converts text labels into numerical labels (e.g., "cat" → 0, "dog" → 1).
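Standardization is a one-liner per feature: subtract the column mean, divide by the column standard deviation. A minimal sketch (not the library's stateful scaler, which would remember `mu` and `sigma` to transform new data):

```python
import numpy as np

def standardize(X):
    mu = X.mean(axis=0)      # per-feature mean
    sigma = X.std(axis=0)    # per-feature standard deviation
    return (X - mu) / sigma

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])
Z = standardize(X)
```

Each column of `Z` now has mean 0 and standard deviation 1, so features on very different scales (like the two columns above) contribute comparably to distance-based models.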
#### **10. Dimensionality Reduction (`decomposition`)**
Dimensionality Reduction helps in simplifying data while retaining most of its valuable information. By reducing the number of features (dimensions) in a dataset, it makes data easier to visualize and speeds up machine learning algorithms.
- **PCA (Principal Component Analysis)** : PCA reduces the number of dimensions by finding new uncorrelated variables called principal components. It projects your data onto a lower-dimensional space while retaining as much variance as possible.
- **How It Works**: PCA finds the axes (principal components) that maximize the variance in your data. The first principal component captures the most variance, and each subsequent component captures progressively less.
- **Use Case**: Use PCA when you have many features, and you want to simplify your dataset for better visualization or faster computation. It is particularly useful when features are highly correlated.
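The description above maps directly onto a few linear-algebra steps. A sketch via eigen-decomposition of the covariance matrix (one common formulation; SVD-based variants are equally standard):

```python
import numpy as np

def pca(X, n_components):
    Xc = X - X.mean(axis=0)                 # center the data
    cov = np.cov(Xc, rowvar=False)          # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # sort components by variance, descending
    components = eigvecs[:, order[:n_components]]
    return Xc @ components                  # project onto the top components

# Points lying exactly on the line y = x: one component captures everything.
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
Z = pca(X, n_components=1)
```

Here the first principal component points along the line, so the 1-D projection `Z` preserves the full variance of the 2-D data.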
---
## Why Use This Library?
- **Learning-First Approach**: If you're a beginner and want to *understand* machine learning, this is the library for you. No hidden complexity, just code.
- **No Hidden Magic**: Everything is written from scratch, so you can see exactly how each algorithm works.
- **Lightweight**: Uses only `NumPy`, making it fast and easy to run.
## Getting Started
```bash
# Clone the repository
git clone https://github.com/adityajn105/MLfromScratch.git
# Navigate to the project directory
cd MLfromScratch
# Install the required dependencies
pip install -r requirements.txt
```
## Author
This project is maintained by [Aditya Jain](https://adityajain.me/).
## Contributors
Contributor: [Subrahmanya Gaonkar](https://github.com/negativenagesh)
We welcome contributions from everyone, especially beginners! If you're new to open source, don't worry: feel free to ask questions, open issues, or submit a pull request.
## How to Contribute
1. Fork the repository.
2. Create a new branch (`git checkout -b feature-branch`).
3. Make your changes and commit them (`git commit -m "Added new feature"`).
4. Push the changes (`git push origin feature-branch`).
5. Submit a pull request and explain your changes.
## License
This project is licensed under the [MIT License](https://github.com/adityajn105/MLfromScratch/blob/master/LICENSE) - see the LICENSE file for details.