Ecosyste.ms: Awesome


https://github.com/shimazadeh/ft_logistic_regression

Recreated Poudlard's Sorting Hat by implementing logistic regression from scratch.

Host: GitHub
URL: https://github.com/shimazadeh/ft_logistic_regression
Owner: shimazadeh
Created: 2023-07-13T15:08:15.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2023-11-16T18:53:27.000Z (about 1 year ago)
Last Synced: 2023-11-17T00:37:46.464Z (about 1 year ago)
Language: Python
Homepage:
Size: 1.78 MB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md

README

# DataScience | Logistic Regression | 42Paris

This project implements one-vs-all logistic regression to solve a multi-class classification problem. It consists of:

- Implementation of pandas.DataFrame.describe from scratch

- Implementation of data visualization tools from scratch, to gain insights and build an intuition of what the data looks like

- Recreation of Poudlard's (Hogwarts') Sorting Hat, with logistic regression implemented from scratch.
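The one-vs-all approach named above can be sketched in NumPy as follows. This is a minimal sketch, not the repository's code: it trains one binary logistic-regression classifier per house with plain batch gradient descent and predicts the house whose classifier is most confident. The function names (`fit_one_vs_all`, `predict_one_vs_all`) and the default hyperparameters are hypothetical, and the features are assumed to be numeric, free of missing values, and already scaled.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping real scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def add_bias(X):
    """Prepend a column of ones so theta[0] acts as the intercept."""
    return np.hstack([np.ones((X.shape[0], 1)), X])


def fit_binary(X, y, lr=0.1, n_iter=1000):
    """Batch gradient descent on the mean log loss for one 0/1 target."""
    Xb = add_bias(X)
    theta = np.zeros(Xb.shape[1])
    m = len(y)
    for _ in range(n_iter):
        h = sigmoid(Xb @ theta)
        theta -= lr * Xb.T @ (h - y) / m  # gradient of the mean log loss
    return theta


def fit_one_vs_all(X, labels, lr=0.1, n_iter=1000):
    """Train one classifier per class: that class is 1, all others are 0."""
    return {c: fit_binary(X, (labels == c).astype(float), lr, n_iter)
            for c in np.unique(labels)}


def predict_one_vs_all(X, thetas):
    """For each row, pick the class whose classifier gives the highest probability."""
    classes = list(thetas)
    Xb = add_bias(X)
    probs = np.column_stack([sigmoid(Xb @ thetas[c]) for c in classes])
    return np.array(classes)[probs.argmax(axis=1)]
```

With four houses this trains four independent classifiers; the prediction step never needs the probabilities to sum to one, only to be comparable across classifiers.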

## Requirements:

- Python 3

- NumPy

- Pandas

- Matplotlib

- scikit-learn

- Tabulate

- SciPy

## How to Run:



    git clone https://github.com/shimazadeh/Ft_logistic_regression.git DSLR
    cd DSLR
    pip3 install -r requirements.txt
    python main.py config.yaml

The `config.yaml` file must contain the settings needed for training and testing.


## Implementation

The following sections describe the method and results for each part of the program. Note that all methods are implemented from scratch:

### Data Analysis

`describe.py` is a reimplementation of `pandas.DataFrame.describe`. It takes a dataset as a parameter and displays the summary statistics of every numerical feature. See the data analysis folder for the implementation. Here is the output for the dataset used in this project:

|          | Arithmancy | Astronomy | Herbology | Defense Against the Dark Arts | Divination | Muggle Studies | Ancient Runes | History of Magic | Transfiguration | Potions | Care of Magical Creatures | Charms | Flying |
| -------- | ----------- | -------- | -------- | -------- | ------- | -------- | ------- | ------- | ------- | ------- | ------- | -------- | ------- |
| count    | 1251        | 1251     | 1251     | 1251     | 1251    | 1251     | 1251    | 1251    | 1251    | 1251    | 1251    | 1251     | 1251    |
| mean     | 49453.1     | 46.4764  | 1.1895   | -0.4648  | 3.2138  | -222.904 | 496.252 | 2.9786  | 1029.86 | 5.9613  | -0.0643 | -243.326 | 23.109  |
| std      | 16701.6     | 520.946  | 5.2231   | 5.2095   | 4.111   | 484.986  | 106.711 | 4.457   | 43.9829 | 3.1029  | 0.9726  | 8.7904   | 97.755  |
| variance | 2.78942e+08 | 271385   | 27.2812  | 27.1385  | 16.9003 | 235211   | 11387.2 | 19.8645 | 1934.49 | 9.6281  | 0.946   | 77.2712  | 9556.04 |
| skew     | -0.0525     | -0.1174  | -0.4316  | 0.1174   | -1.4067 | 0.8039   | 0.0318  | -1.0414 | -1.2183 | 0.0033  | -0.0202 | 0.3781   | 0.859   |
| kurtosis | 0.2119      | -1.693   | -1.3692  | -1.693   | 0.6879  | -0.7592  | -1.5902 | -0.1    | 0.1994  | -0.5513 | 0.0342  | -1.088   | -0.1605 |
| min      | -24370      | -966.74  | -10.2957 | -10.1621 | -8.727  | -1043.96 | 283.87  | -8.4311 | 906.627 | -3.6208 | -3.3137 | -261.049 | -181.47 |
| 25%      | 38180       | -485.323 | -4.2523  | -5.2835  | 3.1205  | -573.969 | 396.41  | 2.2309  | 1025.64 | 3.6842  | -0.6944 | -250.586 | -40.085 |
| 50%      | 48793       | 272.072  | 3.5264   | -2.7207  | 4.621   | -419.164 | 464.328 | 4.4026  | 1045.48 | 5.8685  | -0.0651 | -244.789 | -1.92   |
| 75%      | 60794.5     | 528.346  | 5.4637   | 4.8532   | 5.727   | 264.144  | 597.517 | 5.8939  | 1058.33 | 8.2067  | 0.5756  | -232.528 | 52.625  |
| max      | 104956      | 1016.21  | 10.2968  | 9.6674   | 10.032  | 1092.39  | 745.396 | 11.8897 | 1094.46 | 13.5368 | 3.0565  | -225.428 | 279.07  |
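A from-scratch `describe` for a single column can be sketched in pure Python. This is a sketch under stated assumptions, not the repository's `describe.py`: it skips missing values (NaN), uses the sample (n - 1) variance and linearly interpolated percentiles as pandas does by default, and computes skewness and excess kurtosis with the simple moment formulas. pandas' own `skew()` and `kurt()` apply small-sample bias corrections, so their numbers can differ slightly. The names `percentile` and `describe_column` are hypothetical.

```python
import math


def percentile(sorted_vals, q):
    """q-th quantile (0 <= q <= 1) by linear interpolation, pandas' default method."""
    pos = (len(sorted_vals) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def describe_column(values):
    """Summary statistics for one numerical column, skipping NaN entries.

    Assumes at least two non-missing values.
    """
    vals = sorted(v for v in values if not math.isnan(v))
    n = len(vals)
    mean = sum(vals) / n
    squares = [(v - mean) ** 2 for v in vals]
    var = sum(squares) / (n - 1)  # sample variance, as behind pandas' std()
    m2 = sum(squares) / n  # population central moments for skew and kurtosis
    m3 = sum((v - mean) ** 3 for v in vals) / n
    m4 = sum((v - mean) ** 4 for v in vals) / n
    constant = m2 == 0  # skew and kurtosis are undefined for a constant column
    return {
        "count": n,
        "mean": mean,
        "std": math.sqrt(var),
        "variance": var,
        "skew": math.nan if constant else m3 / m2 ** 1.5,
        "kurtosis": math.nan if constant else m4 / m2 ** 2 - 3.0,  # excess kurtosis
        "min": vals[0],
        "25%": percentile(vals, 0.25),
        "50%": percentile(vals, 0.50),
        "75%": percentile(vals, 0.75),
        "max": vals[-1],
    }
```

Calling `describe_column` on each numerical column and printing the results side by side (for example with `tabulate`, which the project lists as a requirement) yields a table shaped like the one above. Note that `variance` should always equal `std` squared, which is a quick sanity check on the output.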

### Data Visualization

Three programs implement a histogram, a scatter plot, and a pair plot in Python:

| Histogram.py | scatter_plot.py |
|-----------------------------------------------|-----------------------------------------------|
| Plots a histogram of each course's scores by house, to find the course whose score distribution is homogeneous across all four houses. | Displays a scatter plot of similar features to identify redundant ones that can be eliminated. |
| ![Histogram Screenshot]() | ![Scatter Plot Screenshot]() |
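The computations behind these two plots can be sketched with NumPy alone. This sketch is hypothetical and not taken from the repository: `histogram_counts` bins values into equal-width bins (the counting a histogram is drawn from), and `most_correlated_pair` uses the absolute Pearson correlation as one possible measure of "similar" features to pick a pair for the scatter plot.

```python
import numpy as np


def histogram_counts(values, n_bins=10):
    """Count values into n_bins equal-width bins spanning [min, max], ignoring NaN."""
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]
    lo, hi = values.min(), values.max()
    edges = np.linspace(lo, hi, n_bins + 1)
    width = (hi - lo) / n_bins
    counts = np.zeros(n_bins, dtype=int)
    for v in values:
        # The maximum sits exactly on the last edge; keep it in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1) if width > 0 else 0
        counts[idx] += 1
    return counts, edges


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


def most_correlated_pair(columns):
    """Return (name_a, name_b, r) for the feature pair with the largest |r|.

    `columns` maps feature names to 1-D sequences without missing values.
    """
    names = list(columns)
    best = None
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = pearson(columns[names[i]], columns[names[j]])
            if best is None or abs(r) > abs(best[2]):
                best = (names[i], names[j], r)
    return best
```

Judging from the `describe` table, Astronomy and Defense Against the Dark Arts are a likely pair: their means and quartiles differ by a factor of roughly -100 and their standard deviations by roughly 100, which suggests a correlation close to -1.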

| pair_plot.py |
|----------------------------------------------------------------------------------------------------|
| Displays a pair plot matrix of the data to identify features for the logistic regression model. |
| ![Pair Plot Screenshot](https://github.com/shimazadeh/Ft_logistic_regression/assets/67879533/216e4d59-4d86-4aa2-87a3-cdbe3c3e80a7) |
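A bare-bones pair plot needs only Matplotlib. In this hypothetical sketch (`pair_plot` and its arguments are not the repository's names), each diagonal cell shows one histogram per house for a single feature, and each off-diagonal cell scatters two features colored by house. The non-interactive Agg backend is selected so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line to open a window
import matplotlib.pyplot as plt
import numpy as np


def pair_plot(columns, labels, out_path=None):
    """Draw an n x n grid for the features in `columns` (name -> 1-D array).

    `labels` holds one class label (for example a house) per row.
    """
    names = list(columns)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    n = len(names)
    fig, axes = plt.subplots(n, n, figsize=(2 * n, 2 * n), squeeze=False)
    for i, row_name in enumerate(names):
        for j, col_name in enumerate(names):
            ax = axes[i][j]
            for c in classes:
                mask = labels == c
                if i == j:
                    # Diagonal: distribution of one feature, per class.
                    ax.hist(np.asarray(columns[row_name])[mask], alpha=0.5, label=str(c))
                else:
                    # Off-diagonal: column feature on x, row feature on y.
                    ax.scatter(np.asarray(columns[col_name])[mask],
                               np.asarray(columns[row_name])[mask], s=2, label=str(c))
            if i == n - 1:
                ax.set_xlabel(col_name)
            if j == 0:
                ax.set_ylabel(row_name)
    axes[0][0].legend(fontsize="small")
    fig.tight_layout()
    if out_path:
        fig.savefig(out_path)
    return fig
```

Features whose diagonal histograms separate the houses, and whose scatter cells show distinct clusters, are the natural candidates to keep for the model.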

### Training and Evaluation

The program is modular and can be run with different settings: adjust `config.yaml` with your specific parameters and features. The program runs in two modes, training and testing:

- Training: provide the model parameters, the dataset, and the features to train on in the YAML file.

- Testing: loads the `model.joblib` file generated during training and writes the results to a JSON file.
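The hand-off between the two modes can be sketched as follows. This is a hypothetical sketch, not the repository's `main.py`: `save_model`, `run_test`, the default output file name, and the JSON layout are assumptions; only `joblib.dump`/`joblib.load` and the standard `json` module are real APIs. The saved model is assumed to be a dict mapping each house to a NumPy weight vector whose first entry is the intercept.

```python
import json

import joblib
import numpy as np


def save_model(thetas, features, path="model.joblib"):
    """End of training mode: persist per-class weights and the feature order."""
    joblib.dump({"thetas": thetas, "features": features}, path)


def run_test(X, path_model="model.joblib", path_out="predictions.json"):
    """Testing mode: reload the weights, predict one label per row, write JSON."""
    model = joblib.load(path_model)
    classes = list(model["thetas"])
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # bias column first
    scores = np.column_stack([Xb @ model["thetas"][c] for c in classes])
    # The sigmoid is monotonic, so argmax over raw scores equals argmax over probabilities.
    preds = [classes[k] for k in scores.argmax(axis=1)]
    with open(path_out, "w") as f:
        json.dump({"predictions": preds}, f, indent=2)
    return preds
```

Storing the feature order next to the weights lets the testing mode select the same columns, in the same order, that the model was trained on.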

During training, the loss for each category is printed to the terminal at every iteration. At the end of training, a confusion matrix showing the performance on each category is also printed to the terminal.
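The per-category loss and the final confusion matrix can be sketched in NumPy. This is a minimal sketch, not the repository's code; the function names are hypothetical. Rows of the matrix are true labels and columns are predictions, the same convention as scikit-learn's `confusion_matrix`, and precision and recall per house are read directly off it.

```python
import numpy as np


def log_loss(y, p, eps=1e-15):
    """Mean binary cross-entropy of one one-vs-all classifier (y in {0, 1})."""
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def confusion_matrix(y_true, y_pred, classes):
    """Rows: true class, columns: predicted class, in the order of `classes`."""
    index = {c: k for k, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    return cm


def per_class_report(cm, classes):
    """Precision (column-wise) and recall (row-wise) for each class."""
    report = {}
    for k, c in enumerate(classes):
        tp = cm[k, k]
        predicted = cm[:, k].sum()
        actual = cm[k, :].sum()
        report[c] = {
            "precision": tp / predicted if predicted else 0.0,
            "recall": tp / actual if actual else 0.0,
        }
    return report
```

Printing `log_loss` for each house's classifier inside the training loop gives the per-category loss trace, and `confusion_matrix` plus `per_class_report` summarize the final performance.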

![Alt text]()

| Stochastic GD | Mini-Batch GD | GD |
|-------------------------------------------------------------------|-------------------------------------------------------------------|-------------------------------------------------------------------|
|![Alt text]()|![Alt text]()|![Alt text]()|
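The three variants in the table differ only in how many samples feed each parameter update. A minimal sketch, assuming one binary logistic-regression classifier and a hypothetical `batch_size` parameter (not necessarily how the repository's config names it): `batch_size=1` gives stochastic GD, `batch_size=len(X)` gives full-batch GD, and anything in between is mini-batch GD.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping real scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def train_logreg(X, y, batch_size, lr=0.1, n_epochs=50, seed=0):
    """Gradient descent on the mean log loss with a configurable batch size.

    X must already include a bias column; y holds 0/1 targets.
    Returns the weights and the full-data loss recorded after each epoch.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    theta = np.zeros(n)
    history = []
    for _ in range(n_epochs):
        order = rng.permutation(m)  # reshuffle the samples every epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            h = sigmoid(X[idx] @ theta)
            theta -= lr * X[idx].T @ (h - y[idx]) / len(idx)
        p = np.clip(sigmoid(X @ theta), 1e-15, 1 - 1e-15)
        history.append(float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))))
    return theta, history
```

Plotting `history` for `batch_size` values of 1, a small mini-batch, and the full dataset gives loss curves comparable to the three columns above: stochastic GD makes many cheap, noisy updates per epoch, while full-batch GD makes one smooth update per epoch.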