https://github.com/aallali/dslr--data-science-x-logistic-regression

Discover Data Science through a project in which you recreate the Hogwarts Sorting Hat. Warning: this is not a project about cameras.

Host: GitHub
URL: https://github.com/aallali/dslr--data-science-x-logistic-regression
Owner: aallali
Created: 2021-06-21T21:03:41.000Z (almost 4 years ago)
Default Branch: main
Last Pushed: 2021-10-17T21:59:12.000Z (over 3 years ago)
Last Synced: 2025-03-24T20:45:13.443Z (2 months ago)
Language: Python
Homepage:
Size: 1.44 MB
Stars: 3
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# DSLR (Data Science X Logistic Regression)

## About

➜ Let's discover Data Science through this project by rebuilding the Hogwarts Sorting Hat!

This is the second project of the Artificial Intelligence branch at School 1337 Khouribga.

## Math Behind Logistic Regression

#### Feature Scaling:
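
A minimal sketch of one common scaling scheme, standardization (z-score): each value becomes `(x - mean) / std`, so every feature ends up with mean 0 and unit variance before gradient descent. Whether this project uses z-scores or another scheme such as min-max scaling is not stated here, so the choice and the helper name `standardize` are assumptions.

```python
import math

def standardize(values):
    """Return z-scores (x - mean) / std, using the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    std = math.sqrt(variance)
    if std == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(x - mean) / std for x in values]

# Illustrative values only, not taken from the dataset.
scaled = standardize([2.0, 4.0, 6.0])
```

Scaling matters here because course scores can live on very different ranges, and unscaled features tend to slow gradient descent down.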

#### Logistic Regression:
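
As a rough sketch of the standard formulation, binary logistic regression models the probability as `h(x) = sigmoid(theta . x)`, measures error with the average log loss, and updates `theta` by batch gradient descent. All function names, the learning rate, and the toy data below are assumptions for illustration, not the project's actual code.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(theta, x):
    """h(x) = sigmoid(theta . x); x is expected to start with a 1.0 bias term."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

def cost(theta, X, y):
    """Average log loss J(theta) over the dataset."""
    eps = 1e-15  # keeps log() away from 0
    total = 0.0
    for x, label in zip(X, y):
        h = min(max(predict_proba(theta, x), eps), 1 - eps)
        total += label * math.log(h) + (1 - label) * math.log(1 - h)
    return -total / len(X)

def gradient_step(theta, X, y, lr):
    """One batch gradient-descent update: theta_j -= lr * dJ/dtheta_j."""
    m = len(X)
    errors = [predict_proba(theta, x) - label for x, label in zip(X, y)]
    return [
        t - lr * sum(e * x[j] for e, x in zip(errors, X)) / m
        for j, t in enumerate(theta)
    ]

# Toy, linearly separable data: a bias column plus one feature.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]
theta = [0.0, 0.0]
before = cost(theta, X, y)
for _ in range(200):
    theta = gradient_step(theta, X, y, lr=0.1)
after = cost(theta, X, y)
```

With `theta` at zero every prediction is 0.5, so the starting cost is `ln(2)`; it should fall as the updates proceed.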

## Installation

Run `python3 -m pip install -r requirements.txt`

## Usage

`python3 describe.py datasets/dataset_train.csv`

- Display information about all numerical features (non-numerical features are covered as a bonus)
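
A sketch of how such per-feature statistics can be computed by hand. The exact set of statistics printed by `describe.py` is not listed here, so the fields below (count, mean, std, min, quartiles, max), the interpolation rule, and the function names are assumptions.

```python
import math

def percentile(sorted_values, p):
    """Linear-interpolation percentile (p in [0, 1]) of an already sorted list."""
    k = (len(sorted_values) - 1) * p
    lo, hi = math.floor(k), math.ceil(k)
    if lo == hi:
        return sorted_values[lo]
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (k - lo)

def describe(values):
    """Summary statistics for one numerical column; None entries are skipped."""
    data = sorted(v for v in values if v is not None)
    n = len(data)
    mean = sum(data) / n
    # Sample standard deviation (n - 1 denominator); needs at least two values.
    std = math.sqrt(sum((v - mean) ** 2 for v in data) / (n - 1))
    return {
        "count": n, "mean": mean, "std": std, "min": data[0],
        "25%": percentile(data, 0.25), "50%": percentile(data, 0.50),
        "75%": percentile(data, 0.75), "max": data[-1],
    }

# Illustrative column with one missing value.
stats = describe([1.0, 2.0, None, 3.0, 4.0])
```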

`python3 histogram.py datasets/dataset_train.csv`

- Answer the following question: Which Hogwarts course has a homogeneous score distribution across the four houses?

`python3 scatter_plot.py datasets/dataset_train.csv`

- Answer the following question: Which two features are similar?
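
The script answers this question visually. One way to back up the visual answer numerically is to rank feature pairs by absolute Pearson correlation; using this measure is an assumption, and the column names and values below are placeholders rather than real dataset features.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def most_similar_pair(columns):
    """Return the two column names with the highest absolute correlation.

    `columns` maps a name to a list of floats (same length, no missing values).
    """
    return max(
        combinations(columns, 2),
        key=lambda pair: abs(pearson(columns[pair[0]], columns[pair[1]])),
    )

# Placeholder data: feature_b is an exact negative multiple of feature_a.
columns = {
    "feature_a": [1.0, 2.0, 3.0, 4.0],
    "feature_b": [-2.0, -4.0, -6.0, -8.0],
    "feature_c": [1.0, 0.0, 2.0, 1.0],
}
pair = most_similar_pair(columns)
```

Taking the absolute value means perfectly anti-correlated features also count as similar, since one can be recovered from the other.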

`python3 pair_plot.py datasets/dataset_train.csv`

- Show a pair plot of all numerical features in `datasets/dataset_train.csv`

`python3 logreg_train.py [-h] [-v] datasets/dataset_train.csv`

- `-h`: Show the help message and exit
- `-v`: Show a graph of the cost of each classifier
- Train a model from the input dataset
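
The mention of "each classifier" suggests one binary classifier per house (a one-vs-rest setup). Under that assumption, the sketch below shows how the house column could be turned into one 0/1 target vector per classifier; `one_vs_rest_targets` is a hypothetical helper, not a function from the repository.

```python
def one_vs_rest_targets(houses):
    """Map each distinct house to a 0/1 target vector for its binary classifier."""
    classes = sorted(set(houses))
    return {c: [1 if h == c else 0 for h in houses] for c in classes}

# Illustrative labels for five students.
houses = ["Gryffindor", "Slytherin", "Ravenclaw", "Gryffindor", "Hufflepuff"]
targets = one_vs_rest_targets(houses)
```

Each target vector can then be fed to an ordinary binary logistic regression, giving one weight vector per house.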

`python3 logreg_predict.py datasets/dataset_test.csv weights`

- Generate a predictions file (`houses.csv`) for the given dataset

### Example

`python3 src/describe.py datasets/dataset_train.csv`

`python3 src/histogram.py datasets/dataset_train.csv`

`python3 src/histogram.py datasets/dataset_train.csv -all`

`python3 src/scatter_plot.py datasets/dataset_train.csv`

`python3 src/pair_plot.py datasets/dataset_train.csv`

First, train the model on the provided dataset:

```
➜ py src/logreg_train.py datasets/dataset_train.csv
Training against : Gryffindor
Training against : Hufflepuff
Training against : Ravenclaw
Training against : Slytherin
Weights saved in 'weights.npy',
Accuracy : 98.49 %
```

This saves the trained weights to `./weights.npy`.

Then, to make predictions for a given dataset:

`python3 src/logreg_predict.py datasets/dataset_test.csv weights.npy`

This generates a file with all predictions in `./houses.csv`:

```
➜ cat houses.csv
Index,Hogwarts House
...
```
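
A sketch of how per-house weights could be turned into rows of `houses.csv`: score each student with every classifier and keep the house with the highest score. Because the sigmoid is monotonic, comparing the raw linear scores picks the same house as comparing probabilities. The weights layout (a dict from house name to coefficient list), the function names, and the numbers are hypothetical; only the header row comes from the output above.

```python
import csv
import io

def predict_house(weights, x):
    """Pick the house whose classifier gives the highest linear score for x."""
    return max(
        weights,
        key=lambda house: sum(w * xi for w, xi in zip(weights[house], x)),
    )

def write_predictions(rows, weights, out):
    """Write one 'Index,Hogwarts House' line per feature row to a file-like object."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Index", "Hogwarts House"])
    for index, x in enumerate(rows):
        writer.writerow([index, predict_house(weights, x)])

# Made-up weights and features (bias term first), for illustration only.
weights = {"Gryffindor": [0.0, 2.0], "Slytherin": [0.0, -2.0]}
buffer = io.StringIO()
write_predictions([[1.0, 1.5], [1.0, -0.5]], weights, buffer)
```

Writing to an in-memory buffer keeps the example free of side effects; passing an opened file object instead would produce the file on disk.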

Finally, run the `evaluate.py` script provided by 42. It compares the predictions in `houses.csv`, generated by your prediction script, with `datasets_truth.csv`, which contains the correct values (available from the 42 correction page):

```
➜ py evaluate.py
Your score on test set: 0.988
Good job! Mc Gonagall congratulates you.
```
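
The reported score is presumably the fraction of rows where the predicted house matches the truth file. The sketch below illustrates that reading on in-memory CSV data; it is not 42's actual `evaluate.py`, and the helper name and sample rows are assumptions.

```python
import csv
import io

def accuracy(pred_csv, truth_csv):
    """Fraction of indices whose predicted house equals the true house."""
    pred = {row["Index"]: row["Hogwarts House"] for row in csv.DictReader(pred_csv)}
    truth = {row["Index"]: row["Hogwarts House"] for row in csv.DictReader(truth_csv)}
    correct = sum(1 for index, house in truth.items() if pred.get(index) == house)
    return correct / len(truth)

# Four made-up rows; the last prediction is wrong.
predictions = io.StringIO(
    "Index,Hogwarts House\n0,Gryffindor\n1,Slytherin\n2,Ravenclaw\n3,Ravenclaw\n"
)
truth = io.StringIO(
    "Index,Hogwarts House\n0,Gryffindor\n1,Slytherin\n2,Ravenclaw\n3,Hufflepuff\n"
)
score = accuracy(predictions, truth)
```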

##### Project completed on 2021/06/26
