https://github.com/aallali/dslr--data-science-x-logistic-regression
Discover Data Science through a project in which you recreate Hogwarts' (Poudlard's) Sorting Hat. Warning: this is not a project about cameras.
- Host: GitHub
- URL: https://github.com/aallali/dslr--data-science-x-logistic-regression
- Owner: aallali
- Created: 2021-06-21T21:03:41.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2021-10-17T21:59:12.000Z (over 3 years ago)
- Last Synced: 2025-03-24T20:45:13.443Z (2 months ago)
- Language: Python
- Homepage:
- Size: 1.44 MB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# DSLR (Data Science X Logistic Regression)
## About
➜ Let's discover Data Science through this project by recreating the Hogwarts Sorting Hat!
This is the second project of the Artificial Intelligence branch at School 1337 Khouribga.
## Math Behind Logistic Regression
#### Feature Scaling:
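A common way to scale features before gradient descent is standardization (z-score scaling). Whether this project uses standardization or another scheme such as min-max normalization is an assumption here; the symbols $\mu$, $\sigma$ and $m$ are conventional notation, not taken from the repository.

```latex
x' = \frac{x - \mu}{\sigma}
\qquad\text{where}\qquad
\mu = \frac{1}{m}\sum_{i=1}^{m} x^{(i)},
\quad
\sigma = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(x^{(i)} - \mu\right)^{2}}
```

After this transformation each feature has zero mean and unit variance, which usually lets gradient descent converge with a single learning rate for all features.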
#### Logistic Regression:
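For reference, the standard formulation of logistic regression trained by gradient descent can be sketched as follows. The notation ($m$ training examples, parameters $\theta$, learning rate $\alpha$) is conventional and assumed here; it is not copied from the repository.

```latex
% Hypothesis: sigmoid of a linear combination of the features
h_\theta(x) = g(\theta^{T} x), \qquad g(z) = \frac{1}{1 + e^{-z}}

% Cost function (log loss) over m training examples
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m}
  \left[ y^{(i)} \log h_\theta(x^{(i)})
       + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]

% Partial derivative and gradient descent update
\frac{\partial J(\theta)}{\partial \theta_j}
  = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)},
\qquad
\theta_j := \theta_j - \alpha \, \frac{\partial J(\theta)}{\partial \theta_j}
```

For the four houses, one such binary classifier is trained per house (one-vs-all), and the house with the highest predicted probability is chosen.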
## Installation
Run `python3 -m pip install -r requirements.txt`
## Usage
`python3 describe.py datasets/dataset_train.csv`
- Display information on all numerical features (and on non-numerical features as a bonus)
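
The statistics such a script typically reports can be computed by hand. The sketch below is an illustration, not the repository's `describe.py`: the `describe` helper is hypothetical, it assumes the sample standard deviation (dividing by `n - 1`), and it uses linear interpolation for the quartiles, which is also NumPy's default percentile method.

```python
import math


def describe(values):
    """Return count, mean, std, min, quartiles and max of a numeric column.

    NaN entries are ignored (assumption: missing scores are stored as NaN).
    """
    data = sorted(v for v in values if not math.isnan(v))
    n = len(data)
    if n == 0:
        raise ValueError("column has no numeric values")
    mean = sum(data) / n
    # Sample variance (n - 1 denominator); undefined for a single value.
    var = sum((v - mean) ** 2 for v in data) / (n - 1) if n > 1 else float("nan")

    def percentile(p):
        # Linear interpolation between the two closest ranks.
        pos = (n - 1) * p
        lo, hi = math.floor(pos), math.ceil(pos)
        return data[lo] + (data[hi] - data[lo]) * (pos - lo)

    return {
        "count": n,
        "mean": mean,
        "std": math.sqrt(var),
        "min": data[0],
        "25%": percentile(0.25),
        "50%": percentile(0.50),
        "75%": percentile(0.75),
        "max": data[-1],
    }


stats = describe([1.0, 2.0, 3.0, 4.0, 5.0])
print(stats)
```
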
`python3 histogram.py datasets/dataset_train.csv`
- Answer the following question: Which Hogwarts course has a homogeneous score distribution across the four houses?
`python3 scatter_plot.py datasets/dataset_train.csv`
- Answer the following question: Which two features are similar?
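
Besides reading the scatter plots, one way to check numerically which two features are similar is to look for the pair with the strongest Pearson correlation. This is a sketch under that assumption; the `pearson` and `most_similar` helpers and the feature names `a`, `b`, `c` are hypothetical and do not come from the repository or the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two columns, skipping rows where either is NaN."""
    pairs = [(x, y) for x, y in zip(xs, ys) if not (math.isnan(x) or math.isnan(y))]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x, _ in pairs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for _, y in pairs))
    return cov / (norm_x * norm_y)


def most_similar(features):
    """Return (name_a, name_b, r) for the pair with the largest |r|."""
    names = list(features)
    best = None
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(features[a], features[b])
            if best is None or abs(r) > abs(best[2]):
                best = (a, b, r)
    return best


# Toy columns: "b" is an exact negative multiple of "a".
toy = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [-2.0, -4.0, -6.0, -8.0],
    "c": [1.0, 0.0, 2.0, 1.0],
}
print(most_similar(toy))
```

A correlation close to 1 or -1 means the two features carry nearly the same information, so one of them can be dropped before training.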
`python3 pair_plot.py datasets/dataset_train.csv`
- Show a pair_plot of all numerical features present in `datasets/dataset_train.csv`
`python3 logreg_train.py [-h] [-v] datasets/dataset_train.csv`
- -h: Show help message and exit
- -v: Plot the cost of each classifier on a graph
- Train a model from the input dataset

`python3 logreg_predict.py datasets/dataset_test.csv weights`
- Generate a file (houses.csv) of predictions for the given dataset
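
The training and prediction steps can be illustrated with a small one-vs-all logistic regression in plain Python. This is a minimal sketch, not the code in `logreg_train.py` or `logreg_predict.py`: the function names, the learning rate, the number of epochs, and the toy labels `A`, `B`, `C` are all assumptions, and the input features are assumed to be already scaled.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(theta, row):
    """Linear score; theta[0] is the bias, theta[1:] the feature weights."""
    return theta[0] + sum(t * x for t, x in zip(theta[1:], row))


def train_binary(X, y, lr=0.5, epochs=1000):
    """Batch gradient descent for one classifier; y holds 0/1 labels."""
    m, n = len(X), len(X[0])
    theta = [0.0] * (n + 1)
    for _ in range(epochs):
        grad = [0.0] * (n + 1)
        for row, target in zip(X, y):
            err = sigmoid(score(theta, row)) - target
            grad[0] += err
            for j, x in enumerate(row, start=1):
                grad[j] += err * x
        theta = [t - lr * g / m for t, g in zip(theta, grad)]
    return theta


def train_one_vs_all(X, labels, **kwargs):
    """Train one binary classifier per distinct label."""
    return {
        c: train_binary(X, [1 if label == c else 0 for label in labels], **kwargs)
        for c in sorted(set(labels))
    }


def predict(weights, row):
    """Return the label whose classifier gives the highest probability."""
    return max(weights, key=lambda c: sigmoid(score(weights[c], row)))


# Toy, already-scaled data with hypothetical labels.
X = [[-1.0, -1.0], [-1.2, -0.8], [1.0, -1.0], [1.1, -0.9], [0.0, 1.0], [0.1, 1.2]]
labels = ["A", "A", "B", "B", "C", "C"]
weights = train_one_vs_all(X, labels)
print([predict(weights, row) for row in X])
```

With the real dataset, the labels would be the four houses and each row would hold a student's scaled course scores.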
### Example
`python3 src/describe.py datasets/dataset_train.csv`

`python3 src/histogram.py datasets/dataset_train.csv`

`python3 src/histogram.py datasets/dataset_train.csv -all`

`python3 src/scatter_plot.py datasets/dataset_train.csv`

`python3 src/pair_plot.py datasets/dataset_train.csv`

First, we need to train our model with the provided dataset:
```
➜ py src/logreg_train.py datasets/dataset_train.csv
Training against : Gryffindor
Training against : Hufflepuff
Training against : Ravenclaw
Training against : Slytherin
Weights saved in 'weights.npy',
Accuracy : 98.49 %
```
This will save the trained weights in `./weights.npy`.
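
An accuracy figure like the one printed above is the fraction of samples whose predicted house matches the true house. A minimal sketch of that computation follows; the `accuracy` helper is hypothetical, and whether the training script measures it on the full training set is an assumption.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


predicted = ["Gryffindor", "Ravenclaw", "Slytherin", "Hufflepuff"]
actual = ["Gryffindor", "Ravenclaw", "Slytherin", "Ravenclaw"]
print(f"Accuracy : {accuracy(predicted, actual) * 100:.2f} %")
```
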
Then, to make predictions for a given dataset:

`python3 src/logreg_predict.py datasets/dataset_test.csv weights.npy`

This will generate a file with all predictions in `./houses.csv`:
```
➜ cat houses.csv
Index,Hogwarts House
...
```
Finally, run the `evaluate.py` script provided by 42. It compares the predictions in `houses.csv`, generated by your prediction script, with `datasets_truth.csv`, which contains the correct values (provided on the 42 correction page):
```
➜ py evaluate.py
Your score on test set: 0.988
Good job! Mc Gonagall congratulates you.
```

##### Project done on 2021/06/26