https://github.com/michaelzheng67/ml_classification_optimizer
Algorithm that determines best machine learning classification model to use for a given dataset. Written in Python.
classification machine-learning python scikit-learn
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/michaelzheng67/ml_classification_optimizer
- Owner: michaelzheng67
- Created: 2021-05-07T01:56:11.000Z (about 5 years ago)
- Default Branch: main
- Last Pushed: 2021-05-08T20:21:49.000Z (about 5 years ago)
- Last Synced: 2025-04-07T20:19:54.400Z (about 1 year ago)
- Topics: classification, machine-learning, python, scikit-learn
- Language: Python
- Homepage:
- Size: 129 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
Machine Learning Classification Optimizer
Python-based application
Imports: Pandas, Scikit-learn
tldr: Prints which machine learning classification model would work best for a given dataset.
Inspiration and tutorial based on the Udemy Machine Learning course by Kirill Eremenko. The user supplies a .csv file of data that can be grouped and classified, and the algorithm runs it through multiple classification models, picking the best model for the dataset by comparing metrics. First, the .py file is configured to point at the data in a given .csv file. The data is then split into a training set and a test set, feature-scaled, and fed into seven different classification models from scikit-learn. Finally, the models are judged on multiple metrics, also taken from scikit-learn.
Credit to the Machine Learning course for providing the test data and the foundational code for running the models and for splitting and scaling the data.
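The split, scale, fit, and compare steps described above can be sketched roughly as follows. This is a minimal illustration, not the code from main.py: the README does not name its seven models or its metrics, so the model list, the accuracy and F1 metrics, and the helper names candidate_models, score_models, and best_model are all assumptions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def candidate_models():
    """Assumed set of seven classifiers; the README does not list them."""
    return {
        "logistic_regression": LogisticRegression(random_state=0),
        "k_nearest_neighbors": KNeighborsClassifier(n_neighbors=5),
        "linear_svm": SVC(kernel="linear", random_state=0),
        "kernel_svm": SVC(kernel="rbf", random_state=0),
        "naive_bayes": GaussianNB(),
        "decision_tree": DecisionTreeClassifier(random_state=0),
        "random_forest": RandomForestClassifier(n_estimators=10, random_state=0),
    }


def score_models(X, y, test_size=0.25, random_state=0):
    """Split, scale, fit every candidate, and collect test-set metrics."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )

    # Fit the scaler on the training set only, then reuse it on the test set.
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)

    results = {}
    for name, model in candidate_models().items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "f1": f1_score(y_test, y_pred, average="weighted"),
        }
    return results


def best_model(results, metric="accuracy"):
    """Return the name of the model with the highest value for one metric."""
    return max(results, key=lambda name: results[name][metric])
```

Calling score_models(X, y) on a feature matrix and label vector returns one metrics dictionary per model, and best_model(results) names the top scorer. Ranking on a single metric is a simplification; the project itself judges models on multiple metrics.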
notes:
- main.py imports a variables file, which is a separate .py file that stores the strings the models use
- For the algorithm to work, the independent variables (the features) must come before the dependent variable in column order. This means the dependent variable, the value the classification is trying to predict, must be in the last column of the .csv file
- The Social Network Ads test data was also provided by the Udemy course. Essentially, it's a CSV dataset with age and estimated salary columns, plus a last column indicating whether or not that specific user clicked on an ad
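The column-order requirement in the notes can be illustrated with a short pandas sketch, assuming the file is read so that every column except the last is a feature and the last column is the value being predicted. The helper name split_features_target, the column names, and the three rows are made up for illustration and are not taken from the repository or the course dataset.

```python
import io

import pandas as pd


def split_features_target(frame):
    """Treat all columns but the last as features and the last as the target."""
    X = frame.iloc[:, :-1].values
    y = frame.iloc[:, -1].values
    return X, y


# Made-up rows in the expected layout: feature columns first, target last.
csv_text = """Age,EstimatedSalary,Clicked
19,19000,0
35,20000,0
47,25000,1
"""

frame = pd.read_csv(io.StringIO(csv_text))
X, y = split_features_target(frame)
```

With this layout, a .csv file whose target sits in any other column would need its columns reordered before being passed in.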