https://github.com/zenitsu0509/machine-learning

This project implements and compares several machine learning algorithms on a range of datasets to evaluate their performance. Each algorithm is tested on the same datasets so that the comparison is fair.

Datasets

The datasets used in this project are included in the data directory. They cover a variety of domains to test the versatility and robustness of the algorithms.

Algorithms

AdaBoost

AdaBoost (Adaptive Boosting) is an ensemble learning method that combines multiple weak classifiers into a strong classifier. After each round it increases the weights of the incorrectly classified instances, so the next weak classifier concentrates on the examples that are still being misclassified.
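
The reweighting step can be sketched as follows. This is a minimal illustration of the standard AdaBoost update for labels in {-1, +1}, not necessarily the code used in this repository; the function name `adaboost_round` and the toy data are hypothetical.

```python
import math


def adaboost_round(weights, predictions, labels):
    """One AdaBoost round (hypothetical helper).

    Assumes labels and predictions are in {-1, +1} and that the
    incoming weights sum to 1. Returns (alpha, new_weights).
    """
    # Weighted error of the weak classifier under the current weights.
    error = sum(w for w, p, y in zip(weights, predictions, labels) if p != y)
    # Clamp the error so the logarithm below stays finite.
    error = min(max(error, 1e-10), 1 - 1e-10)
    # Vote strength of this weak classifier.
    alpha = 0.5 * math.log((1 - error) / error)
    # Misclassified samples (y * p == -1) are scaled up, correct ones down.
    updated = [
        w * math.exp(-alpha * y * p)
        for w, p, y in zip(weights, predictions, labels)
    ]
    total = sum(updated)
    return alpha, [w / total for w in updated]


# Toy data: four equally weighted samples, the last one is misclassified.
weights = [0.25, 0.25, 0.25, 0.25]
labels = [1, 1, -1, -1]
predictions = [1, 1, -1, 1]
alpha, new_weights = adaboost_round(weights, predictions, labels)
```

In this toy case the single misclassified sample ends up carrying half of the total weight, which is what pushes the next weak classifier toward it.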

Decision Tree Classifier

A Decision Tree is a non-parametric supervised learning method used for classification and regression. It recursively splits the data into subsets based on the values of input features, choosing at each node the split that best separates the target values.
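
A single split on one numeric feature might look like the sketch below. It assumes Gini impurity as the split criterion, which is a common choice but is not confirmed by this repository; `gini` and `best_split` are hypothetical names.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best `value <= threshold` split.

    Returns (None, inf) when all feature values are identical.
    """
    best = (None, float("inf"))
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        # Impurity of the two children, weighted by their sizes.
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Toy data: the two classes are separated by a gap between 3.0 and 10.0.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["a", "a", "a", "b", "b", "b"]
threshold, impurity = best_split(values, labels)
```

A full tree would apply the same search to every feature and then recurse into the left and right subsets until a stopping rule is met.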

K-Nearest Neighbors

K-Nearest Neighbors (KNN) is a simple, instance-based learning algorithm used for classification and regression. It predicts the label of an instance from the majority label of its k nearest neighbors in the training set; for regression, it averages their target values instead.
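
The majority vote can be sketched in a few lines. This example assumes Euclidean distance and k = 3; the function name `knn_predict` and the toy points are hypothetical and only serve as illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training samples by Euclidean distance to the query point.
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    # The most common label among the k neighbors wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data: two well-separated clusters.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5), (5, 6)]
labels = ["red", "red", "red", "blue", "blue", "blue"]
near_origin = knn_predict(points, labels, (0.5, 0.5))
near_far_cluster = knn_predict(points, labels, (5.5, 5.5))
```

Because nothing is fitted in advance, all the work happens at prediction time, which is why KNN is described as instance-based.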

Linear Regression

Linear Regression is a regression algorithm that models the relationship between a dependent variable and one or more independent variables by fitting a linear equation.
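
For a single independent variable, the fitted line has a closed-form least-squares solution, sketched below. The function name `fit_line` and the sample data are hypothetical; the repository's implementation may use a different solver.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data generated from y = 2x + 1 without noise.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
slope, intercept = fit_line(xs, ys)
```

On this noise-free data the fit recovers a slope of 2 and an intercept of 1.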

Logistic Regression

Logistic Regression is a classification algorithm used to model the probability of a certain class or event. It is particularly useful for binary classification problems.
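
The probability model can be sketched as a sigmoid applied to a linear score, trained here with plain batch gradient descent on the log loss. This is a minimal one-feature illustration; `sigmoid`, `fit_logistic`, the learning rate, and the epoch count are assumptions rather than settings taken from this project.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit P(y=1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log loss with respect to w and b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data: negative inputs belong to class 0, positive inputs to class 1.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
prob_positive = sigmoid(w * 2.0 + b)
prob_negative = sigmoid(w * -2.0 + b)
```

A predicted probability above 0.5 is usually mapped to the positive class, which is how the model is used for binary classification.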

Naive Bayes

Naive Bayes is a probabilistic classifier based on Bayes' theorem with the assumption that features are conditionally independent given the class. Despite this simplification, it is often effective for text classification problems.
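
For text, the idea can be sketched as a multinomial Naive Bayes over word counts. The version below assumes whitespace tokenization and add-one (Laplace) smoothing; `train_nb`, `predict_nb`, and the tiny corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(documents, labels):
    """Count documents per class and words per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for document, label in zip(documents, labels):
        words = document.lower().split()
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary


def predict_nb(model, document):
    """Return the class with the highest log posterior for `document`."""
    class_counts, word_counts, vocabulary = model
    total_documents = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        # Log prior of the class.
        log_prob = math.log(count / total_documents)
        total_words = sum(word_counts[label].values())
        for word in document.lower().split():
            if word not in vocabulary:
                continue  # Words never seen in training are ignored.
            # Add-one smoothing avoids zero probabilities.
            numerator = word_counts[label][word] + 1
            log_prob += math.log(numerator / (total_words + len(vocabulary)))
        scores[label] = log_prob
    return max(scores, key=scores.get)


# Toy corpus with two classes.
documents = ["win money now", "cheap money offer", "meeting at noon", "project meeting notes"]
labels = ["spam", "spam", "ham", "ham"]
model = train_nb(documents, labels)
first_prediction = predict_nb(model, "cheap money")
second_prediction = predict_nb(model, "meeting notes")
```

Summing log probabilities per word is where the independence assumption shows up: each word contributes to the score without regard to the others.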

Random Forest

Random Forest is an ensemble learning method that constructs multiple decision trees during training. For classification it predicts the class chosen by the most trees (the mode); for regression it predicts the mean of the trees' outputs.
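
The aggregation step, together with the bootstrap sampling that random forests conventionally use to give each tree its own training set, can be sketched as below. Tree training itself is omitted, and all function names are hypothetical; whether this repository samples in exactly this way is an assumption.

```python
import random
from collections import Counter
from statistics import mean


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, one training set per tree."""
    return [rng.choice(rows) for _ in rows]


def aggregate_classification(tree_predictions):
    """Majority vote (mode) over the class predicted by each tree."""
    return Counter(tree_predictions).most_common(1)[0][0]


def aggregate_regression(tree_predictions):
    """Mean of the numeric predictions made by each tree."""
    return mean(tree_predictions)


rng = random.Random(0)  # Fixed seed so the sketch is reproducible.
rows = [("sample", i) for i in range(5)]
sample = bootstrap_sample(rows, rng)

# Pretend three trees have already produced these predictions.
voted_class = aggregate_classification(["cat", "dog", "cat"])
averaged_value = aggregate_regression([2.0, 3.0, 4.0])
```

Averaging over many trees trained on different samples is what reduces the variance of any single decision tree.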

Usage

To run the algorithms, use the following command:

python main.py

You can modify the datasets and parameters in the config.py file.

Results

The results of the algorithms are saved in the results directory. Classification performance is evaluated with metrics such as accuracy, precision, recall, and F1-score; regression performance is evaluated with mean squared error (MSE).
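
For reference, these metrics can be computed as in the sketch below. It assumes binary classification with a designated positive class; the function names `classification_metrics` and `mean_squared_error` and the sample values are illustrative and do not describe the files written to the results directory.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up labels and predictions, for illustration only.
metrics = classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
mse = mean_squared_error([3.0, 5.0], [2.0, 7.0])
```

Precision and recall matter most when the classes are imbalanced, since accuracy alone can look high while the minority class is mostly missed.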

Conclusion

This project demonstrates the implementation and comparison of various machine learning algorithms. The performance of each algorithm varies depending on the dataset and problem type. Future work could include exploring more advanced algorithms and techniques to improve performance further.