https://github.com/zenitsu0509/machine-learning
This repo contains machine learning algorithm projects that I have made.
- Host: GitHub
- URL: https://github.com/zenitsu0509/machine-learning
- Owner: zenitsu0509
- Created: 2024-07-10T05:30:23.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2025-01-09T13:59:10.000Z (over 1 year ago)
- Last Synced: 2025-03-28T14:22:49.032Z (over 1 year ago)
- Topics: database, machine-learning
- Language: Jupyter Notebook
- Homepage:
- Size: 1.84 MB
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
Machine Learning Algorithms Project
Overview
This project explores various machine learning algorithms to solve classification and regression problems. The algorithms implemented include:
- AdaBoost
- Decision Tree Classifier
- K-Nearest Neighbors (KNN)
- Linear Regression
- Logistic Regression
- Naive Bayes
- Random Forest
Table of Contents
- Introduction
- Datasets
- Algorithms
  - AdaBoost
  - Decision Tree Classifier
  - K-Nearest Neighbors
  - Linear Regression
  - Logistic Regression
  - Naive Bayes
  - Random Forest
- Usage
- Results
- Conclusion
- References
Introduction
The aim of this project is to implement and compare various machine learning algorithms on different datasets to evaluate their performance. Each algorithm has been tested on the same datasets to ensure a fair comparison.
Datasets
The datasets used in this project are included in the data directory. They cover a variety of domains to test the versatility and robustness of the algorithms.
Algorithms
AdaBoost
AdaBoost (Adaptive Boosting) is an ensemble learning method that combines multiple weak classifiers to create a strong classifier. After each round it increases the weights of incorrectly classified instances, so that the next weak classifier focuses on the examples the previous ones got wrong.
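The reweighting step can be illustrated with a minimal sketch of a single boosting round. This is not the repository's code: the function name `adaboost_round` and the toy labels are hypothetical, and the formula shown is the standard discrete AdaBoost update for labels in {-1, +1}.

```python
import math

def adaboost_round(weights, y_true, y_pred):
    """Apply one AdaBoost weight update (hypothetical helper).

    weights: current sample weights, assumed to sum to 1
    y_true, y_pred: true and predicted labels in {-1, +1}
    Returns (alpha, new_weights).
    """
    # Weighted error of the weak classifier under the current weights.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    # Clip so that a perfect or fully wrong classifier does not break the log.
    err = min(max(err, 1e-10), 1 - 1e-10)
    # Vote strength of this weak classifier in the final ensemble.
    alpha = 0.5 * math.log((1 - err) / err)
    # t * p is -1 for a mistake (weight grows) and +1 for a hit (weight shrinks).
    updated = [w * math.exp(-alpha * t * p)
               for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(updated)
    return alpha, [w / total for w in updated]

weights = [0.25, 0.25, 0.25, 0.25]
y_true = [1, 1, -1, -1]
y_pred = [1, -1, -1, -1]  # only the second sample is misclassified
alpha, new_weights = adaboost_round(weights, y_true, y_pred)
print(alpha, new_weights)
```

After the update, the single misclassified sample carries half of the total weight, which is what pushes the next weak classifier toward it.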
Decision Tree Classifier
A Decision Tree is a non-parametric supervised learning method used for classification and regression. It recursively splits the data into subsets based on the values of input features.
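As a rough illustration of how one split might be chosen, the sketch below scores candidate thresholds on a single numeric feature by weighted Gini impurity. Gini is assumed here as the split criterion (the README does not say which one the project uses), and `gini` and `best_threshold` are hypothetical names.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split `value <= threshold`."""
    best = (None, float("inf"))
    n = len(values)
    # Skip the largest value: splitting there would leave the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best

values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["a", "a", "a", "b", "b", "b"]
print(best_threshold(values, labels))  # (3.0, 0.0)
```

A full tree would apply the same search to every feature and then recurse on the left and right subsets.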
K-Nearest Neighbors
K-Nearest Neighbors (KNN) is a simple, instance-based learning algorithm used for classification and regression. It classifies an instance by the majority label of its k nearest neighbors; for regression, it averages the neighbors' target values.
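The majority vote can be sketched in a few lines. The example below assumes Euclidean distance and a default of k=3; both are illustrative choices, and `knn_predict` is a hypothetical name rather than a function from this project.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort the training set by Euclidean distance to the query point.
    neighbors = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    top_labels = [label for _, label in neighbors[:k]]
    # The most common label among the k closest points wins.
    return Counter(top_labels).most_common(1)[0][0]

points = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5), (5, 6)]
labels = ["red", "red", "red", "blue", "blue", "blue"]
print(knn_predict(points, labels, (0.5, 0.5)))
print(knn_predict(points, labels, (5.5, 5.5)))
```

Because every prediction scans the full training set, this brute-force form is only practical for small datasets.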
Linear Regression
Linear Regression is a regression algorithm that models the relationship between a dependent variable and one or more independent variables by fitting a linear equation.
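For a single independent variable, the fitted line has a closed-form least-squares solution. The sketch below shows that special case only; `fit_line` is a hypothetical helper, and the project's notebooks may fit the model differently (for example with a library).

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]  # generated from y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```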
Logistic Regression
Logistic Regression is a classification algorithm used to model the probability of a certain class or event. It is particularly useful for binary classification problems.
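To make the probability model concrete, here is a minimal sketch of binary logistic regression with one feature, trained by plain gradient descent on the log loss. The learning rate, epoch count, and the names `sigmoid` and `train_logistic` are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, epochs=1000):
    """Fit weight w and bias b so that sigmoid(w * x + b) approximates P(y=1 | x)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log loss with respect to w and b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
w, b = train_logistic(xs, ys)
print(sigmoid(w * 2.0 + b))   # probability of class 1 for x = 2
print(sigmoid(w * -2.0 + b))  # probability of class 1 for x = -2
```

Predicting a class then amounts to thresholding the probability, commonly at 0.5.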
Naive Bayes
Naive Bayes is a probabilistic classifier based on Bayes' theorem with the assumption of independence between features. It is particularly effective for text classification problems.
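The independence assumption is what lets the per-word likelihoods simply be multiplied (or, in log space, added). The sketch below is a small multinomial Naive Bayes for tokenized text with add-one smoothing; the toy documents and the names `train_nb` and `predict_nb` are hypothetical and not taken from the repository.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """docs: list of token lists. Returns (class_counts, word_counts, vocab)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the class with the highest log posterior for `tokens`."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        # Log prior plus a sum of log likelihoods: tokens are treated as
        # independent given the class.
        score = math.log(count / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokens:
            # Add-one (Laplace) smoothing keeps unseen words from zeroing the product.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

docs = [["free", "prize", "now"], ["win", "free", "cash"],
        ["meeting", "at", "noon"], ["lunch", "meeting", "today"]]
labels = ["spam", "spam", "ham", "ham"]
model = train_nb(docs, labels)
print(predict_nb(model, ["free", "cash"]))
print(predict_nb(model, ["meeting", "today"]))
```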
Random Forest
Random Forest is an ensemble learning method that constructs multiple decision trees during training and outputs the most common class across the trees (the mode) for classification, or the mean of the trees' predictions for regression.
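Only the aggregation step is sketched here, assuming each tree has already produced its own prediction. The per-tree outputs below are made-up values, and the function names are hypothetical.

```python
from collections import Counter
from statistics import mean

def aggregate_classification(tree_predictions):
    """Return the most common class label across the trees (the mode)."""
    return Counter(tree_predictions).most_common(1)[0][0]

def aggregate_regression(tree_predictions):
    """Return the mean of the trees' numeric predictions."""
    return mean(tree_predictions)

# Hypothetical outputs of five trees for one input sample.
print(aggregate_classification(["cat", "dog", "cat", "cat", "dog"]))  # cat
print(aggregate_regression([2.0, 3.0, 4.0]))  # 3.0
```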
Usage
To run the algorithms, use the following command:
python main.py
You can modify the datasets and parameters in the config.py file.
Results
The results of the algorithms are saved in the results directory. Classification performance is evaluated with metrics such as accuracy, precision, recall, and F1-score; regression performance is evaluated with mean squared error (MSE).
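For reference, the metrics named above can be computed from scratch as follows. This is a sketch for binary classification with a chosen positive label; the function names and sample values are hypothetical, and the project itself may rely on a library for these numbers.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

print(classification_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
print(mean_squared_error([3.0, 5.0], [2.0, 7.0]))  # 2.5
```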
Conclusion
This project demonstrates the implementation and comparison of various machine learning algorithms. The performance of each algorithm varies depending on the dataset and problem type. Future work could include exploring more advanced algorithms and techniques to improve performance further.
References
- AdaBoost: Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119-139.
- Decision Trees: Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and Regression Trees. Belmont, CA: Wadsworth.
- KNN: Cover, T. M., & Hart, P. E. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1), 21-27.
- Linear Regression: Seber, G. A. F., & Lee, A. J. (2012). Linear Regression Analysis (2nd ed.). John Wiley & Sons.
- Logistic Regression: Cox, D. R. (1958). The regression analysis of binary sequences. Journal of the Royal Statistical Society: Series B (Methodological), 20(2), 215-242.
- Naive Bayes: Maron, M. E. (1961). Automatic Indexing: An Experimental Inquiry. Journal of the ACM, 8(3), 404-417.
- Random Forest: Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5-32.