https://github.com/rohitinu6/neolung
Lung Cancer Prediction using Machine Learning Algorithms
adaboost data-analysis decision-trees gradientboosting knn logistic-regression machine-learning naivebayes neuralnetworks python randomforest scikit-learn svm xgboost
- Host: GitHub
- URL: https://github.com/rohitinu6/neolung
- Owner: rohitinu6
- Created: 2023-06-14T10:57:17.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2025-02-18T08:56:05.000Z (8 months ago)
- Last Synced: 2025-04-14T13:54:47.223Z (6 months ago)
- Topics: adaboost, data-analysis, decision-trees, gradientboosting, knn, logistic-regression, machine-learning, naivebayes, neuralnetworks, python, randomforest, scikit-learn, svm, xgboost
- Language: Jupyter Notebook
- Homepage:
- Size: 446 KB
- Stars: 13
- Watchers: 2
- Forks: 4
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# NeoLung: Lung cancer prediction using machine learning
## Aim:
The purpose of this project is to compare classification algorithms applied to a lung cancer dataset.
## Dataset:
The lung cancer dataset used in this project was obtained from data.world:
https://data.world/sta427ceyin/survey-lung-cancer
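A minimal sketch of how the survey file could be loaded and turned into numeric features with pandas. The column names (`GENDER`, `AGE`, `SMOKING`, `LUNG_CANCER`), the `YES`/`NO` label values, and the helper `load_survey` are assumptions for illustration; the three inline rows are toy data standing in for the real CSV, not records from the dataset.

```python
import io

import pandas as pd

# Toy rows standing in for the downloaded CSV.
# Column names and values are assumptions, not taken from the notebook.
SAMPLE_CSV = """GENDER,AGE,SMOKING,LUNG_CANCER
M,69,1,YES
F,59,2,NO
M,63,2,YES
"""


def load_survey(source):
    """Read the survey CSV and return (numeric features, binary label)."""
    df = pd.read_csv(source)
    # Guard against stray spaces around header names.
    df.columns = df.columns.str.strip()
    # Map the text label to 1 (YES) / 0 (anything else).
    y = (df["LUNG_CANCER"].str.strip().str.upper() == "YES").astype(int)
    X = df.drop(columns=["LUNG_CANCER"])
    # One-hot encode text columns (e.g. gender) so every feature is numeric.
    X = pd.get_dummies(X, drop_first=True)
    return X, y


X, y = load_survey(io.StringIO(SAMPLE_CSV))
print(X.columns.tolist(), y.tolist())
```

With the real file, `load_survey` would be called with the path to the downloaded CSV instead of the in-memory sample.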
## Working:
The following **10 classification algorithms** are used in this project:
1. Logistic Regression
2. K-Nearest Neighbors (KNN)
3. Decision Tree
4. Support Vector Machines (SVM)
5. Naive Bayes
6. Random Forest
7. Gradient Boosting
8. Neural Networks
9. AdaBoost
10. XGBoost
We then build a model for each of the algorithms listed above and compare them using the following **Evaluation Metrics**:
1. Accuracy
2. Precision
3. F1 Score
4. Recall Score
5. Confusion Matrix
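The build-and-compare workflow described above can be sketched with scikit-learn as follows. This is not the notebook's code: the data is synthetic (`make_classification`), the sample size, split ratio, and hyperparameters are arbitrary assumptions, and only three of the ten algorithms are shown to keep the example short.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the survey data (assumption): sizes are arbitrary.
X, y = make_classification(n_samples=300, n_features=15, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# A subset of the algorithms listed above; the others plug in the same way.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Random Forest": RandomForestClassifier(random_state=0),
}


def evaluate(model):
    """Fit on the training split and score on the held-out test split."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }


results = {name: evaluate(model) for name, model in models.items()}
for name, scores in results.items():
    print(f"{name}: accuracy={scores['accuracy']:.4f} f1={scores['f1']:.4f}")
```

Scaling is wrapped in a pipeline for the distance-based and linear models so that the scaler is fitted on the training split only.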
These are the accuracies of the algorithms:
1. Logistic Regression: **90.29%**
2. K-Nearest Neighbors (KNN): **87.37%**
3. Decision Tree: **87.37%**
4. Support Vector Machines (SVM): **84.46%**
5. Naive Bayes: **86.4%**
6. Random Forest: **89.32%**
7. Gradient Boosting: **89.32%**
8. Neural Networks: **84.46%**
9. AdaBoost: **84.46%**
10. XGBoost: **84.46%**
## Results:
Of all the algorithms implemented, **Logistic Regression** achieved the highest accuracy. Its evaluation metrics are as follows:
**Accuracy: 0.9029126213592233**
**Precision: 0.9052631578947369**
**Recall: 0.9885057471264368**
**F1 score: 0.945054945054945**
**Confusion Matrix:**
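The confusion matrix figure is not reproduced in this text. As a worked check, the smallest whole-number counts consistent with all four metrics reported above are 7 true negatives, 9 false positives, 1 false negative, and 86 true positives (103 test samples). These counts are inferred from the reported values, assuming a binary problem with the cancer class as positive; they are not read from the notebook. The sketch below recomputes each metric from them.

```python
# Counts inferred from the reported metrics (an assumption, not notebook output).
tn, fp, fn, tp = 7, 9, 1, 86

total = tn + fp + fn + tp

# Standard definitions for a binary classifier.
accuracy = (tp + tn) / total        # share of all predictions that are correct
precision = tp / (tp + fp)          # share of predicted positives that are correct
recall = tp / (tp + fn)             # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

# Rows are actual classes, columns are predicted classes
# (the layout scikit-learn's confusion_matrix uses for labels 0 and 1).
confusion = [[tn, fp],
             [fn, tp]]

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```

Under these counts, recall is much higher than the true-negative rate (7 of 16 negatives), which accuracy alone does not show.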
