https://github.com/fpt-thaituan/applying-machine-learning-to-predict-cancer-patients

This project uses machine learning algorithms to predict cancer based on patient data. By training on features like tumor size and biomarkers, it aims to improve diagnostic accuracy, assist in early detection, and explore automated cancer screening systems for efficiency.

#### **Abstract: This project applies machine learning algorithms to a cancer patient dataset to predict cancer. Machine learning can identify patterns in complex medical data, which supports early and accurate diagnosis. Eight algorithms are implemented: Logistic Regression, Random Forest, Support Vector Machine, XGBoost, Decision Tree, Gradient Boosting, K-Nearest Neighbors, and Naive Bayes. Each model is trained on relevant features such as tumor size, cell structure, and other biomarkers to evaluate how well it distinguishes cancerous from non-cancerous cases. The results are compared to identify the most effective approach, which can help medical professionals make more informed decisions and offers insight into automated cancer screening systems.**

#### **Note: If the code doesn't run, check the file paths, because I am still changing them.**

#### **Project structure**
#### **1. Read the File**
#### **2. Data Preprocessing**
##### **2.1. Check for NA values in each column**
##### **2.2. Remove unnecessary features**
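
A minimal sketch of steps 2.1 and 2.2 with pandas. The data frame is a hypothetical stand-in: `Age`, `Air Pollution`, and `Level` appear in the chart titles, while the `Patient Id` column and all values are assumptions made only for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for the patient dataset.
# "Patient Id" and every value here are assumptions, not taken from the project.
df = pd.DataFrame({
    "Patient Id": ["P1", "P2", "P3", "P4"],
    "Age": [33, 17, 35, 37],
    "Air Pollution": [2, 3, 4, 7],
    "Level": ["Low", "Medium", "High", "High"],
})

# 2.1: count the missing values in each column.
na_counts = df.isna().sum()
print(na_counts)

# 2.2: drop a column that carries no predictive signal (here, an identifier).
features = df.drop(columns=["Patient Id"])
print(features.columns.tolist())
```

If `na_counts` shows missing values, they would need to be dropped or imputed before training; which option suits this dataset is not stated in the project.
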
#### **3. Data visualization**
##### **3.1. Chart of all data**

![Chart all data](https://github.com/user-attachments/assets/181734ab-8cdd-4ffa-9c9b-d4d885b1dbc9)

##### **3.2. Age Distribution Chart**

![Chart 1](https://github.com/user-attachments/assets/f02c2912-ded5-4203-a82f-c62238688c41)

##### **3.3. Sex Distribution Chart**

![Chart 2](https://github.com/user-attachments/assets/d4a2bad0-f6bd-44fe-a114-10b7a81b2b00)

##### **3.4. Boxplot of the relationship between Air Pollution and Level**

![Chart 3](https://github.com/user-attachments/assets/567d021e-d6c4-48be-a74a-05709db0ad3a)

#### **4. Train-Test Split**
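
A minimal sketch of a train-test split with scikit-learn. The synthetic data from `make_classification`, the 80/20 ratio, the stratification, and the `random_state` value are all assumptions; the project's actual split settings are not stated here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the patient features (X) and labels (y).
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Hold out 20% of the rows for testing. stratify=y keeps the class
# proportions similar in both parts; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)
```
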
#### **5. Model training**
##### **5.1. Logistic Regression Model**
##### **5.2. Random Forest Model**
##### **5.3. Support Vector Machine Model**
##### **5.4. XGBoost Model**
##### **5.5. Decision Tree Model**
##### **5.6. Gradient Boosting Model**
##### **5.7. K-Nearest Neighbors Model**
##### **5.8. Naive Bayes Model**
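
The models listed above can be trained and compared in one loop. This is a sketch under assumptions: it uses synthetic data and default hyperparameters, which are not the project's settings, and it leaves out XGBoost so that scikit-learn is the only dependency (`xgboost.XGBClassifier` could be added to the dictionary in the same way).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the patient dataset.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# One entry per model; hyperparameters are mostly scikit-learn defaults.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Support Vector Machine": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
}

# Fit each model on the training part and score it on the held-out part.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

# Print the models from most to least accurate.
for name, acc in sorted(accuracies.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

Scoring every model on the same held-out split keeps the comparison fair. The accuracies printed for the synthetic data say nothing about the results on the real dataset.
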

#### **6. Conclusion**

![Summary chart of model accuracy](https://github.com/user-attachments/assets/206f7c36-acc3-40d9-a607-91a61818694e)

#### **Conclusion: In this study, multiple machine learning models were applied to a patient dataset to predict cancer, with varied accuracy. The Naive Bayes model reached 89.5% accuracy, which shows potential for cancer prediction but with some limitations in precision. The K-Nearest Neighbors model reached 99.5%, indicating a strong ability to distinguish cancerous cases. Gradient Boosting, Decision Tree, XGBoost, Support Vector Machine, Random Forest, and Logistic Regression all achieved 100% accuracy on this dataset. These findings show how well most of the tested models, including the ensemble methods, fit this dataset, and suggest they could become reliable tools for clinical applications. However, further testing on larger and more diverse datasets is needed to confirm that the results are consistent and hold up in real-world scenarios.**

#### **I hope this project helps you.**
#### **If you have any questions, please contact me!**
#### **Gmail: [email protected]**

#### ***Thanks everyone!***