Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/rakumar99/cancer-prediction-using-machine-learning-
This project uses PyCaret to classify cancer as malignant or benign based on cell nucleus features from fine needle aspirate (FNA) images. The best-performing model achieved an AUC score of 99.47%, ensuring accurate detection for early diagnosis.
classification jupyter-notebook machine-learning pycaret python vscode
- Host: GitHub
- URL: https://github.com/rakumar99/cancer-prediction-using-machine-learning-
- Owner: rakumar99
- Created: 2024-10-20T13:42:55.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2024-10-20T13:59:09.000Z (3 months ago)
- Last Synced: 2024-10-26T01:42:38.866Z (3 months ago)
- Topics: classification, jupyter-notebook, machine-learning, pycaret, python, vscode
- Language: Jupyter Notebook
- Homepage:
- Size: 464 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Cancer-Prediction-Using-Machine-Learning
## Project Overview:
This project aims to classify breast cancer as either malignant or benign using machine learning algorithms. The classification models are built using PyCaret, an open-source, low-code machine learning library in Python. By leveraging various algorithms, the project identifies the best-performing model for accurate breast cancer detection based on cell nucleus features from a digitized image of a fine needle aspirate (FNA).
## Dataset Information:
The dataset consists of features computed from digitized images of fine needle aspirate samples from breast masses. These features describe the characteristics of the cell nuclei present in the image, allowing for accurate classification of the mass as either malignant or benign.
## Attribute Information:
- ID Number: Unique identifier for each case.
- Diagnosis: Indicates if the tumor is malignant (M) or benign (B).
- Ten Real-Valued Features Computed for Each Cell Nucleus:
  - Radius: Mean of distances from the center to points on the perimeter.
  - Texture: Standard deviation of gray-scale values.
  - Perimeter
  - Area
  - Smoothness: Local variation in radius lengths.
  - Compactness: (Perimeter^2 / Area) - 1.0.
  - Concavity: Severity of concave portions of the contour.
  - Concave Points: Number of concave portions of the contour.
  - Symmetry
  - Fractal Dimension: "Coastline approximation" - 1.

For each feature, the mean, standard error, and worst or largest values were computed, resulting in 30 total features per image.
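
To make the 30-feature count concrete, the following minimal sketch builds the grid of ten measurements by three statistics and evaluates the compactness formula for an ideal circle. The snake_case names and the `mean`/`se`/`worst` suffixes are assumptions for illustration and may not match the CSV header exactly; `feature_columns` and `compactness` are hypothetical helpers.

```python
import math
from itertools import product

# The ten base measurements listed above, written in snake_case.
# The exact spellings are an assumption; check them against the CSV header.
BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]

# Three summary statistics per measurement (suffixes are assumed names).
STATISTICS = ["mean", "se", "worst"]


def feature_columns():
    """Return the 30 column names, grouped by statistic."""
    return [f"{base}_{stat}" for stat, base in product(STATISTICS, BASE_FEATURES)]


def compactness(perimeter, area):
    """Compactness as defined above: perimeter^2 / area - 1.0."""
    return perimeter ** 2 / area - 1.0


columns = feature_columns()
print(len(columns))   # 30
print(columns[:3])    # ['radius_mean', 'texture_mean', 'perimeter_mean']

# For an ideal circle, perimeter^2 / area equals 4*pi whatever the radius,
# so the formula gives 4*pi - 1, the lowest value any closed shape can reach.
r = 5.0
print(compactness(2 * math.pi * r, math.pi * r ** 2))
```

Irregular nuclei have a longer perimeter for the same area, so their compactness is higher than the circular minimum.
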
## Class Distribution:
- Benign (B): 357 cases
- Malignant (M): 212 cases
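
The class counts above imply a moderately imbalanced dataset. As a small illustration (not part of the original project), this sketch derives the class shares and the accuracy of a trivial classifier that always predicts the majority class, which is a useful floor when reading any reported metric.

```python
# Class counts as stated in the README.
counts = {"B": 357, "M": 212}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# A classifier that always predicts the most frequent class
# is correct exactly as often as that class occurs.
majority_label = max(counts, key=counts.get)
baseline_accuracy = shares[majority_label]

print(total)  # 569
for label, share in shares.items():
    print(f"{label}: {share:.1%}")  # B: 62.7%, then M: 37.3%
print(f"majority-class baseline accuracy: {baseline_accuracy:.3f}")  # 0.627
```
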
Download Link: https://www.kaggle.com/uciml/breast-cancer-wisconsin-data

## Libraries Used:
The following Python libraries were used in the project for data analysis, visualization, and model building:
- pandas: For data manipulation and analysis.
- matplotlib: For plotting graphs and visualizing data trends.
- seaborn: For statistical data visualization.
## Machine Learning Algorithms:
Several machine learning classification algorithms were applied and compared using PyCaret to identify the most effective model for breast cancer detection:
- Logistic Regression
- Decision Tree
- Random Forest
- Extra Trees
- XGBoost
- LightGBM
- CatBoost
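
A comparison like the one described above could look roughly like the sketch below. It is not taken from the project notebook: `compare_cancer_models` is a hypothetical helper, and the file name `data.csv`, the `id` and `diagnosis` column names, and the seed are assumptions. The calls `setup`, `compare_models`, and `pull` come from `pycaret.classification`, and the strings in `MODEL_IDS` are PyCaret's identifiers for the listed algorithms.

```python
# PyCaret model IDs for the algorithms listed above.
MODEL_IDS = ["lr", "dt", "rf", "et", "xgboost", "lightgbm", "catboost"]


def compare_cancer_models(csv_path="data.csv", seed=42):
    """Compare the listed classifiers with PyCaret and rank them by AUC.

    Hypothetical helper: the file name, the "id" and "diagnosis" column
    names, and the seed are assumptions, not taken from the project.
    """
    # Imported here so this module can be loaded without PyCaret installed.
    import pandas as pd
    from pycaret.classification import compare_models, pull, setup

    df = pd.read_csv(csv_path)
    # Drop columns that are entirely empty (some CSV exports carry one).
    df = df.dropna(axis=1, how="all")

    # Initialise the experiment; the identifier column is excluded from training.
    setup(data=df, target="diagnosis", ignore_features=["id"], session_id=seed)

    # Cross-validate only the listed models and return the best one by AUC.
    best = compare_models(include=MODEL_IDS, sort="AUC")

    # pull() returns the comparison table that compare_models produced.
    leaderboard = pull()
    return best, leaderboard
```

Calling `best, leaderboard = compare_cancer_models("data.csv")` would return the top-ranked estimator together with the score table. XGBoost, LightGBM, and CatBoost are separate packages and need to be installed for their IDs to be usable.
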
## Best Model Performance:
Best Model AUC: 99.47% (0.9947)

This high AUC shows that the best model ranks malignant cases above benign ones almost perfectly on the data it was evaluated on. AUC measures ranking quality across all decision thresholds, not accuracy at one particular threshold.
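
To clarify what an AUC value measures, here is a minimal pure-Python sketch of its pairwise definition: the fraction of (malignant, benign) pairs in which the malignant case receives the higher score, with ties counted as half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not output from the project's model.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for the positive class (e.g. malignant), 0 for the negative class.
    scores: predicted scores, higher meaning "more likely positive".
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: three of the four positive/negative pairs are ordered correctly.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
```

Under this definition, an AUC of 0.9947 means that about 99.5% of such pairs are ordered correctly. The quadratic loop is fine for a few hundred samples; library implementations use sorting instead.
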
## Future Work:
- Hyperparameter Tuning: Fine-tune model parameters to further improve accuracy.
- Additional Feature Engineering: Create new features to enhance the predictive power of the models.
- Deep Learning: Explore the use of deep learning models for even higher accuracy.
## Conclusion:
This project classifies breast cancer as malignant or benign by comparing a variety of machine learning algorithms in PyCaret. With an AUC of 99.47%, the best model separates the two classes very well on this dataset, which makes it a promising aid for early detection and diagnosis of breast cancer.