https://github.com/ehsan-ashik/prostate-cancer-classification
The project involves training micro RNA-based classification models using a prostate cancer dataset.
- Host: GitHub
- URL: https://github.com/ehsan-ashik/prostate-cancer-classification
- Owner: ehsan-ashik
- Created: 2024-10-17T04:05:27.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-11-09T03:00:11.000Z (over 1 year ago)
- Last Synced: 2025-02-01T10:11:51.650Z (over 1 year ago)
- Topics: classification, k-means-clustering, knn-classifier, pca, prostate-cancer, random-forest, tsne
- Language: R
- Homepage:
- Size: 1.26 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# miRNA-based Prostate Cancer Classification
In this project, I train microRNA (miRNA)-based classification models to identify prostate cancer accurately. The dataset used in the project is available [here](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE112264).
Prostate cancer is one of the most common cancers in men. While some types of prostate cancer grow slowly and may need only minimal or no treatment, others can be very aggressive and grow quickly.
Prostate cancer that is detected early has the best chance of successful treatment. However, the high false-positive rate of prostate-specific antigen (PSA) testing often leads to *negative prostate biopsies*. A negative biopsy does not definitively exclude the presence of cancer and often requires further investigation.
## Project Goals
* Compare *Prostate Cancer miRNA* with *healthy control miRNA* to help determine how the groups diverge.
* Compare *Negative Biopsy miRNA* with *healthy control miRNA* to understand how Negative Biopsy miRNA deviates from normal miRNA.
* Train classifiers to detect prostate cancer and negative prostate biopsies.
## Applied Methods
* Dimensionality reduction techniques, e.g., *PCA* and *t-SNE*, to understand divergences among groups in lower dimensions.
* Clustering methods, e.g., *k-means*, to see whether the data form distinct clusters.
* Classification models trained with the *k-NN* and *Random Forest* algorithms.
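The repository itself is written in R, and its exact k-NN configuration is not described here. Purely as a language-neutral illustration, the following minimal Python sketch (standard library only) shows how a k-NN classifier labels a sample by majority vote among its `k` closest training samples. The function names, feature values, and `cancer`/`control` labels are hypothetical toy data, not values from GSE112264 or the author's code.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training samples."""
    # Sort training indices by distance to the query and keep the k closest.
    nearest = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    # Counter.most_common keeps equal counts in first-seen order, so among
    # tied labels the one with the closest neighbor wins.
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two expression features per sample.
train_X = [[1.0, 1.2], [0.9, 1.0], [1.1, 0.8], [5.0, 5.2], [5.3, 4.9], [4.8, 5.1]]
train_y = ["control", "control", "control", "cancer", "cancer", "cancer"]

print(knn_predict(train_X, train_y, [1.0, 1.0], k=3))  # control
print(knn_predict(train_X, train_y, [5.0, 5.0], k=3))  # cancer
```

Because k-NN relies on distances, expression features are typically centered and scaled before fitting, and an odd `k` avoids tied votes in a two-class problem.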
## Considerations for Classification
* Evaluation metric: *accuracy*
* Repeated cross-validation: *10 folds, 10 repeats*
* Train/test split: *75%/25%*
* Hyperparameter tuning of the classifiers
* Class-imbalance resolution techniques, e.g., *up-sampling* and *down-sampling*
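The evaluation setup listed above can be sketched as follows. This is a minimal, standard-library Python illustration, not the project's R code, and it assumes a stratified 75%/25% holdout split, stratified 10-fold cross-validation repeated 10 times on the training portion, and up- or down-sampling applied only to the data each model is fitted on. All helper names, the 1-nearest-neighbor stand-in classifier, and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def group_by_class(indices, labels):
    """Map each class label to the sample indices that carry it (first-seen order)."""
    groups = defaultdict(list)
    for i in indices:
        groups[labels[i]].append(i)
    return groups


def stratified_holdout(labels, test_frac, rng):
    """Split all sample indices into train/test lists that keep class proportions."""
    train, test = [], []
    for idxs in group_by_class(range(len(labels)), labels).values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_frac)
        test.extend(idxs[:n_test])
        train.extend(idxs[n_test:])
    return train, test


def stratified_folds(indices, labels, k, rng):
    """Deal the given indices into k folds so each class is spread evenly across folds."""
    folds = [[] for _ in range(k)]
    pos = 0
    for idxs in group_by_class(indices, labels).values():
        rng.shuffle(idxs)
        for i in idxs:
            folds[pos % k].append(i)
            pos += 1
    return folds


def upsample(indices, labels, rng):
    """Duplicate random minority-class samples until every class matches the largest."""
    groups = group_by_class(indices, labels)
    target = max(len(g) for g in groups.values())
    out = []
    for g in groups.values():
        out.extend(g)
        out.extend(rng.choice(g) for _ in range(target - len(g)))
    return out


def downsample(indices, labels, rng):
    """Keep a random subset of each class so every class matches the smallest."""
    groups = group_by_class(indices, labels)
    target = min(len(g) for g in groups.values())
    return [i for g in groups.values() for i in rng.sample(g, target)]


def one_nn(train_X, train_y, test_X):
    """Hypothetical stand-in classifier (1-nearest neighbor) so the sketch runs end to end."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [train_y[min(range(len(train_X)), key=lambda j: dist2(train_X[j], x))]
            for x in test_X]


def repeated_cv_accuracy(X, y, train_idx, fit_predict, k=10, repeats=10,
                         sampling=upsample, seed=42):
    """Mean accuracy over `repeats` rounds of stratified k-fold CV on the training indices."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        for fold in stratified_folds(train_idx, y, k, rng):
            if not fold:
                continue
            held_out = set(fold)
            fit_idx = [i for i in train_idx if i not in held_out]
            # Resample only the fitting data; the held-out fold keeps its natural class balance.
            if sampling is not None:
                fit_idx = sampling(fit_idx, y, rng)
            preds = fit_predict([X[i] for i in fit_idx], [y[i] for i in fit_idx],
                                [X[i] for i in fold])
            scores.append(sum(p == y[i] for p, i in zip(preds, fold)) / len(fold))
    return sum(scores) / len(scores)


# Hypothetical, cleanly separated toy data: 30 "cancer" vs 10 "control" samples.
X = [[5 + 0.1 * i, 5 - 0.1 * i] for i in range(30)] + [[0.1 * j, 0.2 * j] for j in range(10)]
y = ["cancer"] * 30 + ["control"] * 10

train_idx, test_idx = stratified_holdout(y, 0.25, random.Random(0))  # 75%/25% split
cv_acc = repeated_cv_accuracy(X, y, train_idx, one_nn, k=10, repeats=10, sampling=upsample)
preds = one_nn([X[i] for i in train_idx], [y[i] for i in train_idx], [X[i] for i in test_idx])
test_acc = sum(p == y[i] for p, i in zip(preds, test_idx)) / len(test_idx)
print(cv_acc, test_acc)  # 1.0 1.0 on this toy data
```

Resampling inside each training fold, rather than before splitting, keeps duplicated minority samples from leaking into the held-out fold and inflating the measured accuracy.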
## Model Performance
The best trained model performed well, with over *95%* accuracy.
## Project Takeaways
* miRNA-based classification works fairly well for separating prostate cancer patients from negative-biopsy patients. This can improve early detection of prostate cancer, which is crucial for successful treatment.
* Negative-biopsy patients were also distinguished from healthy individuals with very good accuracy, which may enable early detection and close observation for further development of prostate cancer.