https://github.com/giatraskon/gastric-cancer-scrnaseq-analysis
Recreation and enrichment of the gastric (GC) cancer single-cell RNA-seq (scRNA-seq) data analysis pipeline described in the "Comprehensive analysis of metastatic gastric cancer tumour cells using single‑cell RNA‑seq" by Wang B. et. al, using the raw counts matrix they provide.
https://github.com/giatraskon/gastric-cancer-scrnaseq-analysis
average-linkage bioinformatics clustering gastric-cancer gmm-clustering go-annotation leiden-algorithm louvain-algorithm machine-learning marker-gene-identification pca python sc-rna-seq sc-rna-seq-analysis scanpy single-cell-rna-seq spectral-clustering tsne umap ward-linkage
Last synced: 28 days ago
JSON representation
Recreation and enrichment of the gastric (GC) cancer single-cell RNA-seq (scRNA-seq) data analysis pipeline described in the "Comprehensive analysis of metastatic gastric cancer tumour cells using single‑cell RNA‑seq" by Wang B. et. al, using the raw counts matrix they provide.
- Host: GitHub
- URL: https://github.com/giatraskon/gastric-cancer-scrnaseq-analysis
- Owner: GiatrasKon
- License: mit
- Created: 2024-07-02T15:31:29.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-11-25T16:42:23.000Z (5 months ago)
- Last Synced: 2025-02-06T20:44:12.544Z (3 months ago)
- Topics: average-linkage, bioinformatics, clustering, gastric-cancer, gmm-clustering, go-annotation, leiden-algorithm, louvain-algorithm, machine-learning, marker-gene-identification, pca, python, sc-rna-seq, sc-rna-seq-analysis, scanpy, single-cell-rna-seq, spectral-clustering, tsne, umap, ward-linkage
- Language: Jupyter Notebook
- Homepage:
- Size: 19.1 MB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Gastric Cancer Single-Cell RNA-seq Analysis
This repository contains the code and data for the recreation and enrichment of the gastric (GC) cancer single-cell RNA-seq (scRNA-seq) data analysis pipeline described in the ["Comprehensive analysis of metastatic gastric cancer tumour cells using single‑cell RNA‑seq" by Wang B. et. al](https://www.nature.com/articles/s41598-020-80881-2), using the raw counts matrix that is available in [GEO (GSE158631)](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE158631). Our robust and comprehensive pipeline surpasses previous approaches by incorporating multiple dimensionality reduction techniques, various clustering methods, marker gene identification and GO functional annotation.
This analysis was performed as part of the final project for the "Machine Learning in Computational Biology" graduate course of the MSc Data Science & Information Technologies Master's programme (Bioinformatics - Biomedical Data Science Specialization) of the Department of Informatics and Telecommunications department of the National and Kapodistrian University of Athens (NKUA), under the supervision of professor Elias Manolakos, in the academic year 2023-2024.
## Contributors
- [Konstantinos Giatras](https://github.com/GiatrasKon)
- [Olympia Tsiomou](https://github.com/otsiomou)## Cloning the Repository
```sh
git clone https://github.com/GiatrasKon/Gastric-Cancer-scRNAseq-Analysis.git
```## Package Dependencies
Ensure you have the following packages installed:
- pandas
- IPython
- matplotlib
- seaborn
- numpy
- anndata
- scanpy
- scipy
- sklearn
- umap-learn
- gprofiler-officialInstall dependencies using:
```sh
pip install pandas ipython matplotlib seaborn numpy anndata scanpy scipy scikit-learn umap-learn gprofiler-official
```## Class Description
The `GCSingleCellAnalysis` class (located in `src/codebase.py`) provides a comprehensive workflow for preprocessing, dimensionality reduction, clustering, and functional annotation of single-cell RNA-seq data.
## Running the Analysis
1. **Preprocessing and Normalization:**
```python
from src.codebase import GCSingleCellAnalysis
analysis = GCSingleCellAnalysis('data/GSE158631_count.csv')
analysis.preprocess_adata()
analysis.normalize_adata()
analysis.filter_adata()
analysis.prepare_adata()
```2. **Dimensionality Reduction:**
```python
analysis.perform_pca()
analysis.prepare_pca_reduced_adata()
analysis.plot_pca()
analysis.perform_tsne()
analysis.plot_tsne()
analysis.perform_umap()
analysis.plot_umap()
```3. **Clustering:**
```python
methods = ['gmm', 'average_link', 'ward', 'spectral', 'louvain', 'leiden']
results_pca = analysis.cluster_and_evaluate(methods, embeddings=['X_pca'])
results_tsne = analysis.cluster_and_evaluate(methods, embeddings=['X_tsne'])
results_umap = analysis.cluster_and_evaluate(methods, embeddings=['X_umap'])
combined_results = {'PCA': results_pca, 't-SNE': results_tsne, 'UMAP': results_umap}
results_df = analysis.create_results_dataframe(combined_results)
analysis.plot_clustering_evaluation(results_df)
```4. **Post-Clustering Analysis:**
```python
analysis.analyze_and_plot_markers(group_key='average_link_X_umap', n_genes=10)
go_annotations = analysis.fetch_go_annotations(group_key='average_link_X_umap', n_genes=10)
analysis.print_go_annotations(go_annotations)
```## Notebook
For a detailed step-by-step analysis, refer to the Jupyter notebook:
- `notebooks/GC_scRNAseq_data_analysis.ipynb`Images of the results of the analysis can be found in the `images` directory.
## Documentation
Refer to the `documents` directory for the project proposal, presentation, report, and the original authors' paper.
---