https://github.com/demic-dev/als-biomarker-identification-project
The ALS Project aims to identify potential biomarkers associated with Amyotrophic Lateral Sclerosis (ALS) through RNA-Seq data analysis, in collaboration with @aliakseibrown.
bioinformatics data-science statistics
- Host: GitHub
- URL: https://github.com/demic-dev/als-biomarker-identification-project
- Owner: demic-dev
- Created: 2024-04-02T18:12:50.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-06-05T17:02:33.000Z (about 2 years ago)
- Last Synced: 2025-04-12T00:50:00.454Z (about 1 year ago)
- Topics: bioinformatics, data-science, statistics
- Language: Jupyter Notebook
- Homepage:
- Size: 32.5 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# **ALS Biomarker Identification Project**
Welcome to the ALS Biomarker Identification Project repository. This project, conducted in collaboration with [Aliaksei](https://github.com/aliakseibrown) at Paris-Saclay University, aims to identify potential biomarkers associated with Amyotrophic Lateral Sclerosis (ALS) through RNA-Seq data analysis.
## **Objective**
The primary objective is to use computational techniques, including dimensionality reduction, differential expression analysis, and penalized regression, to identify genes whose expression differs significantly between ALS patients and a non-neurological control group. By identifying these candidate biomarkers, we aim to contribute to earlier diagnosis of ALS and to targeted interventions.
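To make "significant expression differences" concrete, here is a minimal per-gene screen that compares the two groups with a log2 fold change and Welch's t-statistic. This is an illustrative sketch, not the project's method: the gene names, sample IDs, and values are hypothetical, it assumes the input is already normalized, and it computes no p-values or multiple-testing correction. PyDESeq2, which the project uses, instead fits a negative binomial model to raw counts.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples with unequal variances."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # Both groups are constant: report +/-inf if the means differ, else 0.
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / se


def log2_fold_change(a, b, pseudocount=1.0):
    """log2 ratio of group means; the pseudocount avoids log(0)."""
    return math.log2((mean(a) + pseudocount) / (mean(b) + pseudocount))


def screen_genes(expr, als_cols, ctrl_cols):
    """Rank genes by |t| for ALS vs. control.

    expr maps gene -> {sample_id: normalized expression value}.
    Returns a list of (gene, log2FC, t) tuples, strongest difference first.
    """
    results = []
    for gene, values in expr.items():
        als = [values[s] for s in als_cols]
        ctrl = [values[s] for s in ctrl_cols]
        results.append((gene, log2_fold_change(als, ctrl), welch_t(als, ctrl)))
    return sorted(results, key=lambda r: abs(r[2]), reverse=True)


# Hypothetical toy data: three ALS and three control samples.
expr = {
    "GENE_A": {"als1": 50, "als2": 55, "als3": 60, "ctrl1": 20, "ctrl2": 22, "ctrl3": 18},
    "GENE_B": {"als1": 30, "als2": 35, "als3": 25, "ctrl1": 31, "ctrl2": 29, "ctrl3": 30},
}
ranked = screen_genes(expr, ["als1", "als2", "als3"], ["ctrl1", "ctrl2", "ctrl3"])
for gene, lfc, t in ranked:
    print(f"{gene}: log2FC={lfc:.2f}, t={t:.2f}")
```

Welch's variant is used here because the ALS and control groups need not share a variance; in practice the ranking would be followed by p-values and a false-discovery-rate correction before calling any gene a candidate biomarker.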
## **Key Highlights**
- **Data Preprocessing**: Parsed the RNA-Seq data into a DataFrame, split the samples, and organized the data using an object-oriented approach.
- **Descriptive Analysis**: Analyzed and visualized the distribution of gene expression across samples, highlighting differences between the ALS and control groups.
- **Dimensionality Reduction Techniques**: Used Principal Component Analysis (PCA) and t-distributed Stochastic Neighbor Embedding (t-SNE) to visualize structure and clusters in the high-dimensional expression data.
- **Advanced Analysis Techniques**: Conducted univariate analyses and used the PyDESeq2 library to identify genes with significant expression differences.
- **Model Tuning and Generalization**: Applied normalization and tuned an ElasticNet model to identify ALS-impacted genes.
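The preprocessing and normalization steps listed above can be sketched with a small object-oriented container. Everything here is an assumption for illustration: the `ExpressionDataset` class, its methods, and the toy counts are hypothetical and do not reproduce the repository's code. Counts-per-million (CPM) followed by log2 is one common normalization; the project may use a different scheme.

```python
import math
from dataclasses import dataclass


@dataclass
class ExpressionDataset:
    """Hypothetical container: raw counts per gene and one condition label per sample."""

    samples: list[str]
    conditions: dict[str, str]  # sample_id -> "ALS" or "Control"
    counts: dict[str, list[int]]  # gene -> raw counts, aligned with `samples`

    def samples_in(self, condition):
        """Sample IDs belonging to one condition, in their original order."""
        return [s for s in self.samples if self.conditions[s] == condition]

    def library_sizes(self):
        """Total counts per sample (the column sums of the gene x sample matrix)."""
        return [sum(col) for col in zip(*self.counts.values())]

    def log2_cpm(self, pseudocount=1.0):
        """Scale each sample to counts per million, then apply log2(x + pseudocount)."""
        sizes = self.library_sizes()
        return {
            gene: {
                s: math.log2(c / size * 1e6 + pseudocount)
                for s, c, size in zip(self.samples, row, sizes)
            }
            for gene, row in self.counts.items()
        }


# Hypothetical toy data: two ALS and two control samples, two genes.
ds = ExpressionDataset(
    samples=["als1", "als2", "ctrl1", "ctrl2"],
    conditions={"als1": "ALS", "als2": "ALS", "ctrl1": "Control", "ctrl2": "Control"},
    counts={"GENE_A": [300, 500, 100, 100], "GENE_B": [700, 500, 900, 900]},
)
norm = ds.log2_cpm()
print(ds.samples_in("ALS"), ds.library_sizes())
```

CPM corrects for differences in sequencing depth between samples, and the log transform tames the heavy right tail of count data before PCA, t-SNE, or ElasticNet. Note that PyDESeq2 expects raw integer counts and applies its own size-factor normalization, so normalized values like these belong to the descriptive and modeling steps rather than to the DESeq2 input.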
## **Notion Report**
[Analysis of Amyotrophic Lateral Sclerosis RNA-Seq](https://aliakseibrown.notion.site/Analysis-of-Amyotrophic-Lateral-Sclerosis-RNA-Seq-43489d7f7f584083867c22cb695d8419?pvs=4)
## **Getting Started**
To get started with this project:
1. Clone this repository to your local machine.
```shell
git clone https://github.com/demic-dev/als-biomarker-identification-project.git
```
2. Open the repository directory and create a conda environment:
```shell
cd als-biomarker-identification-project
conda create -n bio python=3.9 -y
conda activate bio
```
3. Install the necessary dependencies listed in **`requirements.txt`**.
```shell
pip install -r requirements.txt
```
4. Explore the Jupyter notebooks in the **`analysis`** directory to understand our methodology and findings.
## **Acknowledgments**
We thank Paris-Saclay University for their resources and support. We also appreciate the guidance and expertise of our mentor and collaborators.
## **Sources**
The data comes from the study *Postmortem Cortex Samples Identify Distinct Molecular Subtypes of ALS: Retrotransposon Activation, Oxidative Stress, and Activated Glia*.