{"id":26567981,"url":"https://github.com/sivkri/geneexpression-machinelearning","last_synced_at":"2026-04-26T16:31:58.110Z","repository":{"id":279601302,"uuid":"939346178","full_name":"sivkri/GeneExpression-MachineLearning","owner":"sivkri","description":"Supervised Machine Learning for Gene Expression Analysis","archived":false,"fork":false,"pushed_at":"2025-02-26T12:33:24.000Z","size":3899,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-22T19:37:32.928Z","etag":null,"topics":["geneexpression","logistic-regression","random-forest-classifier","supervised-learning","supervised-machine-learning"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sivkri.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-26T11:53:08.000Z","updated_at":"2025-03-04T11:31:49.000Z","dependencies_parsed_at":"2025-02-26T12:46:21.412Z","dependency_job_id":null,"html_url":"https://github.com/sivkri/GeneExpression-MachineLearning","commit_stats":null,"previous_names":["sivkri/geneexpression-machinelearning"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/sivkri/GeneExpression-MachineLearning","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sivkri%2FGeneExpression-MachineLearning","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sivkri%2FGeneExpression-MachineLearning/tags","releases_url":"https://repos.ecosyste.ms/api/v1/h
osts/GitHub/repositories/sivkri%2FGeneExpression-MachineLearning/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sivkri%2FGeneExpression-MachineLearning/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sivkri","download_url":"https://codeload.github.com/sivkri/GeneExpression-MachineLearning/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sivkri%2FGeneExpression-MachineLearning/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32305035,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-26T09:34:17.070Z","status":"ssl_error","status_checked_at":"2026-04-26T09:34:00.993Z","response_time":129,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["geneexpression","logistic-regression","random-forest-classifier","supervised-learning","supervised-machine-learning"],"created_at":"2025-03-22T19:29:27.640Z","updated_at":"2026-04-26T16:31:58.093Z","avatar_url":"https://github.com/sivkri.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Supervised Learning for Gene Expression Analysis\n\nThis repository showcases the application of **Logistic Regression** and **Random Forest** for gene expression analysis. 
It automates data processing, model training, and evaluation.\n\n## 🚀 Features\n- **Preprocessing**: Formats gene expression data\n- **Machine Learning**: Uses Logistic Regression \u0026 Random Forest\n- **Evaluation**: Generates accuracy reports and confusion matrices\n- **Automation**: Includes a shell script \u0026 GitHub Actions\n\n## 📂 Repository Structure\n```\n📂 **ml_gene_expression_project**  \n ┣ 📂 **data/**               → Stores gene expression data  \n ┣ 📂 **results/**            → Model reports and visualizations  \n ┣ 📜 **preprocess.py**       → Data processing script  \n ┣ 📜 **train.py**            → Model training script  \n ┣ 📜 **evaluate.py**         → Model evaluation with visualization  \n ┣ 📜 **run_pipeline.sh**     → Shell script to automate pipeline execution  \n ┣ 📜 **run_pipeline.yml**    → GitHub Actions workflow  \n ┣ 📜 **app.py**              → Streamlit app for interactive visualization  \n ┣ 📜 **README.md**           → Project documentation  \n```\n\n## 🏃 Run the Pipeline\n```bash\nbash run_pipeline.sh\n```\n\n## 🖼️ Sample Outputs\n- Model Accuracy Reports in `results/`\n- Confusion Matrices saved as images\n\n## 🤖 GitHub Actions\nThis repository supports **automatic execution** when new data is pushed.\n\n## 📌 How to Use\n1. Clone the repository:  \n   ```bash\n   git clone https://github.com/sivkri/GeneExpression-MachineLearning.git\n   ```\n2. Navigate to the directory:  \n   ```bash\n   cd GeneExpression-MachineLearning\n   ```\n3. Run the pipeline:  \n   ```bash\n   bash run_pipeline.sh\n   ```\n\n## To launch the Streamlit app for interactive visualization:\n\n   ```bash\n   streamlit run app.py\n   ```\n\n
## Results \u0026 Findings\n- Identified top genes differentiating wild-type (WT) vs knockout (KO) conditions\n- Evaluated the impact of Eltrombopag (E20) treatment\n- Achieved 75% accuracy with the Random Forest classifier\n\n\n\n# Project Overview  \nThis project applies **machine learning** techniques to analyze **gene expression data** under different experimental conditions. Using **logistic regression** and **random forest classifiers**, we identify genes differentially expressed due to **HuR knockout (ELAVL1 deletion)** and **Eltrombopag (E20) drug treatment**.\n\nAdditionally, a **Streamlit web application** is integrated to provide an **interactive visualization** of the results.  \n\n## Citation  \nIf you use this dataset or findings, please cite the following study:  \n📖 **DOI:** [10.1186/s12915-025-02131-z](https://doi.org/10.1186/s12915-025-02131-z)\n\n---\n\n## **Experimental Design**  \n\nThis study investigates how **HuR knockout (KO)** and **Eltrombopag (E20) treatment** influence gene expression compared to wild-type (WT) and mock treatment (DMSO).\n\n### **Sample Groups**  \nThe dataset consists of the following experimental conditions:\n\n| Sample Group | Description |\n|-------------|-------------|\n| **WT-DMSO** | Wild-type (WT) cells treated with mock (DMSO) |\n| **WT-E20**  | Wild-type (WT) cells treated with Eltrombopag (E20) |\n| **KO-DMSO** | HuR knockout (KO) cells treated with mock (DMSO) |\n| **KO-E20**  | HuR knockout (KO) cells treated with Eltrombopag (E20) |\n\nEach sample contains gene expression data across thousands of genes. **HuR (ELAVL1)** is a key **RNA-binding protein**, and its knockout may significantly alter gene expression. 
**Eltrombopag** is a thrombopoietin receptor agonist that may influence transcriptional programs.\n\n---\n\n## **Comparisons \u0026 Research Questions**  \n\nI have performed **three key comparisons** using **supervised learning** to classify gene expression profiles.\n\n### **1️⃣ Effect of HuR Knockout (KO vs. WT)**\n- **Comparison:** **WT-DMSO vs. KO-DMSO**  \n- **Objective:** Identify genes affected by HuR deletion.  \n- **Machine Learning Approach:**  \n  - Features: Gene expression levels  \n  - Labels: WT-DMSO (class 0) vs. KO-DMSO (class 1)  \n\n### **2️⃣ Effect of Eltrombopag in Wild-Type Cells**\n- **Comparison:** **WT-DMSO vs. WT-E20**  \n- **Objective:** Determine gene expression changes due to Eltrombopag in normal cells.  \n- **Machine Learning Approach:**  \n  - Features: Gene expression levels  \n  - Labels: WT-DMSO (class 0) vs. WT-E20 (class 1)  \n\n### **3️⃣ Effect of Eltrombopag in HuR Knockout Cells**\n- **Comparison:** **KO-DMSO vs. KO-E20**  \n- **Objective:** Understand the **HuR-dependent** response to Eltrombopag.  \n- **Machine Learning Approach:**  \n  - Features: Gene expression levels  \n  - Labels: KO-DMSO (class 0) vs. KO-E20 (class 1)  \n\n---\n\n## **Data Processing \u0026 Machine Learning Workflow**  \n\n1. **Preprocessing:**  \n   - Normalize expression data  \n   - Convert into a machine-learning-ready format  \n\n2. **Model Training \u0026 Feature Selection:**  \n   - Train **logistic regression** and **random forest** classifiers  \n   - Perform **Principal Component Analysis (PCA)**  \n\n3. **Evaluation:**  \n   - Compute **accuracy, confusion matrices, classification reports**  \n   - Identify **top differentially expressed genes**  \n\n4. **Visualization \u0026 Reporting:**  \n   - Generate **PCA scatter plots**  \n   - Save **model performance metrics**  \n\n---\n\n## 📧 Contact\nFor queries, feel free to reach out! 
🚀\n\n---\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsivkri%2Fgeneexpression-machinelearning","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsivkri%2Fgeneexpression-machinelearning","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsivkri%2Fgeneexpression-machinelearning/lists"}