https://github.com/multiomics-analytics-group/course_multi-omics_data_science

Host: GitHub
URL: https://github.com/multiomics-analytics-group/course_multi-omics_data_science
Owner: Multiomics-Analytics-Group
Created: 2025-11-03T09:32:11.000Z (8 months ago)
Default Branch: main
Last Pushed: 2025-11-14T13:02:29.000Z (7 months ago)
Last Synced: 2025-12-26T17:18:37.179Z (6 months ago)
Language: Jupyter Notebook
Size: 49.1 MB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# 🧬 Omics Data Analysis: A Computational Science Workshop

The **omics technology revolution** has generated **massive volumes of biological data** that require **differential analysis** for correct interpretation. Such analysis depends on **computational tools and methodologies** to derive biological meaning from these data across various biological contexts.

---

This **32-hour intensive course** offers practical and up-to-date training in **data science applied to the analysis of omics data**, such as **metagenomics, transcriptomics, and proteomics**.

The primary objective is to **train researchers, bioinformaticians, and professionals** in the biological and health sciences to use omics data analysis tools and methodologies to **extract meaningful information**. Through interactive lectures, practical exercises, and the use of **real data**, participants will develop skills to **explore, visualize, and interpret omics data**, and will apply **biological network models** to answer questions about the data they analyze.

---

The course is structured over **four days**, beginning with an introduction to the **fundamentals of data science** and the **particular characteristics of omics data**. Topics covered include processing techniques, analysis of **metagenomic, transcriptomic, and proteomic data**, visualization, functional enrichment, and **multi-omics integration**. A dedicated module covers the construction, analysis, and visualization of **biological networks** derived from the data analyzed during the course.

The course culminates with a module where participants will **apply everything learned to a real-world case study**, working with **public data**. The practical sessions will be conducted in **Python**, utilizing **Jupyter Notebooks** and other visualization tools.

## Keywords

Computational microbiology, networks, databases, Python, programming, data, pipelines, data science.

## Syllabus

| Time | DAY 1 | DAY 2 | DAY 3 | DAY 4 |
|-----|-----|-----|-----|-----|
| 8:00-8:45 | [Introduction and Housekeeping](https://docs.google.com/presentation/d/1Fwiw9FxgTcgTta2O5CALWousFv_vrJMHNs5k1ay163Y/edit?usp=sharing) | [Introduction to Python I](https://colab.research.google.com/github/Multiomics-Analytics-Group/course_multi-omics_data_science/blob/main/notebooks/01_Introduction_to_Python/01_basics.ipynb) | [Introduction to Networks in Python](https://colab.research.google.com/github/Multiomics-Analytics-Group/course_multi-omics_data_science/blob/main/notebooks/05_Visualising_Networks/03_nx.ipynb) | [Analysing Networks I](https://colab.research.google.com/github/Multiomics-Analytics-Group/course_multi-omics_data_science/blob/main/notebooks/05_Visualising_Networks/04_nxpandas.ipynb) |
| 8:45-9:30 | [From Omics to Multi-omics](https://docs.google.com/presentation/d/1ZU5wpnlEanIw0I-tX-U9MXesf0ddkbjAp56VHWiIlyk/edit?usp=sharing) | [Introduction to Python II](https://colab.research.google.com/github/Multiomics-Analytics-Group/course_multi-omics_data_science/blob/main/notebooks/01_Introduction_to_Python/01_basics.ipynb) | [Introduction to Networks in Python](https://colab.research.google.com/github/Multiomics-Analytics-Group/course_multi-omics_data_science/blob/main/notebooks/05_Visualising_Networks/03_nx.ipynb) | [Analysing Networks II](https://colab.research.google.com/github/Multiomics-Analytics-Group/course_multi-omics_data_science/blob/main/notebooks/05_Visualising_Networks/04_nxpandas.ipynb) |
| 9:30-10:00 | Coffee break | Coffee break | Coffee break | Coffee break |
| 10:00-10:45 | [Introduction to Networks](https://docs.google.com/presentation/d/1bBZNgRdD4P1Vnz5o4zuQNtbRycr8O7aCkV2h2Z-C_C4/edit?usp=sharing) | [Working with Data in Python I](https://colab.research.google.com/github/Multiomics-Analytics-Group/course_multi-omics_data_science/blob/main/notebooks/02_Working_with_Data_in_Python/02_pandas.ipynb) | [Visualising Networks -- Cytoscape]() | [Invited Speaker - Professor Carlos Muskus](https://scholar.google.com/citations?user=Yp6lnXQAAAAJ&hl=es) |
| 10:45-11:30 | [Open Science](https://docs.google.com/presentation/d/1JkXb7SxWXlNDrHRC9eCPBJBeiaSbE_-lZfm8YqGoz-Y/edit?usp=sharing) | [Working with Data in Python II](https://colab.research.google.com/github/Multiomics-Analytics-Group/course_multi-omics_data_science/blob/main/notebooks/02_Working_with_Data_in_Python/02_pandas.ipynb) | [Visualising Networks -- Cytoscape]() | [Multi-omics](https://docs.google.com/presentation/d/1xbuNIp87tWDQmaQDzW9EzzN6lJVNWfsY3wbTByngbEI/edit?usp=sharing) |
| 11:30-13:00 | Lunch | Lunch | Lunch | Lunch |
| 13:00-13:45 | [Standardising Omics Workflows with Nextflow](https://docs.google.com/presentation/d/1Yb4V7lbIZXXZOUu0aVemfjxe3syo4sAugan0b_IwKBA/edit?usp=sharing) | [Visualizing Data in Python I](https://colab.research.google.com/github/Multiomics-Analytics-Group/course_multi-omics_data_science/blob/main/notebooks/04_Visualizing_Data_in_Python/05_viz.ipynb) | [Omics: Transcriptomics](https://docs.google.com/presentation/d/1exEbpdl9zMOdyXmbnlmT8jyx7xXXWfHs-Hp-05DxuZU/edit?usp=sharing) | [Multi-omics I](https://colab.research.google.com/github/Multiomics-Analytics-Group/course_multi-omics_data_science/blob/main/multiomics/notebooks/multiomics.ipynb) |
| 13:45-14:30 | [Omics: Metagenomics](https://docs.google.com/presentation/d/1dyHAOQjcV5ryp7g8LniHi1b38YwH3EfSzvM9Jy3Ej1w/edit?usp=sharing) | [Visualizing Data in Python II](https://colab.research.google.com/github/Multiomics-Analytics-Group/course_multi-omics_data_science/blob/main/notebooks/04_Visualizing_Data_in_Python/05_viz.ipynb) | [Preprocessing Transcriptomics with nf-core/RNAseq](https://colab.research.google.com/github/Multiomics-Analytics-Group/course_multi-omics_data_science/blob/main/transcriptomics/notebooks/transcriptomics_preprocessing.ipynb) | [Multi-omics II](https://colab.research.google.com/github/Multiomics-Analytics-Group/course_multi-omics_data_science/blob/main/multiomics/notebooks/multiomics.ipynb) |
| 14:30-15:00 | Coffee break | Coffee break | Coffee break | Coffee break |
| 15:00-16:00 | [Preprocessing Metagenomics with nf-core/Taxprofiler](https://colab.research.google.com/github/Multiomics-Analytics-Group/course_multi-omics_data_science/blob/main/metagenomics/notebooks/metagenomics_preprocessing.ipynb) | [Metagenomics Basic Analysis I](https://colab.research.google.com/github/Multiomics-Analytics-Group/course_multi-omics_data_science/blob/main/metagenomics/notebooks/metagenomics_analysis.ipynb) | [Transcriptomics Basic Analysis I](https://colab.research.google.com/github/Multiomics-Analytics-Group/course_multi-omics_data_science/blob/main/transcriptomics/notebooks/transcriptomics_analysis.ipynb) | Recap and Q&A |
| 16:00-17:00 | Recap and Q&A | [Metagenomics Basic Analysis II](https://colab.research.google.com/github/Multiomics-Analytics-Group/course_multi-omics_data_science/blob/main/metagenomics/notebooks/metagenomics_analysis.ipynb) | [Transcriptomics Basic Analysis II](https://colab.research.google.com/github/Multiomics-Analytics-Group/course_multi-omics_data_science/blob/main/transcriptomics/notebooks/transcriptomics_analysis.ipynb) | Recap and Q&A |
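
The Colab links in the syllabus all appear to follow one pattern: the Colab base address, then `github`, the repository owner and name, `blob`, the branch, and the notebook path. The sketch below rebuilds such a link from a notebook path. The helper name `colab_url` is hypothetical and not part of the course material; the owner, repository, and branch values are copied from the links above.

```python
from urllib.parse import quote

# Values copied from the syllabus links in this README.
COLAB_BASE = "https://colab.research.google.com/github"
OWNER = "Multiomics-Analytics-Group"
REPO = "course_multi-omics_data_science"
BRANCH = "main"


def colab_url(notebook_path: str) -> str:
    """Return the Colab URL for a notebook path inside the repository.

    Hypothetical helper for illustration only.
    """
    # Percent-encode unusual characters but keep "/" as the path separator.
    path = quote(notebook_path.lstrip("/"), safe="/")
    return f"{COLAB_BASE}/{OWNER}/{REPO}/blob/{BRANCH}/{path}"


url = colab_url("notebooks/01_Introduction_to_Python/01_basics.ipynb")
print(url)
```

This can be handy for opening a notebook on another branch or fork in Colab, assuming the same URL pattern holds there.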

## Further Resources

### References

1) [Empowering bioinformatics communities with Nextflow and nf-core](https://pubmed.ncbi.nlm.nih.gov/40731283/) *Björn E Langer, Andreia Amaral, Marie-Odile Baudement, Franziska Bonath, Mathieu Charles, Praveen Krishna Chitneedi, Emily L Clark, Paolo Di Tommaso, Sarah Djebali, Philip A Ewels, Sonia Eynard, James A Fellows Yates, Daniel Fischer, Evan W Floden, Sylvain Foissac, Gisela Gabernet, Maxime U Garcia, Gareth Gillard, Manu Kumar Gundappa, Cervin Guyomar, Christopher Hakkaart, Friederike Hanssen, Peter W Harrison, Matthias Hörtenhuber, Cyril Kurylo, Christa Kühn, Sandrine Lagarrigue, Delphine Lallias, Daniel J Macqueen, Edmund Miller, Júlia Mir-Pedrol, Gabriel Costa Monteiro Moreira, Sven Nahnsen, Harshil Patel, Alexander Peltzer, Frederique Pitel, Yuliaxis Ramayo-Caldas, Marcel da Câmara Ribeiro-Dantas, Dominique Rocha, Mazdak Salavati, Alexey Sokolov, Jose Espinosa-Carrasco, Cedric Notredame, The Nf-Core Community* [resource]()

2) [nf-core/taxprofiler](https://www.biorxiv.org/content/10.1101/2023.10.20.563221v1) *Sofia Stamouli, Moritz E. Beber, Tanja Normark, Thomas A. Christensen II, Lili Andersson-Li, Maxime Borry, Mahwash Jamy, nf-core community, James A. Fellows Yates* [resource](https://nf-co.re/taxprofiler/)

3) [nf-core/rnaseq](https://www.nature.com/articles/s41587-020-0439-x) *Philip A. Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso, Sven Nahnsen* [resource](https://nf-co.re/rnaseq)

4) [quantms: a cloud-based pipeline for quantitative proteomics enables the reanalysis of public proteomics data](https://www.nature.com/articles/s41592-024-02343-1) *Chengxin Dai, Julianus Pfeuffer, Hong Wang, Ping Zheng, Lukas Käll, Timo Sachsenberg, Vadim Demichev, Mingze Bai, Oliver Kohlbacher, Yasset Perez-Riverol* [resource](https://nf-co.re/quantms)

5) [A technical review of multi-omics data integration methods: from classical statistical to deep generative approaches](https://academic.oup.com/bib/article/26/4/bbaf355/8220754) *Ana R Baião, Zhaoxiang Cai, Rebecca C Poulos, Phillip J Robinson, Roger R Reddel, Qing Zhong, Susana Vinga, Emanuel Gonçalves*

6) [Scikit-Bio](https://scikit.bio/index.html) *A community-driven Python library for bioinformatics, providing versatile data structures, algorithms and educational resources for Biology.*

7) [HMDB 5.0: the Human Metabolome Database for 2022](https://pubmed.ncbi.nlm.nih.gov/34986597/) *David S Wishart, AnChi Guo, Eponine Oler, Fei Wang, Afia Anjum, Harrison Peters, Raynard Dizon, Zinat Sayeeda, Siyang Tian, Brian L Lee, Mark Berjanskii, Robert Mah, Mai Yamamoto, Juan Jovel, Claudia Torres-Calzada, Mickel Hiebert-Giesbrecht, Vicki W Lui, Dorna Varshavi, Dorsa Varshavi, Dana Allen, David Arndt, Nitya Khetarpal, Aadhavya Sivakumaran, Karxena Harford, Selena Sanford, Kristen Yee, Xuan Cao, Zachary Budinski, Jaanus Liigand, Lun Zhang, Jiamin Zheng, Rupasri Mandal, Naama Karu, Maija Dambrova, Helgi B Schiöth, Russell Greiner, Vasuk Gautam* [resource](https://hmdb.ca/)

8) [MicroPhenoDB Associates Metagenomic Data with Pathogenic Microbes, Microbial Core Genes, and Human Disease Phenotypes](https://pubmed.ncbi.nlm.nih.gov/33418085/) *Guocai Yao, Wenliang Zhang, Minglei Yang, Huan Yang, Jianbo Wang, Haiyue Zhang, Lai Wei, Zhi Xie, Weizhong Li* [resource](http://www.liwzlab.cn/microphenodb)

9) [The National Microbiome Data Collaborative: enabling microbiome science](https://pubmed.ncbi.nlm.nih.gov/32350400/) *Elisha M Wood-Charlson, Anubhav, Deanna Auberry, Hannah Blanco, Mark I Borkum, Yuri E Corilo, Karen W Davenport, Shweta Deshpande, Ranjeet Devarakonda, Meghan Drake, William D Duncan, Mark C Flynn, David Hays, Bin Hu, Marcel Huntemann, Po-E Li, Mary Lipton, Chien-Chi Lo, David Millard, Kayd Miller, Paul D Piehowski, Samuel Purvine, T B K Reddy, Migun Shakya, Jagadish Chandrabose Sundaramurthi, Pajau Vangay, Yaxing Wei, Bruce E Wilson, Shane Canon, Patrick S G Chain, Kjiersten Fagnan, Stanton Martin, Lee Ann McCue, Christopher J Mungall, Nigel J Mouncey, Mary E Maxon, Emiley A Eloe-Fadrosh* [resource](https://data.microbiomedata.org/)

### Cheat Sheets

- Basics:

  - [Getting started](cheat_sheets/cheat_sheet_day0.pdf)

  - [Importing Data](cheat_sheets/Importing_Data_Cheat_sheet.pdf)

  - [Jupyter Notebook](cheat_sheets/Jupyter_Notebook_Cheat_Sheet.pdf)

- Data Science:

  - [Numpy](cheat_sheets/Numpy_Python_Cheat_Sheet.pdf)

  - [Pandas](cheat_sheets/Pandas_Cheat_Sheet.pdf)

  - [Scipy](cheat_sheets/Scipy-LinearAlgebra_Cheat_Sheet.pdf)

  - [Scikit-learn](cheat_sheets/Scikit-learn_Cheat_Sheet.pdf)

- Visualization:

  - [Matplotlib](cheat_sheets/Python_Matplotlib_Cheat_Sheet.pdf)

  - [Plot.ly](cheat_sheets/Plotly_Cheat_Sheet.pdf)

  - [Seaborn](cheat_sheets/Seaborn_Cheat_Sheet.pdf)

  - [Bokeh](cheat_sheets/Bokeh_Cheat_Sheet.pdf)

### Basics

- [learnpython.org](https://www.learnpython.org/)

  - Interactive Python basics tutorial

- [Springboard - Data Analysis with Python, SQL, and R](https://www.springboard.com/learning-paths/data-analysis/learn/)

  - Starts with [Solo Learn](https://www.sololearn.com/Course/Python/) and [Design of Computer Programs](https://www.udacity.com/course/design-of-computer-programs--cs212)

- [Scipy Lectures](https://scipy-lectures.org/index.html)

  - Python introduction with a focus on scientific computing

- [Official Python tutorial](https://docs.python.org/3/tutorial/)

### Python Installations

In this course we use [Google Colab](https://colab.research.google.com/) to run notebooks. Notebooks are text files that combine text, code, and the output of that code. Colab comes with an extended set of pre-installed tools. See the [tutorial series](https://www.youtube.com/playlist?list=PLQY2H8rRoyvyK5aEDAI3wUUqC_F0oEroL).
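
To make the idea of a notebook as a text file concrete, here is a minimal sketch that builds a tiny notebook as JSON using only the standard library. It assumes the Jupyter notebook format version 4 layout (`cells`, `metadata`, `nbformat`, `nbformat_minor`). The cell contents are invented for illustration and are not taken from the course notebooks.

```python
import json

# A minimal notebook in the assumed nbformat 4 layout: one text cell and
# one code cell that already carries its printed output.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# A title written in Markdown"],
        },
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["print(1 + 1)"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["2\n"]}
            ],
        },
    ],
}

# Serialising gives the plain text that would be stored in an .ipynb file.
notebook_text = json.dumps(notebook, indent=1)

# Reading it back shows the ingredients of each cell: its type and source.
for cell in json.loads(notebook_text)["cells"]:
    print(cell["cell_type"], "->", "".join(cell["source"]))
```

Because the file is plain text, notebooks can be versioned on GitHub like any other source file, which is how the syllabus links can point Colab at them.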

For your own computer, [Anaconda](https://www.anaconda.com/products/individual) offers an extended installation that includes most of the tools you will need for Python.
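
If you work on a local installation, a quick check like the one below can show which libraries from the cheat sheets are available. This is only a sketch: the helper `installed_libraries` is hypothetical, and the list covers the cheat-sheet libraries rather than an official course requirements file.

```python
from importlib.util import find_spec

# Import names of the libraries covered by the cheat sheets in this README.
# Note that scikit-learn is imported as "sklearn".
LIBRARIES = {
    "NumPy": "numpy",
    "pandas": "pandas",
    "SciPy": "scipy",
    "scikit-learn": "sklearn",
    "Matplotlib": "matplotlib",
    "Plotly": "plotly",
    "Seaborn": "seaborn",
    "Bokeh": "bokeh",
}


def installed_libraries(libraries):
    """Map each library name to True if its module can be located.

    Hypothetical helper; it looks modules up without importing them.
    """
    return {
        name: find_spec(module) is not None
        for name, module in libraries.items()
    }


status = installed_libraries(LIBRARIES)
for name, available in status.items():
    print(f"{name:<14}{'installed' if available else 'missing'}")
```

Any library reported as missing can usually be added with the package manager of your installation, for example `conda install` or `pip install`.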

## Acknowledgements

Some of the slides and notebooks were inspired by or reused from the [Data Science Platform](https://multiomics-analytics-group.github.io/data-science-platform/) at the [Informatics Platform](https://www.biosustain.dtu.dk/technologies/biofoundry/informatics) of the Novo Nordisk Foundation Center for Biosustainability at the [Technical University of Denmark](https://www.dtu.dk/).

Other relevant courses can be found in the [Biosustain GitHub](https://github.com/biosustain) (e.g., [R viz](https://github.com/biosustain/dsp_workshop_datavizR), [Python viz](https://github.com/biosustain/dsp_workshop_dataviz_python), [Nextflow training](https://github.com/biosustain/dsp_nextflow_training), [Proteomics](https://github.com/biosustain/dsp_course_proteomics_intro), [Transcriptomics](https://github.com/biosustain/dsp_transcriptomics_training), [Metagenomics](https://github.com/biosustain/dsp_metagenomics_training), [Bash](https://github.com/biosustain/dsp_workshop_bash), ...).

Some notebooks have been inspired by the course [Python Tsunami](https://github.com/Center-for-Health-Data-Science/PythonTsunami) at the [Center for Health Data Science (HeaDS)](https://heads.ku.dk/) at the [University of Copenhagen](https://www.ku.dk/).