{"id":20938216,"url":"https://github.com/eldeveloper/dogs","last_synced_at":"2025-08-31T12:13:26.233Z","repository":{"id":145490700,"uuid":"59334009","full_name":"ElDeveloper/dogs","owner":"ElDeveloper","description":"Companion notebooks for \"Dog and human inflammatory bowel disease rely on overlapping yet distinct dysbiosis networks\"","archived":false,"fork":false,"pushed_at":"2018-07-13T08:13:00.000Z","size":57931,"stargazers_count":11,"open_issues_count":0,"forks_count":5,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-05-13T22:40:09.818Z","etag":null,"topics":["16s","ibd","jupyer","microbiome","notebook","qiime"],"latest_commit_sha":null,"homepage":"https://www.nature.com/articles/nmicrobiol2016177","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ElDeveloper.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2016-05-21T00:35:14.000Z","updated_at":"2024-05-13T20:14:00.000Z","dependencies_parsed_at":null,"dependency_job_id":"d097fbf6-29f9-4318-80f6-4faa5c0fbc58","html_url":"https://github.com/ElDeveloper/dogs","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/ElDeveloper/dogs","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ElDeveloper%2Fdogs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ElDeveloper%2Fdogs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ElDeveloper%2Fdogs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ElDeveloper%2Fdogs/manifests","owner_url":"https:/
/repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ElDeveloper","download_url":"https://codeload.github.com/ElDeveloper/dogs/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ElDeveloper%2Fdogs/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":272977849,"owners_count":25025210,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-31T02:00:09.071Z","response_time":79,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["16s","ibd","jupyer","microbiome","notebook","qiime"],"created_at":"2024-11-18T22:49:45.840Z","updated_at":"2025-08-31T12:13:26.212Z","avatar_url":"https://github.com/ElDeveloper.png","language":"Jupyter Notebook","readme":"Analysis Notebooks\n==================\n\nThis repository contains the notebooks used to create the analyses in\n[Vazquez-Baeza et al. 2016](http://www.nature.com/articles/nmicrobiol2016177).\nEach notebook tries to summarize a particular step and be as standalone as\npossible. In this document we briefly describe each of the analyses needed for\nthe paper and when possible we group them together, so as to provide more\ncohesion.\n\n#### Important notes for the reader\n\n- This repository does not provide **all** the data files used, as the size\n  would exceed the limits allowed in GitHub. 
However, we provide the main data\n  files (metadata and OTU tables), from which the rest of the data can be\n  generated using the commands in the notebooks. The rest of the sequence data can be freely\n  accessed through Qiita (remember to log in) in [study\n  833](https://qiita.ucsd.edu/study/description/833).\n\n- In several locations in the notebooks, there are cells that reference a\n  remote address (e.g. in `ssh` or `scp` commands). You don't need to execute\n  these commands; the files being fetched should already be provided in this\n  repository.\n\n-----------------------\n\n## Metadata\n\nThe metadata processing required two steps, one for cleanup of the data and\nanother one to prepare the data for [Qiita](https://qiita.microbio.me).\n\n[**1-metadata**](notebooks/01-metadata.ipynb): the metadata is cleaned up and\nfiltered to remove samples that we didn't use in the rest of the analyses. We\nalso calculate the dysbiosis index as defined in [Gevers et al.\n2014](http://www.ncbi.nlm.nih.gov/pubmed/24629344).\n\n[**1.1-metadata-for-qiita**](notebooks/01.1-metadata-for-qiita.ipynb): adds the\nneeded fields and columns to the mapping file, and creates a sample and prep\ntemplate that were used to upload the data into the [Qiita\nstudy](https://qiita.ucsd.edu/study/description/833).\n\n\n## Alpha diversity\n\n[**2-alpha-diversity**](notebooks/02-alpha-diversity.ipynb): this notebook includes the following alpha diversity\ncomparisons: fat, protein, age, weight and disease state, as well as a comparison\nof the human-trained dysbiosis index and alpha diversity. 
Of note, we did these\ncomparisons for several metrics, but only used Faith's phylogenetic diversity\nin the manuscript.\n\n## Beta diversity\n\n[**3-beta-diversity**](notebooks/03-beta-diversity.ipynb): this notebook\nincludes the creation of the beta diversity plots for the dog dataset only.\nBiplots and statistics to assess clustering significance are also produced as\npart of this notebook.\n[**3.1-beta-diversity-antibiotics.ipynb**](notebooks/03.1-beta-diversity-antibiotics.ipynb):\ncompares the differences between samples according to their history of\nantibiotic usage.\n\n## Group significance\n\n[**4-group-significance**](notebooks/04-group-significance.ipynb): this notebook\ntests statistical significance between the disease-affected and unaffected\ndogs, and plots their relative abundance as a heatmap. While none of the plots\nshown in this notebook were used in this paper, they helped guide our analysis\nfor the next few notebooks.\n\n## Feature exploration\n\n[**5-feature-exploration**](notebooks/05-feature-exploration.ipynb): this\nnotebook looks at a few different ways to filter the data so as to avoid\nOTUs that are not well represented throughout the samples.\n\n## New dysbiosis index\n\nAfter realizing that the human-trained dysbiosis index didn't perform as well\nin dogs, we decided to use CCREPE to train a new dysbiosis index using the dog\ndata alone. In [**6-md-index-ccrepe**](notebooks/06-md-index-ccrepe.ipynb) we\ncalculate the checkerboard scores and associated significance tables. These\nresults are used in\n[**6.1-md-index-ccrepe-visualizations**](notebooks/06.1-md-index-ccrepe-visualizations.ipynb),\nwhere we visualize them in a variety of ways, ultimately deciding that we\nshould use Cytoscape to do that. 
The final section of this notebook shows the\nplots relating alpha diversity and the index.\n\n## Classification accuracy\n\nThe ROC curves and feature importance scores are created in\n[**7-classifier**](notebooks/07-classifier.ipynb) and\n[**7.1-classifier-feature-importance**](notebooks/07.1-classifier-feature-importance.ipynb)\n(respectively). Here we use R and [hack_ml](https://github.com/rnaer/hack_ml)\nto create the plots and tables.\n\n## Human vs Dog comparison\n\nIn [**8-comparison**](notebooks/08-comparison.ipynb) we explore the combined\ndata and perform a few comparisons that were ultimately not used in the paper.\n[**8.1-comparison**](notebooks/08.1-comparison.ipynb) is concerned with making\nthe data between humans and dogs as comparable as possible.\n\n## PICRUSt\n\nPICRUSt predictions were generated at the [galaxy\nserver](https://huttenhower.sph.harvard.edu/galaxy/). In\n[**9-picrust**](notebooks/09-picrust.ipynb) we compare the combined human and\ndog samples, and in [**9.1-picrust-nsti**](notebooks/09.1-picrust-nsti.ipynb) we\nuse the NSTI (nearest sequenced taxon index) to assess the quality of our\npredictions.\n\n## Comparison with Minamoto et al 2015\n\nIn the [**10-minamoto-md-index**](notebooks/10-minamoto-md-index.ipynb) notebook we\nuse the dog-trained dysbiosis index in a different dataset, which was processed\nmainly on a separate supercomputer.\n\n## Read counts\n\nIn [**11-sequence-counts**](notebooks/11-sequence-counts.ipynb), we explore the\nnumber of sequences that were assigned to an OTU per sample. 
Specifically, we\ncompare the differences between closed and open reference protocols.\n\n---------------------\n\n[This Dockerfile](Dockerfile) should install any additional dependencies for\nthese notebooks to work.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feldeveloper%2Fdogs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feldeveloper%2Fdogs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feldeveloper%2Fdogs/lists"}