{"id":27335289,"url":"https://github.com/dcdanko/md2","last_synced_at":"2025-04-12T14:47:43.486Z","repository":{"id":57441406,"uuid":"194744398","full_name":"dcdanko/MD2","owner":"dcdanko","description":"MicrobeDirectory 2.0","archived":false,"fork":false,"pushed_at":"2020-08-10T13:13:24.000Z","size":125765,"stargazers_count":22,"open_issues_count":2,"forks_count":3,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-15T23:05:27.103Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dcdanko.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-07-01T21:20:00.000Z","updated_at":"2024-11-13T07:22:02.000Z","dependencies_parsed_at":"2022-09-05T23:51:41.459Z","dependency_job_id":null,"html_url":"https://github.com/dcdanko/MD2","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dcdanko%2FMD2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dcdanko%2FMD2/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dcdanko%2FMD2/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dcdanko%2FMD2/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dcdanko","download_url":"https://codeload.github.com/dcdanko/MD2/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248585288,"owners_count":21128974,"icon_url":"https://github.com/github.png","versi
on":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-04-12T14:47:42.292Z","updated_at":"2025-04-12T14:47:43.473Z","avatar_url":"https://github.com/dcdanko.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# The Microbe Directory v2.0\n*The ultimate microbe database*\n\n**The Microbe Directory (TMD)** is a collective research effort to profile and annotate more than 68,000 microbial species, including Bacteria, Archaea, Viruses, Fungi, and Algae.\n\nTMD aims to:\n\n\u003e* Provide a curated list of microbes from the three domains of life (Archaea, Bacteria, and Eukarya) and from viruses.\n\u003e* Compile microbial data from different [databases](#first) and [studies](#second) into a single resource.\n\u003e* Describe the phenotypic and ecological [parameters](#third) of microbial species.\n\u003e* Annotate the microbiomes where taxa have been identified.\n\u003e* Provide a community portal to add data and annotate new microbes.\n\u003e* Make a machine- and human-readable database.\n\u003e* Make microbial data handy to everyone!\n\n---\n\n## \u003cx name=\"third\"\u003eThe Database\u003c/x\u003e\n\nDifferent features are relevant to different types of microbes. 
It makes little sense to talk about the Gram stain of a virus or the capsid symmetry of a bacterium. To keep the data as relevant as possible, we have split the data in **The Microbe Directory** into three groups (viruses, bacteria and archaea, and eukaryotes), and each group records only the features that apply to it.\n\n#### Virus\n\n1. Genetic material: Viruses have either RNA or DNA as their genetic material.\n2. Strand: The nucleic acid may be single-stranded (*ss*) or double-stranded (*ds*).\n3. Capsid symmetry: The way in which the capsid subunits are arranged.\n   * Helical\n   * Icosahedral\n   * Complex\n4. Envelope: A lipid membrane that surrounds the capsid of some viruses. Viruses without an envelope are called naked.\n5. Is it a pathogen? If yes, what is its host?\n   * Human\n   * Animal\n   * Plant\n   * Bacteria\n   * Fungi\n\n#### Bacteria and Archaea Only\n\n1. Gram stain: Used to distinguish and classify bacterial species into two large groups: Gram-positive and Gram-negative.\n2. Antimicrobial resistance (AMR): Antimicrobial resistance occurs naturally over time, usually through genetic changes. However, the misuse and overuse of antimicrobials is accelerating this process.\n3. Type of metabolism: The nutrition mode of a microbe, defined by the sources of energy, electrons, and carbon it needs for living, growth, and reproduction. All combinations of these modes may exist in nature.\n   * Primary source of energy:\n     * Phototrophs: Light is absorbed by photoreceptors and transformed into chemical energy.\n     * Chemotrophs: Bond energy is released from a chemical compound.\n   * Primary source of reducing equivalents:\n     * Organotrophs: Organic compounds are used as electron donors.\n     * Lithotrophs: Inorganic compounds are used as electron donors.\n   * Primary source of carbon:\n     * Heterotrophs: Organic compounds are metabolized to obtain carbon for growth and development.\n     * Autotrophs: Carbon dioxide (CO2) is used as the source of carbon.\n\n#### Bacteria, Archaea and Eukarya\n\n1. 
Biofilm forming: Biofilms are multicellular communities held together by a self-produced extracellular matrix. Biofilms affect humans in many ways, because they can form in natural, medical, and industrial settings.\n2. Spore forming: Spores, such as bacterial endospores, are dormant forms of vegetative microbes and are highly resistant to physical and chemical influences.\n3. Microbiome: The host or environment where a microbe is usually found.\n   * Host: Microbes may be commensal or pathogenic to their host. Commensal microbes can be crucial to the survival of their hosts.\n     * Sponges\n     * Corals\n     * Fungi\n     * Plant\n     * Animal\n     * Human: Body sites defined by the [Human Microbiome Project](https://www.hmpdacc.org/hmp/)\n   * Soil: Microbes are essential for soils. They are the main drivers of nutrient cycles in soils, decompose organic matter, promote plant growth, and control pests and diseases.\n     * Tundra\n     * Grassland\n     * Croplands\n     * Forest\n       * Tropical\n       * Temperate\n       * Boreal\n   * Extreme: Microbes that live in habitats considered hard to survive in because of extreme conditions, such as temperature, limited access to energy sources, or high pressure.\n     * Desert\n     * Polar\n     * Deep ocean\n     * Space\n   * Water: Water can support the growth of many types of microorganisms. Microbes are the main drivers of biogeochemical processes and nutrient cycling.\n     * Ocean\n     * Fresh\n     * Mangrove\n     * Sediments\n4. Is it a pathogen? If yes, what is its host?\n   * Fungi\n   * Plant\n   * Animal\n   * Human: Body sites defined by the [Human Microbiome Project](https://www.hmpdacc.org/hmp/)\n5. Extremophile: A microbe that thrives in physically or geochemically extreme conditions that are detrimental to most life on Earth. Microbes that grow best under moderate conditions are called mesophiles.\n6. 
If it is an extremophile, which type?\n   * Acidophile: Microbes that live in acidic systems with pH -0.06 to 4.0.\n   * Alkaliphile: Microbes capable of survival in alkaline environments with pH 8.5–11.\n   * Halophile: Microbes that thrive in high salt concentrations.\n   * Metallotolerant: Microbes that survive in environments with a high concentration of dissolved heavy metals.\n   * Barophile: Also called piezophiles; microbes that thrive at high pressures, such as in the deep sea.\n   * Psychrophile: Also called cryophiles; microbes capable of growth at low temperatures, ranging from −20°C to 10°C.\n   * Radioresistant: Microbes capable of withstanding high levels of ionizing radiation.\n   * Thermophile: Microbes that live at high temperatures, between 41°C and 122°C.\n   * Xerophile: Microbes that grow and reproduce in conditions with a low availability of water.\n   * Hypolith: Organisms that live underneath rocks in cold deserts.\n   * Oligotroph: Microbes capable of growth in nutritionally limited environments.\n\n---\n\n## Data Sources\n\n### \u003ca name=\"first\"\u003eDatabases\u003c/a\u003e\n\n**The Microbe Directory** collates data from a number of other databases. Some of these databases directly provide information about microbes, including annotations for many different types of microbial traits.\n\n* [Global Biodiversity Information Facility (GBIF)](https://www.gbif.org/): An international network that aims to provide open access to data about all types of life on Earth. 
\n* [Virus-Host DB](https://www.genome.jp/virushostdb): A database of the relationships between viruses and their hosts.\n* [HaloDom](http://halodom.bio.auth.gr): A database of halophilic organisms.\n* [The Mycology Collections Data Portal (MyCoPortal)](http://mycoportal.org/portal/index.php): A network of universities, botanical gardens, museums, and agencies that provides taxonomic, environmental, and specimen-based information on fungi.\n* [BioCyc Database Collection](https://biocyc.org): A collection of Pathway/Genome Databases for bacteria.\n* [ISHAM Barcoding Database](http://its.mycologylab.org): A database of human and animal pathogenic fungal species.\n* [FungiDB](https://fungidb.org/fungidb/): An integrated genomic and functional genomic database for fungi, including experimental and environmental isolate sequence data.\n* [RefSoil](https://www.nature.com/articles/ismej2016168#Sec6): A soil microbiome database.\n\n### \u003cb name=\"second\"\u003eStudies\u003c/b\u003e\n\n**The Microbe Directory** also includes collated results from a number of projects on microbial communities. These studies are condensed into summary results describing the settings where a microbe may be found.\n\n* [MetaSUB](http://metasub.org): Molecular profiling of cities around the globe to improve their design, functionality, and impact on health. 
\n* [Earth Microbiome Project](http://www.earthmicrobiome.org): Characterization of microbial communities around the globe.\n* [TARA Oceans](http://ocean-microbiome.embl.de/companion.html): Metagenomic study of ocean samples from epipelagic and mesopelagic waters across the globe.\n* [Soil bacterial and fungal communities across a pH gradient in an arable soil](https://qiita.ucsd.edu/study/description/94): Soils collected across a long-term liming experiment (pH 4.0–8.3).\n* [The ecology of the phyllosphere](https://qiita.ucsd.edu/study/description/396): Bacterial communities from leaves of 56 tree species in Boulder, Colorado, USA.\n* [Characterization of Airborne Microbial Communities at a High-Elevation Site and Their Potential To Act as Atmospheric Ice Nuclei](http://dx.doi.org/10.1128/AEM.00447-09): Atmospheric microbial abundance, community composition, and ice nucleation at a high-elevation site in northwestern Colorado.\n* [Microbial community composition in a lowland tropical rain forest (Costa Rica)](https://doi.org/10.1016/j.soilbio.2010.08.011): Plot-scale manipulations of organic matter inputs to soils correlate with shifts in microbial community composition in a lowland tropical rain forest.\n* [Microbial communities on money](https://qiita.ucsd.edu/study/description/375)\n\n---\n\n## Installation and Use\n\n**The Microbe Directory** may be accessed as a set of CSV files. We also provide an API for programmatic access to **The Microbe Directory**. 
This API includes several statistical functions for comparing microbial communities based on their annotated traits.\n\n### Installation\n\nFrom PyPI\n```\npip install microbe_directory\n```\n\nFrom source\n```\ngit clone https://github.com/dcdanko/MD2\ncd MD2\npython setup.py install\n```\n\n### Building TMD Tables from Source Databases\n\nTMD uses `make` to build its tables; see the Makefile for details.\n```\nmake clean  # delete the current tables\nmake test   # run unit tests\nmake all    # make all tables\nmake bact   # make bacteria/archaea table\nmake euks   # make eukaryotic table\nmake virus  # make viral table\n```\nThe build outputs the taxonomy of all available Bacteria and Viruses from the NCBI taxonomy dump (`.dmp`) files. The table consists of the scientific name and the classification from phylum to species level, along with the unique taxonomic ID.\n\n## License and Use\n\nAll original material in TMD is provided under the MIT License. 
Some of the source databases may have restrictions on commercial use.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdcdanko%2Fmd2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdcdanko%2Fmd2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdcdanko%2Fmd2/lists"}