{"id":22777265,"url":"https://github.com/omicsdi/specifications","last_synced_at":"2026-01-08T02:51:37.761Z","repository":{"id":28390848,"uuid":"31905090","full_name":"OmicsDI/specifications","owner":"OmicsDI","description":"Description about the resource, file formats, etc ","archived":false,"fork":false,"pushed_at":"2022-04-21T19:58:13.000Z","size":44320,"stargazers_count":9,"open_issues_count":8,"forks_count":5,"subscribers_count":16,"default_branch":"master","last_synced_at":"2025-02-05T15:17:55.850Z","etag":null,"topics":["new-use-cases","omicsdi-architecture","omicsdi-xml","specifications"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OmicsDI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-03-09T15:05:56.000Z","updated_at":"2024-09-24T06:39:10.000Z","dependencies_parsed_at":"2022-09-08T18:00:43.452Z","dependency_job_id":null,"html_url":"https://github.com/OmicsDI/specifications","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OmicsDI%2Fspecifications","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OmicsDI%2Fspecifications/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OmicsDI%2Fspecifications/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OmicsDI%2Fspecifications/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OmicsDI","download_url":"https://codeload.github.com/OmicsDI/specifications/tar.gz/refs/heads/master","h
ost":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246323314,"owners_count":20758946,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["new-use-cases","omicsdi-architecture","omicsdi-xml","specifications"],"created_at":"2024-12-11T19:13:38.360Z","updated_at":"2026-01-08T02:51:37.725Z","avatar_url":"https://github.com/OmicsDI.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Omics Discovery Index\n\n[![Join the chat at https://gitter.im/PSI-PROXI/datadiscoveryindex](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/PSI-PROXI/datadiscoveryindex?utm_source=badge\u0026utm_medium=badge\u0026utm_campaign=pr-badge\u0026utm_content=badge)\n\nOmicsDI\n=========\n\nContents\n----------\n\n1. [Essentials](#1-essentials)  \n 1.1. [Objectives](#11-objectives)    \n 1.2. [What is OmicsDI](#12-what-is-omicsdi)  \n* [Databases](#2-databases)  \n  2.1. [Major Partners?](#21-major-partners)  \n  2.2. [Which databases are part of OmicsDI](#22-which-databases-are-part-of-omicsdi)  \n  2.3. [OmicsDI Architecture?](#23-Omicsdi-architecture)  \n  2.4. [OmicsDI XML](#24-omicsDI-xml)\n* [Web services and Web Application](#3-web-services-and-web-application) \n  3.1. [Access to the web-services](#31-access-to-the-web-services)  \n  3.2. [Web Application](#32-web-application)\n* [Support](#4-support)  \n  4.1. [Get involved](#41-get-involved)  \n  4.2. [Contact](#42-contact)  \n* [License](#5-license)  \n\n1. Essentials\n--------------\n\n### 1.1. 
Objectives\n\n* Provide a data discovery index for ‘omics’ datasets (genomics, proteomics and metabolomics).\n* Integrate databases and repositories from the US and Europe.\n* Provide the infrastructure to add new databases/repositories and to search, navigate, and find “high-quality”, relevant datasets.\n\n\n### 1.2. What is OmicsDI?\n\nIn the age of systems biology and data integration, omics data (proteomics/genomics/metabolomics) is an essential \ncomponent for understanding the “whole picture” of life. Access to this data is now a crucial component in biomedical research and\nrequires substantial investment and infrastructure. Several public repositories have been developed, each with a different\npurpose in mind, such as [PRIDE](http://www.ebi.ac.uk/pride/archive/), [PeptideAtlas](http://www.peptideatlas.org/),\n[MassIVE](http://massive.ucsd.edu/ProteoSAFe/), [MetaboLights](http://www.ebi.ac.uk/metabolights/), [Metabolomics Workbench](http://www.metabolomicsworkbench.org/), and\n[EGA](https://www.ebi.ac.uk/ega/). While there is often integration between repositories for the same data type, discovery\nof datasets across multiple omics data types remains a tedious task requiring different query strategies across multiple\nresources. The aim of OmicsDI is to address these issues by providing a central infrastructure for searching, storing\nand visualizing omics datasets.\n\n\n2. Databases\n-------------\n\n### 2.1. Major Partners\n\n\n- **ProteomeXchange**: The ProteomeXchange Consortium is a collaboration of currently three major mass spectrometry proteomics\ndata repositories, PRIDE at EMBL-EBI in Cambridge (UK), PeptideAtlas at ISB in Seattle (US), and MassIVE at UCSD (US), \noffering a unified data deposition and discovery strategy across all three repositories. 
ProteomeXchange is a distributed\ndatabase infrastructure; the potentially very large raw data component is held only at the original submission \ndatabase, while the searchable metadata is centrally collected and indexed. All ProteomeXchange data is fully open after \nrelease of the associated publication.\n\n- **MetabolomeXchange**: MetabolomeXchange is a collaboration of 4 major metabolomics repositories, with a total of 10 partners\ncontributing. MetabolomeXchange was inspired by ProteomeXchange and implements similar coordination strategies. \nThe founding partners are MetaboLights at EMBL-EBI (UK), Metabolomics Repository Bordeaux (FR), Golm Metabolome Database, and \nthe Metabolomics Workbench (US). The Metabolomics Workbench is an NIH-funded collaboration of 6 Regional Comprehensive \nMetabolomics Resource Cores. MetabolomeXchange started accepting metadata submissions in summer 2014 and reached 200 public\ndatasets in March 2015.\n\n- **The European Genome-Phenome Archive**: The European Genome-Phenome Archive (EGA) provides a service for the permanent \narchiving and distribution of personally identifiable genetic and phenotypic data resulting from biomedical research projects. \nData at EGA was collected from individuals whose consent agreements authorise data release only for specific research use by \nresearchers. Strict protocols govern how information is managed, stored and distributed by the EGA project.\nThe EGA comprises a public metadata section, which allows relevant studies to be searched and identified, and a controlled-access\ndata section. Access to the data section for a particular study is only granted after validation of a research proposal through\nthe relevant ethics approval.\n\n### 2.2. Which databases are part of OmicsDI?\n\nCurrently, six databases on two continents are part of the OmicsDI consortium: \n \n  - [PRIDE](http://www.ebi.ac.uk/pride/archive/): The **PR**oteomics **IDE**ntification database. 
(EMBL-EBI, UK)(Proteomics)\n  - [PeptideAtlas](http://www.peptideatlas.org/): A multi-organism, publicly accessible compendium of peptides identified in a large set of tandem mass spectrometry proteomics experiments. (Institute for Systems Biology, Seattle, US)(Proteomics)\n  - [MassIVE](http://massive.ucsd.edu/ProteoSAFe/): The **Mass** spectrometry **I**nteractive **V**irtual **E**nvironment is a community resource to promote the global, free exchange of mass spectrometry data. (UCSD, US)(Proteomics)\n  - [MetaboLights](http://www.ebi.ac.uk/metabolights/): The Metabolomics Database. (EMBL-EBI, UK)(Metabolomics)\n  - [Metabolomics Workbench](http://www.metabolomicsworkbench.org/): UCSD Metabolomics Workbench, a resource sponsored by the Common Fund of the National Institutes of Health. (UCSD, US) (Metabolomics)\n  - [EGA](https://www.ebi.ac.uk/ega/): The European Genome-phenome Archive. (EMBL-EBI, UK)(Genomics)\n  \n\n### 2.3. OmicsDI Architecture?\n\n![Omics Discovery Index Architecture](https://raw.githubusercontent.com/BD2K-DDI/specifications/master/docs/architecture/architecture.png)\n\nThe architecture of the Omics Discovery Index starts with XML files ([see here](#24-omicsDI-xml)) that contain \nthe information for each dataset in a database. Each file is retrieved from the providers every night, and if a new dataset has been \nadded, the system adds it to the index. \n\nEach file is indexed using the EBI Search Indexing System and the final information is exposed using web-services. The EBI Search System also \ncontains the indexes of other major databases such as UniProt, Ensembl, and PubMed, allowing the user to cross-link the biological information with \nthose resources ([see here](#24-omicsDI-xml)). \n\n### 2.4. 
OmicsDI XML?\n\nThe [OmicsDI XML](https://github.com/BD2K-DDI/specifications/blob/master/docs/schema/README.md) is used to export every database (including all the datasets) to a common structure [Full Description](https://github.com/BD2K-DDI/specifications/blob/master/docs/schema/README.md).\nThe XML structure (in short) is the following:\n\n```xml\n\u003cdatabase\u003e\n    \u003cname\u003eDatabase Name\u003c/name\u003e\n    \u003cdescription\u003eDatabase Description\u003c/description\u003e\n    \u003crelease\u003eNumber of the release\u003c/release\u003e\n    \u003crelease_date\u003eRelease Date\u003c/release_date\u003e\n    \u003centry_count\u003eNumber of entries\u003c/entry_count\u003e\n    \u003centries\u003e      \n        \u003centry id=\"Dataset_ID_1\"\u003e\n            \u003cname\u003eName of the Dataset\u003c/name\u003e\n            \u003cdescription\u003eDescription of the dataset\u003c/description\u003e\n            \u003ccross_references\u003e\n                \u003cref dbkey=\"CHEBI:16551\" dbname=\"ChEBI\"/\u003e\n                \u003cref dbkey=\"MTBLC16551\" dbname=\"MetaboLights\"/\u003e\n                \u003cref dbkey=\"CHEBI:16810\" dbname=\"ChEBI\"/\u003e\n                \u003cref dbkey=\"MTBLC16810\" dbname=\"MetaboLights\"/\u003e\n                \u003cref dbkey=\"CHEBI:30031\" dbname=\"ChEBI\"/\u003e\n            \u003c/cross_references\u003e\n            \u003cdates\u003e\n                \u003cdate type=\"submission\" value=\"2013-11-19\"/\u003e\n                \u003cdate type=\"publication\" value=\"2013-11-26\"/\u003e\n            \u003c/dates\u003e\n            \u003cadditional_fields\u003e\n                \u003cfield name=\"repository\"\u003eRepository\u003c/field\u003e\n                \u003cfield name=\"omics_type\"\u003eOmics Type\u003c/field\u003e\n            \u003c/additional_fields\u003e    \n        \u003c/entry\u003e\n    \u003c/entries\u003e\n\u003c/database\u003e         \n```\n\n- Cross references to other database 
entities are used to link the dataset to external sources, for example. The **dbkey** corresponds\n to the entity in the database and the **dbname** corresponds to the database id. Some examples:\n   \n   \n  - If the dataset is from a Human sample, the cross reference should be: `\u003cref dbkey=\"9606\" dbname=\"TAXONOMY\"/\u003e`\n   \n  - If the proteomics experiment identified/quantified the UniProt id P31946, the cross reference should be: `\u003cref dbkey=\"P31946\" dbname=\"Uniprot\"/\u003e`\n\n  - If the dataset was published in a scientific journal and is indexed in PubMed, the cross reference should be: `\u003cref dbkey=\"26013411\" dbname=\"pubmed\"/\u003e`\n\nA full description of the XML Schema version 1.0 can be found [here](https://github.com/BD2K-DDI/specifications/blob/master/docs/schema/README.md). A complete list of all databases for reference can be found on [this site](http://www.ebi.ac.uk/ebisearch/). Some files from Proteomics/Metabolomics Data can be found [here]().  \n\n3. Web services and Web Application\n-----------------------\n\n### 3.1. Access to the web-services\n\nMost data in the Omics Discovery Index can be accessed programmatically using a [RESTful API](http://wwwdev.ebi.ac.uk/Tools/ddi/ws/), allowing for integration with other resources. \nThe API implementation is based on the Spring Rest Framework.\n\n**Web browsable API**\n\n  - The query results returned by the API are available in JSON format. 
This ensures that they can be viewed by humans and\n    accessed programmatically by computers.\n\n  - The main RESTful API page provides a simple web-based user interface, which allows developers to familiarise\n    themselves with the API and get a better sense of the OmicsDI data before writing a single line of code.\n\n  - Many resources are hyperlinked so that it's possible to navigate the API in the browser.\n\nAs a result, developers can familiarise themselves with the API and get a better sense of the [OmicsDI](http://wwwdev.ebi.ac.uk/Tools/ddi/ws/) data.\n\n### 3.2. Web Application\n\nThe main goal of the OmicsDI project is to provide a way to search for interesting datasets across omics repositories. \nThe main web application and web service ([see here](http://wwwdev.ebi.ac.uk/Tools/ddi/)) allow the user to search and navigate through the OmicsDI datasets.\nThe OmicsDI web application offers two main ways of navigating the data: (i) using the home page navigation blocks or (ii) using the search box.\n\n\n4. Support\n----------\n\n### 4.1. Get involved\n\nIf you are interested in the project and want to add your resource to the Index, please open an issue [here](https://github.com/BD2K-DDI/specifications/issues/).\n\n### 4.2. Contact\n\nVisit the [OmicsDI](http://wwwdev.ebi.ac.uk/Tools/ddi/) website.\n\nVisit the [GitHub](https://github.com/BD2K-DDI/) page for the source code and the libraries.\n\nVisit the [Specification](https://github.com/BD2K-DDI/specifications) page to make comments and requests.\n\n\n5. License\n----------\n\n[Apache 2](http://www.apache.org/licenses/LICENSE-2.0)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fomicsdi%2Fspecifications","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fomicsdi%2Fspecifications","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fomicsdi%2Fspecifications/lists"}