Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

awesome-opendata-software

Awesome list of the software tools related to opendata: data catalogs, ingestion tools, data prep tools and so on
https://github.com/commondataio/awesome-opendata-software

Last synced: 5 days ago

  • Data catalogs

    • Research data repositories

      • Omega-PSIR - research information management system from Poland, used by Polish universities
      • Weko - WEKO3 is repository software based on Invenio 3.
      • Djehuty - The 4TU.ResearchData repository system
      • ERDDAP - ERDDAP is a data server that gives you a simple, consistent way to download subsets of gridded and tabular scientific datasets in common file formats and make graphs and maps.
      • Galaxy - open source bioinformatics research management platform
      • InvenioRDM - The turn-key research data management repository
      • IPT - GBIF Integrated Publishing Toolkit (IPT). Data catalog software integrated into the GBIF ecosystem.
      • LibreCat - A publication management system. Used to create research data repositories too.
      • NYU Data catalog - The NYU Data Catalog facilitates researchers’ access to large datasets available either publicly or through institutional or individual licensing. It also includes descriptions of internally-generated research datasets from NYU researchers.
      • Esploro - research outputs management system from Ex Libris Group
      • DataCat - DataLad Catalog is a free and open source command line tool, with a Python API, that assists with the automatic generation of user-friendly, browser-based data catalogs from structured metadata.
      • Elsevier Digital Commons - Elsevier product to manage research output, similar to Elsevier Pure but less complicated.
      • MyTardis - Research data management for instrument data.
      • MyCoRe - MyCoRe (portmanteau of My Content Repository) is an open source repository software framework for building disciplinary or institutional repositories, digital archives, digital libraries, and scientific journals.
      • THREDDS Data Server - The THREDDS Data Server (TDS) is a web server that provides metadata and data access for scientific datasets, using OPeNDAP, OGC WMS and WCS, HTTP, and other remote data access protocols.
      • Vufind - VuFind® is a discovery system designed and developed for libraries by libraries. It is also flexible enough to build search interfaces for all kinds of content beyond the library environment.
      • DataOne Hosted Repo - online catalog and SaaS hosted repositories
      • Elsevier Pure - Pure is a Research Information Management System (RIMS) or Current Research Information System (CRIS).
      • Converis - research data management product by Clarivate
      • Worktribe - Worktribe is a cloud-based platform for research management.
    • Statistics and indicators databases

      • OpenSDG - Open SDG. An open source, free-to-reuse platform for managing and publishing data and statistics related to the UN Sustainable Development Goals (SDGs).
      • PxWeb - PxWeb is used for publishing statistics from a database on the web. Since 1 January 2016 it has been free of charge for government agencies and municipalities, international NSIs, and international statistical organisations.
      • .Stat Suite - The .Stat Suite is a standard-based, componentised, open source platform for the efficient production and dissemination of high-quality statistical data. The product is based on the General Statistical Business Process Model (GSBPM) and the Statistical Data and Metadata eXchange (SDMX) standards.
    • Geodata catalogs

      • LizMap - open source web map application from 3liz
      • ncWMS - ncWMS is a Web Map Service for displaying environmental data.
      • NextGIS Web - Web GIS framework by NextGIS
      • Esri Geoportal server - Geoportal Server is a standards-based, open source product that enables discovery and use of geospatial resources including data and services. **Not updated anymore**
      • Geoportal.rlp - A complete SDI-Suite for the management of OWS (WMS / WFS, CSW), metadata (iso19139), users, organizations, and licences.
      • Geoblacklight - A multi-institutional open-source collaboration building a better way to find and share geospatial data
      • Open Geoportal - The Open Geoportal (OGP) is a collaboratively developed, open source, federated web application to rapidly discover, preview, and retrieve geospatial data from multiple organizations.
      • OpenDataCube - The Open Data Cube (ODC) is an Open Source Geospatial Data Management and Analysis Software project that helps you harness the power of Satellite data.
      • Oskari - open source geoportal software from Finland's cadastre agency, incubating in OSGeo
      • Stac-server - A Node-based STAC API, AWS Serverless, OpenSearch
      • ArcGIS Server - ArcGIS Server is the server software component in ArcGIS Enterprise that makes your geographic information available to other users in your organization, and optionally to any Internet user.
      • ERDAS Apollo - Enables enterprise data management, discovery and delivery for geospatial data
      • Koordinates - Koordinates is a geospatial data management platform inspired by cracking GIS data out of vendor silos.
      • MetaGIS - commercial GIS server/portal from Sweden, popular there
      • OrbisMAP - Russian geoportal product
    • Open data portals

      • EntryScape catalog - DCAT AP compliant data catalog
      • Aleph - Aleph is a tool for indexing large amounts of both documents (PDF, Word, HTML) and structured (CSV, XLS, SQL) data for easy browsing and search.
      • DKAN - DKAN is a community-driven, free and open source open data platform that gives organizations and individuals ultimate freedom to publish and consume structured information.
      • Magda - A federated, open-source data catalog for all your big data and small data
      • uData - Customizable and skinnable social platform dedicated to (open)data by Etalab
      • Socrata - SaaS data platform popular in the US and Canada. Socrata was acquired by Tyler Technologies in 2018 and is now the Data and Insights division of Tyler.
      • Tablion Data Portal - commercial data portal software from Aristotle Metadata, Australia
      • TriplyDb - TriplyDB integrates your organization's data assets into a standards-compliant knowledge graph.
    • Metadata catalogs

      • Fusion Metadata Registry - open source metadata catalog used by European Union authorities and some countries' statistical agencies. Source code is available on request.
    • Microdata catalogs

      • NADA Data Catalog - An open-source software designed for researchers to browse, search, compare, apply for access and download research data.
      • Obiba Mica - Mica is a powerful software application used to create data web portals for large-scale epidemiological studies or multiple-study consortia. Mica2 is the successor of Mica.
      • Colectica - Colectica is the fastest way to design, document, and publish your statistical data and survey research using open data standards.
  • Standards

    • Common data standards

      • Apache Parquet - Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. It provides efficient data compression and encoding schemes with enhanced performance to handle complex data in bulk. Parquet is available in multiple languages including Java, C++, and Python. It is still uncommon for open data portals but common for public ML data catalogs.
      • Arrow Columnar Format - The Arrow columnar format includes a language-agnostic in-memory data structure specification, metadata serialization, and a protocol for serialization and generic data transport.
      • NETCDF - NetCDF (network Common Data Form) is a set of interfaces for array-oriented data access and a freely distributed collection of data access libraries for C, Fortran, C++, Java, and other languages. The netCDF libraries support a machine-independent format for representing scientific data. Common for scientific data.
      • XLS - The Microsoft Excel Binary File format, with the .xls extension and referred to as XLS or MS-XLS, was the default format used for spreadsheets in Excel through Microsoft Office 2003. It is not an open data format, since it is proprietary, but it is _de facto_ very common.
      • CDF - CDF is a conceptual data abstraction for storing, manipulating, and accessing multidimensional data sets. The basic component of CDF is a software programming interface that is a device-independent view of the CDF data model. Common for scientific data.
      • CSV - Common Format and MIME Type for Comma-Separated Values (CSV) Files
      • JSON - JSON (JavaScript Object Notation) is a lightweight data-interchange format.
      • XLSX - The Office Open XML-based spreadsheet format using .xlsx as a file extension has been the default format produced for new documents by versions of Microsoft Excel since Excel 2007. It is not an open data format, since it is proprietary, but it is _de facto_ very common.
      • XML - Extensible Markup Language (XML) is a simple, very flexible text format derived from SGML (ISO 8879). Originally designed to meet the challenges of large-scale electronic publishing, XML is also playing an increasingly important role in the exchange of a wide variety of data on the Web and elsewhere.
      • RDF - The Resource Description Framework (RDF) is a general framework for representing interconnected data on the web. RDF statements are used for describing and exchanging metadata, which enables standardized exchange of data based on relationships.
    • Data containers

      • DataCrate - DataCrate is based on the BagIt packaging spec, with additional human- and machine-readable metadata in JSON-LD.
      • BagIt - BagIt is a set of hierarchical file layout conventions designed to support storage and transfer of arbitrary digital content. A "bag" consists of a directory containing the payload files and other accompanying metadata files known as "tag" files.
      • COMBINE - The “COmputational Modeling in BIology NEtwork” (COMBINE) is an initiative to coordinate the development of the various community standards and formats for computational models.
      • BioCompute Objects - BCOs are represented in JSON (JavaScript Object Notation) formatted text, adhering to JSON schema draft-07. The JSON format was chosen because it is both human and machine readable/writable. For a detailed description of JSON see www.json.org.
      • Frictionless standards - A Data Package is a simple container format used to describe and package a collection of data (a dataset).
      • ReproZIP - ReproZip can automatically pack your research along with all necessary data files, libraries, environment variables and options into a self-contained bundle.
      • RO-CRATE - RO-Crate is a community effort to establish a lightweight approach to packaging research data with their metadata. It is based on schema.org annotations in JSON-LD, and aims to make best-practice in formal metadata description accessible and practical for use in a wider variety of situations, from an individual researcher working with a folder of data, to large data-intensive computational research environments.
    • Statistics specifications

      • DDI - The Data Documentation Initiative (DDI) is an international standard for describing the data produced by surveys and other observational methods in the social, behavioral, economic, and health sciences.
      • SDMX - A global initiative to improve Statistical Data and Metadata eXchange
    • Spatial data standards

      • CSW - Catalogue services support the ability to publish and search collections of descriptive information (metadata) for data, services, and related information objects. Open Geospatial Consortium standard.
      • ESRI Rest API - ArcGIS REST APIs used to query data from ArcGIS Enterprise products
      • FITS - FITS is a file format designed to store, transmit, and manipulate scientific images and associated data.
      • GeoPackage - Specifications in the family of GeoPackage formats (see GeoPackage_family) specify GeoPackages for exchange and GeoPackage SQLite Extensions that permit direct use, without intermediate format translations, of vector geospatial features and/or tile matrix sets of earth images and raster maps at various scales.
      • GeoTIFF - This OGC Standard defines the Geographic Tagged Image File Format (GeoTIFF) by specifying requirements and encoding rules for using the Tagged Image File Format (TIFF) for the exchange of georeferenced or geocoded imagery. Open Geospatial Consortium standard.
      • GML - OpenGIS® Geography Markup Language Encoding Standard. The Geography Markup Language (GML) is an XML grammar for expressing geographical features.
      • KML - KML is an XML language focused on geographic visualization, including annotation of maps and images.
      • OGC API - Records - OGC API - Records is a multi-part draft specification that offers the capability to create, modify, and query metadata on the Web.
      • ShapeFile - A shapefile is an Esri vector data storage format for storing the location, shape, and attributes of geographic features. It is stored as a set of related files and contains one feature class.
      • TMS - The OGC Tile Matrix Set standard defines the rules and requirements for a tile matrix set as a way to index space based on a set of regular grids defining a domain (tile matrix) for a limited list of scales in a Coordinate Reference System (CRS) as defined in [OGC 08-015r2] Abstract Specification Topic 2: Spatial Referencing by Coordinates.
      • WCS - A Web Coverage Service (WCS) offers multi-dimensional coverage data for access over the Internet. WCS Core specifies a core set of requirements that a WCS implementation must fulfill. Open Geospatial Consortium standard.
      • WFS - The Web Feature Service (WFS) represents a change in the way geographic information is created, modified and exchanged on the Internet. Rather than sharing geographic information at the file level using File Transfer Protocol (FTP), for example, the WFS offers direct fine-grained access to geographic information at the feature and feature property level. Open Geospatial Consortium standard.
      • WMS - The OpenGIS Web Map Service Interface Standard (WMS) provides a simple HTTP interface for requesting geo-registered map images from one or more distributed geospatial databases. Open Geospatial Consortium standard.
      • WMTS - OpenGIS Web Map Tile Service Implementation Standard
      • WPS - The OpenGIS® Web Processing Service (WPS) Interface Standard provides rules for standardizing inputs and outputs (requests and responses) for geospatial processing services, such as polygon overlay.
    • Metadata standards

      • Executable Research Compendium - An Executable Research Compendium (ERC) is a packaging convention for computational research.
      • Google Search. Dataset (Dataset, DataCatalog, DataDownload) structured data - Google Search documentation on implementing Schema.org Dataset markup
      • Asset Description Metadata Schema, ADMS - for those who manage metadata for a European public administration or service and want to explore, (re-)use or share semantic assets (metadata or reference data)
      • CKAN API - de facto metadata standard for most open data portals
      • CSVW - CSV on the Web (CSVW) is a standard for adding metadata that describes the contents and structure of comma-separated values (CSV) data files
      • DataCite Metadata Schema - The DataCite Metadata Schema is a list of core metadata properties chosen for an accurate and consistent identification of a resource for citation and retrieval purposes, along with recommended use instructions.
      • Dataset Publishing Language - Google metadata standard to prepare datasets for the Google Public Data Explorer.
      • DC Packaging Specification - provides protocols for packages to capture not only primary data, but also associated metadata and relationships to other objects (papers, projects, people, etc.) no matter where they are located.
      • DCAT - DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web. This document defines the schema and provides examples for its use.
      • DCAT-US - DCAT-US Schema v1.1 (Project Open Data Metadata Schema). The metadata schema specified in this memorandum is based on DCAT, a hierarchical vocabulary specific to datasets.
      • DCAT-AP IT - This is version 2.0 of the ontology of the Italian application profile for the metadata that describe catalogues and data of Italian Public Administrations (DCAT-AP_IT).
      • DCAT-AP.de - DCAT-AP.de is the common German metadata model for the exchange of open administrative data. On this platform you will find the current version of the specification documents, sample files and DCAT-AP.de's own vocabularies.
      • Dublin Core - the most common standard for describing digital objects
      • EU Vocabularies - European Reference data catalogue
      • ISO 19115:2003 - ISO 19115:2003 defines the schema required for describing geographic information and services. It provides information about the identification, the extent, the quality, the spatial and temporal schema, spatial reference, and distribution of digital geographic data.
      • DCAT-AP 1.1 - The DCAT Application profile for data portals in Europe (DCAT-AP) is a specification based on W3C's Data Catalogue vocabulary (DCAT) for describing public sector datasets in Europe. Version 1.1
      • DCAT-AP 2.1.1 - The DCAT Application Profile for data portals in Europe (DCAT-AP) is a specification based on W3C's Data Catalogue vocabulary (DCAT) for describing public sector datasets in Europe. Version 2.1.1
      • PEP, Portable Encapsulated Projects - PEP, or Portable Encapsulated Projects, is a community effort to make sample metadata reusable. PEPs decouple metadata from analysis.
      • Schema.org Dataset - A body of structured information describing some topic(s) of interest.
      • Metatab and Metapack - Metatab stores metadata in a spreadsheet, alongside data, ensuring that the metadata is easy to create, easy to read, and cannot be separated from the data. Metapack builds data packages with Metatab metadata.
    • Specific data standards

      • Fiscal data package - Fiscal Data Package is a lightweight and user-oriented format for publishing and consuming fiscal data. Fiscal data packages are made of simple and universal components. They can be produced from ordinary spreadsheet software and used in any environment.
      • GTFS (General Transit Feed Specification) - defines a common format for public transportation schedules and associated geographic information. GTFS "feeds" let public transit agencies publish their transit data and developers write applications that consume that data in an interoperable way.
      • IATI Standard - The IATI Standard is a set of rules and guidance on how to publish useful development and humanitarian data.
      • Open Contracting Data Standard (OCDS) - The Open Contracting Data Standard (OCDS) enables disclosure of data and documents at all stages of the contracting process by defining a common data model.
    • Additional data standards resources

  • Tools

    • Data refining

      • OpenRefine - OpenRefine is a free, open source power tool for working with messy data and improving it
    • Data packaging

      • bdbag - The bdbag utilities are a collection of software programs for working with BagIt packages that conform to the BDBag and BagIt/RO profiles.
      • datalad - DataLad makes data management and data distribution more accessible. To do that, it stands on the shoulders of Git and Git-annex to deliver a decentralized system for data exchange.
    • Quality management

    • Data publishing

      • Datasette - An open source multi-tool for exploring and publishing data
    • Statistics tools

      • RSDMX - Tools for reading SDMX data and metadata in R