https://github.com/awesomedata/awesome-public-datasets

A topic-centric list of HQ open datasets.

Awesome Public Datasets
=======================

.. image:: https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg
   :alt: Awesome
   :target: https://github.com/sindresorhus/awesome

This is a list of high-quality `topic-centric public data sources `_.
They are collected and tidied from blogs, answers, and user responses.
Most of the data sets listed below are free; however, some are not.
This project was incubated at `OMNILab `_, Shanghai Jiao Tong University during Xiaming Chen's Ph.D. studies.
OMNILab is now part of the `BaiYuLan Open AI community `_.
Other amazingly awesome lists can be found in `sindresorhus's awesome `_ list.

**NOTICE**: This repo is automatically generated by `apd-core `_.
Please **DO NOT** modify this file directly. We have provided a new way to `contribute to
this repo `_.
`Join `_
the `Slack community `_ for instant updates on HQ data.

.. |OK_ICON| image:: https://raw.githubusercontent.com/awesomedata/apd-core/master/deploy/ok-24.png
.. |FIXME_ICON| image:: https://raw.githubusercontent.com/awesomedata/apd-core/master/deploy/fixme-24.png

* |OK_ICON| I am well.
* |FIXME_ICON| Please fix me.
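
The two status markers above make it easy to tally entries mechanically. The following is a minimal sketch, assuming the README text follows the layout used in this file: category headings underlined with dashes, and entries that start with ``* |OK_ICON|`` or ``* |FIXME_ICON|``. The names ``count_status_by_category`` and ``SAMPLE`` are hypothetical and are not part of apd-core.

```python
import re
from collections import defaultdict

# Matches a list entry such as: * |OK_ICON| `Some dataset `_ [`Meta `_]
# The legend lines ("I am well." / "Please fix me.") have no backtick
# after the icon, so they are not counted.
ENTRY_RE = re.compile(r"^\*\s+\|(OK|FIXME)_ICON\|\s+`")

# Hypothetical sample in the same layout as this README.
SAMPLE = """\
Agriculture
-----------

* |OK_ICON| `Lemons quality control dataset `_ [`Meta `_]

* |FIXME_ICON| `U.S. Department of Agriculture's Nutrient Database `_ [`Meta `_]

Chemistry
---------

* |OK_ICON| `Ionic Liquids Database - ILThermo `_ [`Meta `_]
"""


def count_status_by_category(text):
    """Return {category: {"OK": n, "FIXME": m}} for the given README text."""
    counts = defaultdict(lambda: {"OK": 0, "FIXME": 0})
    lines = text.splitlines()
    category = None
    for i, line in enumerate(lines):
        following = lines[i + 1].strip() if i + 1 < len(lines) else ""
        title = line.strip()
        # A category heading is a non-empty line underlined with dashes
        # that are at least as long as the heading text.
        if title and following and set(following) == {"-"} and len(following) >= len(title):
            category = title
            continue
        match = ENTRY_RE.match(line)
        if match and category is not None:
            counts[category][match.group(1)] += 1
    return {name: dict(value) for name, value in counts.items()}


if __name__ == "__main__":
    for name, tally in count_status_by_category(SAMPLE).items():
        print(name, tally)
```

Entries that appear before the first dash-underlined heading are ignored, so the document title (underlined with ``=``) and the legend do not affect the counts.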

.. contents:: **Table of Contents**


Agriculture
-----------

* |OK_ICON| `The global dataset of historical yields for major crops 1981–2016 - The Global Dataset of [...] `_ [`Meta `_]

* |OK_ICON| `Hyperspectral benchmark dataset on soil moisture - This dataset was measured in a five-day [...] `_ [`Meta `_]

* |OK_ICON| `Lemons quality control dataset - Lemon dataset has been prepared to investigate the [...] `_ [`Meta `_]

* |OK_ICON| `Optimized Soil Adjusted Vegetation Index - The IDB is a tool for working with remote sensing [...] `_ [`Meta `_]

* |FIXME_ICON| `U.S. Department of Agriculture's Nutrient Database `_ [`Meta `_]

* |OK_ICON| `U.S. Department of Agriculture's PLANTS Database - The Complete PLANTS Checklist is nearly 7 [...] `_ [`Meta `_]

Architecture
------------

* |OK_ICON| `Swiss Apartment Models - This dataset contains detailed data on 42,207 apartments (242,257 [...] `_ [`Meta `_]

Biology
-------

* |OK_ICON| `1000 Genomes - The 1000 Genomes Project ran between 2008 and 2015, creating the largest [...] `_ [`Meta `_]

* |OK_ICON| `ANHIR - Automatic Non-rigid Histological Image Registration (ANHIR) consists of 2D [...] `_ [`Meta `_]

* |OK_ICON| `American Gut (Microbiome Project) - The American Gut project is the largest crowdsourced [...] `_ [`Meta `_]

* |OK_ICON| `BCNB - There are WSIs of 1058 patients, part of tumor regions are annotated in WSIs. Except [...] `_ [`Meta `_]

* |OK_ICON| `Broad Bioimage Benchmark Collection (BBBC) - The Broad Bioimage Benchmark Collection (BBBC) [...] `_ [`Meta `_]

* |OK_ICON| `Broad Cancer Cell Line Encyclopedia (CCLE) `_ [`Meta `_]

* |OK_ICON| `CIMA - CIMA dataset includes images of 2D histological microscopy tissue slices. `_ [`Meta `_]

* |OK_ICON| `Cell Image Library - This library is a public and easily accessible resource database of [...] `_ [`Meta `_]

* |FIXME_ICON| `Complete Genomics Public Data - A diverse data set of whole human genomes are freely [...] `_ [`Meta `_]

* |OK_ICON| `CytoImageNet - A large-scale dataset of microscopy images. Contains 890,737 total grayscale [...] `_ [`Meta `_]

* |OK_ICON| `EBI ArrayExpress - ArrayExpress Archive of Functional Genomics Data stores data from high- [...] `_ [`Meta `_]

* |OK_ICON| `EBI Protein Data Bank in Europe - The Electron Microscopy Data Bank (EMDB) is a public [...] `_ [`Meta `_]

* |OK_ICON| `ENCODE project - The Encyclopedia of DNA Elements (ENCODE) Consortium is an ongoing [...] `_ [`Meta `_]

* |OK_ICON| `Electron Microscopy Pilot Image Archive (EMPIAR) - EMPIAR, the Electron Microscopy Public [...] `_ [`Meta `_]

* |OK_ICON| `Ensembl Genomes `_ [`Meta `_]

* |OK_ICON| `Gene Expression Omnibus (GEO) - GEO is a public functional genomics data repository [...] `_ [`Meta `_]

* |OK_ICON| `Gene Ontology (GO) - GO annotation files `_ [`Meta `_]

* |OK_ICON| `Global Biotic Interactions (GloBI) `_ [`Meta `_]

* |OK_ICON| `Harvard Medical School (HMS) LINCS Project - The Harvard Medical School (HMS) LINCS Center is [...] `_ [`Meta `_]

* |FIXME_ICON| `Human Genome Diversity Project - A group of scientists at Stanford University have [...] `_ [`Meta `_]

* |OK_ICON| `Human Microbiome Project (HMP) - The HMP sequenced over 2000 reference genomes isolated from [...] `_ [`Meta `_]

* |OK_ICON| `ICOS PSP Benchmark - The ICOS PSP benchmarks repository contains an adjustable real-world [...] `_ [`Meta `_]

* |OK_ICON| `International HapMap Project `_ [`Meta `_]

* |FIXME_ICON| `Journal of Cell Biology DataViewer `_ [`Meta `_]

* |OK_ICON| `KEGG - KEGG is a database resource for understanding high-level functions and utilities of [...] `_ [`Meta `_]

* |OK_ICON| `NCBI Proteins `_ [`Meta `_]

* |OK_ICON| `NCBI Taxonomy - The NCBI Taxonomy database is a curated set of names and classifications for [...] `_ [`Meta `_]

* |OK_ICON| `NCI Genomic Data Commons - The GDC Data Portal is a robust data-driven platform that allows [...] `_ [`Meta `_]

* |OK_ICON| `NIH Microarray data `_ [`Meta `_]

* |OK_ICON| `OpenSNP genotypes data - openSNP allows customers of direct-to-customer genetic tests to [...] `_ [`Meta `_]

* |OK_ICON| `Palmer Penguins - The goal of palmerpenguins is to provide a great dataset for data [...] `_ [`Meta `_]

* |OK_ICON| `Pathguide - Protein-Protein Interactions Catalog `_ [`Meta `_]

* |OK_ICON| `Protein Data Bank - This resource is powered by the Protein Data Bank archive-information [...] `_ [`Meta `_]

* |OK_ICON| `Psychiatric Genomics Consortium - The purpose of the Psychiatric Genomics Consortium (PGC) is [...] `_ [`Meta `_]

* |OK_ICON| `PubChem Project - PubChem is the world's largest collection of freely accessible chemical [...] `_ [`Meta `_]

* |OK_ICON| `PubGene (now Coremine Medical) - COREMINE™ is a family of tools developed by the Norwegian [...] `_ [`Meta `_]

* |OK_ICON| `Sanger Catalogue of Somatic Mutations in Cancer (COSMIC) - COSMIC, the Catalogue Of Somatic [...] `_ [`Meta `_]

* |OK_ICON| `Sanger Genomics of Drug Sensitivity in Cancer Project (GDSC) `_ [`Meta `_]

* |OK_ICON| `Sequence Read Archive (SRA) - The Sequence Read Archive (SRA) stores raw sequence data from [...] `_ [`Meta `_]

* |OK_ICON| `Serratus - Analysis of 7.1 million RNA/DNA sequencing datasets to discover the total [...] `_ [`Meta `_]

* |OK_ICON| `Stanford Microarray Data (now retired) `_ [`Meta `_]

* |OK_ICON| `Stowers Institute Original Data Repository `_ [`Meta `_]

* |OK_ICON| `Systems Science of Biological Dynamics (SSBD) Database - Systems Science of Biological [...] `_ [`Meta `_]

* |OK_ICON| `The Cancer Genome Atlas (TCGA), available via Broad GDAC `_ [`Meta `_]

* |OK_ICON| `The Catalogue of Life - The Catalogue of Life is a quality-assured checklist of more than 1.8 [...] `_ [`Meta `_]

* |OK_ICON| `The Personal Genome Project - The Personal Genome Project, initiated in 2005, is a vision and [...] `_ [`Meta `_]

* |OK_ICON| `UCSC Public Data `_ [`Meta `_]

* |OK_ICON| `UniGene `_ [`Meta `_]

* |OK_ICON| `Universal Protein Resource (UniProt) - The Universal Protein Resource (UniProt) is a [...] `_ [`Meta `_]

* |OK_ICON| `Rfam - The Rfam database is a collection of RNA families, each represented by multiple [...] `_ [`Meta `_]

Chemistry
---------

* |OK_ICON| `Ionic Liquids Database - ILThermo `_ [`Meta `_]

Climate+Weather
---------------

* |FIXME_ICON| `Actuaries Climate Index `_ [`Meta `_]

* |FIXME_ICON| `Australian Weather `_ [`Meta `_]

* |FIXME_ICON| `Aviation Weather Center - Consistent, timely and accurate weather information for the world [...] `_ [`Meta `_]

* |FIXME_ICON| `Brazilian Weather - Historical data (In Portuguese) - Data related to climate and weather [...] `_ [`Meta `_]

* |FIXME_ICON| `Canadian Meteorological Centre `_ [`Meta `_]

* |OK_ICON| `Caravan - a dataset for large-sample hydrology - Caravan is an open community dataset of [...] `_ [`Meta `_]

* |OK_ICON| `Climate Data from UEA (updated monthly) `_ [`Meta `_]

* |OK_ICON| `Dutch Weather - The KNMI Data Center (KDC) portal provides access to KNMI data on weather, [...] `_ [`Meta `_]

* |OK_ICON| `European Climate Assessment & Dataset `_ [`Meta `_]

* |OK_ICON| `German Climate Data Center `_ [`Meta `_]

* |OK_ICON| `Global Climate Data Since 1929 `_ [`Meta `_]

* |OK_ICON| `Charting The Global Climate Change News Narrative 2009-2020 - These four datasets represent [...] `_ [`Meta `_]

* |OK_ICON| `NASA Global Imagery Browse Services `_ [`Meta `_]

* |OK_ICON| `NOAA Bering Sea Climate `_ [`Meta `_]

* |OK_ICON| `NOAA Climate Datasets `_ [`Meta `_]

* |FIXME_ICON| `NOAA Realtime Weather Models `_ [`Meta `_]

* |FIXME_ICON| `NOAA SURFRAD Meteorology and Radiation Datasets `_ [`Meta `_]

* |OK_ICON| `Open-Meteo - Open-Source Weather API - Open-source weather API with free access for non- [...] `_ [`Meta `_]

* |OK_ICON| `The World Bank Open Data Resources for Climate Change `_ [`Meta `_]

* |OK_ICON| `UEA Climatic Research Unit `_ [`Meta `_]

* |OK_ICON| `WU Historical Weather Worldwide `_ [`Meta `_]

* |OK_ICON| `Washington Post Climate Change - To analyze warming temperatures in the United States, The [...] `_ [`Meta `_]

* |OK_ICON| `WorldClim - Global Climate Data `_ [`Meta `_]

ComplexNetworks
---------------

* |OK_ICON| `AMiner Citation Network Dataset `_ [`Meta `_]

* |OK_ICON| `CrossRef DOI URLs `_ [`Meta `_]

* |OK_ICON| `DBLP Citation dataset `_ [`Meta `_]

* |OK_ICON| `DIMACS Road Networks Collection `_ [`Meta `_]

* |OK_ICON| `NBER Patent Citations `_ [`Meta `_]

* |OK_ICON| `NIST complex networks data collection `_ [`Meta `_]

* |FIXME_ICON| `Network Repository with Interactive Exploratory Analysis Tools `_ [`Meta `_]

* |OK_ICON| `Protein-protein interaction network `_ [`Meta `_]

* |OK_ICON| `PyPI and Maven Dependency Network `_ [`Meta `_]

* |OK_ICON| `Scopus Citation Database `_ [`Meta `_]

* |OK_ICON| `Small Network Data `_ [`Meta `_]

* |OK_ICON| `Stanford GraphBase `_ [`Meta `_]

* |OK_ICON| `Stanford Large Network Dataset Collection `_ [`Meta `_]

* |FIXME_ICON| `Stanford Longitudinal Network Data Sources `_ [`Meta `_]

* |FIXME_ICON| `The Koblenz Network Collection `_ [`Meta `_]

* |OK_ICON| `The Laboratory for Web Algorithmics (UNIMI) `_ [`Meta `_]

* |OK_ICON| `UCI Network Data Repository `_ [`Meta `_]

* |OK_ICON| `UFL sparse matrix collection `_ [`Meta `_]

* |FIXME_ICON| `WSU Graph Database `_ [`Meta `_]

* |FIXME_ICON| `Community Resource for Archiving Wireless Data At Dartmouth - Contains datasets of pcap files [...] `_ [`Meta `_]

ComputerNetworks
----------------

* |OK_ICON| `3.5B Web Pages from CommonCrawl 2012 `_ [`Meta `_]

* |FIXME_ICON| `53.5B Web clicks of 100K users in Indiana Univ. `_ [`Meta `_]

* |OK_ICON| `CAIDA Internet Datasets `_ [`Meta `_]

* |FIXME_ICON| `CRAWDAD Wireless datasets from Dartmouth Univ. `_ [`Meta `_]

* |OK_ICON| `ClueWeb09 - 1B web pages `_ [`Meta `_]

* |OK_ICON| `ClueWeb12 - 733M web pages `_ [`Meta `_]

* |OK_ICON| `CommonCrawl Web Data over 7 years `_ [`Meta `_]

* |OK_ICON| `Shopper Intent Prediction from Clickstream E‑Commerce Data with Minimal Browsing Information `_ [`Meta `_]

* |OK_ICON| `Criteo click-through data `_ [`Meta `_]

* |OK_ICON| `Internet-Wide Scan Data Repository `_ [`Meta `_]

* |OK_ICON| `MIRAGE-2019 - MIRAGE-2019 is a human-generated dataset for mobile traffic analysis with [...] `_ [`Meta `_]

* |OK_ICON| `OONI: Open Observatory of Network Interference - Internet censorship data `_ [`Meta `_]

* |OK_ICON| `Open Mobile Data by MobiPerf `_ [`Meta `_]

* |OK_ICON| `The Peer-to-Peer Trace Archive - Real-world measurements play a key role in studying the [...] `_ [`Meta `_]

* |OK_ICON| `Rapid7 Sonar Internet Scans `_ [`Meta `_]

* |OK_ICON| `UCSD Network Telescope, IPv4 /8 net `_ [`Meta `_]

CyberSecurity
-------------

* |OK_ICON| `CCCS-CIC-AndMal-2020 - The dataset includes 200K benign and 200K malware samples totalling to [...] `_ [`Meta `_]

* |OK_ICON| `Traffic and Log Data Captured During a Cyber Defense Exercise - This dataset was acquired [...] `_ [`Meta `_]

DataChallenges
--------------

* |OK_ICON| `AIcrowd Competitions `_ [`Meta `_]

* |OK_ICON| `Bruteforce Database `_ [`Meta `_]

* |OK_ICON| `Challenges in Machine Learning `_ [`Meta `_]

* |FIXME_ICON| `CrowdANALYTIX dataX `_ [`Meta `_]

* |FIXME_ICON| `D4D Challenge of Orange `_ [`Meta `_]

* |OK_ICON| `DrivenData Competitions for Social Good `_ [`Meta `_]

* |OK_ICON| `ICWSM Data Challenge (since 2009) `_ [`Meta `_]

* |OK_ICON| `KDD Cup by Tencent 2012 `_ [`Meta `_]

* |OK_ICON| `Kaggle Competition Data `_ [`Meta `_]

* |OK_ICON| `Localytics Data Visualization Challenge `_ [`Meta `_]

* |OK_ICON| `Netflix Prize `_ [`Meta `_]

* |OK_ICON| `Space Apps Challenge `_ [`Meta `_]

* |FIXME_ICON| `Telecom Italia Big Data Challenge `_ [`Meta `_]

* |FIXME_ICON| `TravisTorrent Dataset - MSR'2017 Mining Challenge `_ [`Meta `_]

* |FIXME_ICON| `TunedIT - Data mining & machine learning data sets, algorithms, challenges `_ [`Meta `_]

* |OK_ICON| `Yelp Dataset Challenge - The Yelp dataset is a subset of our businesses, reviews, and user [...] `_ [`Meta `_]

EarthScience
------------

* |OK_ICON| `38-Cloud (Cloud Detection) - Contains 38 Landsat 8 scene images and their manually extracted [...] `_ [`Meta `_]

* |OK_ICON| `AQUASTAT - Global water resources and uses `_ [`Meta `_]

* |OK_ICON| `BODC - marine data of ~22K vars `_ [`Meta `_]

* |OK_ICON| `EOSDIS - NASA's earth observing system data `_ [`Meta `_]

* |OK_ICON| `Earth Models `_ [`Meta `_]

* |OK_ICON| `Global Wind Atlas - The Global Wind Atlas is a free, web-based application developed to help [...] `_ [`Meta `_]

* |OK_ICON| `Integrated Marine Observing System (IMOS) - roughly 30TB of ocean measurements `_ [`Meta `_]

* |FIXME_ICON| `Marinexplore - Open Oceanographic Data `_ [`Meta `_]

* |FIXME_ICON| `Alabama Real-Time Coastal Observing System `_ [`Meta `_]

* |OK_ICON| `National Estuarine Research Reserves System-Wide Monitoring Program - long-term estuarine [...] `_ [`Meta `_]

* |OK_ICON| `Oil and Gas Authority Open Data - The dataset covers 12,500 offshore wellbores, 5,000 seismic [...] `_ [`Meta `_]

* |OK_ICON| `Smithsonian Institution Global Volcano and Eruption Database `_ [`Meta `_]

* |OK_ICON| `USGS Earthquake Archives `_ [`Meta `_]

* |OK_ICON| `Wellhead Protection Area (protection zone) prediction using breakthrough curves - This [...] `_ [`Meta `_]

Economics
---------

* |OK_ICON| `Asian Productivity Organization (APO) - The AEPM provides a graphic dashboard view of [...] `_ [`Meta `_]

* |OK_ICON| `ASEAN Stats - The ASEANstatsDataPortal was first launched in June 2018. The Portal is [...] `_ [`Meta `_]

* |OK_ICON| `American Economic Association (AEA) `_ [`Meta `_]

* |OK_ICON| `Asian KLEMS - Asia KLEMS is an Asian regional research consortium to promote building [...] `_ [`Meta `_]

* |OK_ICON| `Harvard Atlas of Economic Complexity - A database for people to explore global trade flows [...] `_ [`Meta `_]

* |OK_ICON| `BIS Financial Database - The files contain the same data as in the BIS Statistics Explorer [...] `_ [`Meta `_]

* |OK_ICON| `Barro-Lee Education Attainment - Barro-Lee Educational Attainment Data from 1950 to 2010. [...] `_ [`Meta `_]

* |OK_ICON| `CEPII Database - A database of the world economy, through its country and region profiles, in [...] `_ [`Meta `_]

* |OK_ICON| `EUKLEMS - EU KLEMS is an industry level, growth and productivity research project. EU KLEMS [...] `_ [`Meta `_]

* |OK_ICON| `Economic Freedom of the World Data