https://github.com/datasciencemasters/data


## Open Data Sources

* _**Availability and access**: the data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the internet. The data must also be available in a convenient and modifiable form._
* _**Reuse and redistribution**: the data must be provided under terms that permit reuse and redistribution including the intermixing with other datasets. The data must be machine-readable._
* _**Universal participation**: everyone must be able to use, reuse and redistribute — there should be no discrimination against fields of endeavour or against persons or groups. For example, ‘non-commercial’ restrictions that would prevent ‘commercial’ use, or restrictions of use for certain purposes (e.g. only in education), are not allowed._

-- _Definition by the [Open Knowledge Foundation](https://okfn.org/opendata/)_
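
As a rough illustration, the three criteria above can be read as a checklist. The sketch below is hypothetical: the `DatasetTerms` class, its field names, and `failed_criteria` are invented for this example and are not part of the Open Definition or of any library.

```python
from dataclasses import dataclass


@dataclass
class DatasetTerms:
    """Hypothetical summary of a dataset's access and license terms."""
    available_as_whole: bool      # full dataset obtainable at no more than reproduction cost
    modifiable_format: bool       # provided in a convenient, modifiable form
    allows_redistribution: bool   # reuse and redistribution, including mixing with other datasets
    machine_readable: bool        # can be processed by software
    non_commercial_only: bool     # e.g. a "non-commercial" license clause
    restricted_to_purpose: bool   # e.g. "for educational use only"


def failed_criteria(terms: DatasetTerms) -> list[str]:
    """Return the names of the criteria that the given terms do not meet."""
    failed = []
    if not (terms.available_as_whole and terms.modifiable_format):
        failed.append("availability and access")
    if not (terms.allows_redistribution and terms.machine_readable):
        failed.append("reuse and redistribution")
    if terms.non_commercial_only or terms.restricted_to_purpose:
        failed.append("universal participation")
    return failed


# A dataset that is freely downloadable but licensed for non-commercial use only.
example = DatasetTerms(True, True, True, True, True, False)
print(failed_criteria(example))
```

Under this reading, a data set marked with ^ in the lists below would fail at least one of the checks.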

### Lists of Data Sets
* [Interesting Data Sets for Statisticians](http://rs.io/100-interesting-data-sets-for-statistics/) - an editorialized, entertaining collection of open data sets

### Open Data

* [List of Public Datasets](https://github.com/caesar0301/awesome-public-datasets) - user-curated
* [DBpedia](http://wiki.dbpedia.org/Datasets) - structured data extracted from Wikipedia, described with a large multi-domain ontology
* [Public Data Sets on AWS](https://aws.amazon.com/datasets?_encoding=UTF8&jiveRedirect=1) - Common Crawl web corpus, NASA satellite imagery, Human Genome, Google Books Ngrams, Wikipedia Traffic, Million Song Dataset, Federal Reserve Economic Data, PubChem, and more.

### Private Opened Data
* [New York Times](http://data.nytimes.com/) - controlled vocabulary of people, places, companies, etc., published as linked open data

### Governmental Data

[Compendium of Governmental Open Data Sources](http://datacatalogs.org/)

* [Data.gov (USA)](http://www.data.gov/)
* [Africa Open Data](http://africaopendata.org/dataset)
* [US Census](http://www.census.gov/data/developers/data-sets.html) - Population Estimates and Projections, Nonemployer Statistics and County Business Patterns, Economic Indicators Time Series, and more.
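
The US Census entry above points to the bureau's developer data sets, which are served over an HTTP API. As a minimal sketch, assuming a response shaped as a JSON array of rows whose first row holds the column names, the helper below turns such a payload into a list of dictionaries. The `rows_to_records` name, the column names, and the sample values are made up for illustration; they are not real Census figures.

```python
import json


def rows_to_records(payload: str) -> list[dict[str, str]]:
    """Convert a JSON array-of-rows (header row first) into a list of dicts."""
    rows = json.loads(payload)
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


# Placeholder payload; the values are illustrative, not real statistics.
sample = '[["NAME", "POP", "state"], ["Alpha", "100", "01"], ["Beta", "250", "02"]]'

for record in rows_to_records(sample):
    # Values arrive as strings in this sketch, so convert before doing arithmetic.
    print(record["NAME"], int(record["POP"]))
```

Keying each row by column name keeps later code readable and independent of column order.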

### Non-Governmental Org Data

* [The World Bank](http://data.worldbank.org/topic/private-sector) - business regulation measures, company-level data in emerging markets, household consumption patterns, World Development Indicators, World Bank finances
* ^[Pew Research Center's Internet Project](http://www.pewinternet.org/datasets/pages/3/)

### Academic Data

[Inter-university Consortium for Political and Social Research Data Portal](http://www.icpsr.umich.edu/icpsrweb/ICPSR/access/subject.jsp)

* [Surveys of Economic Attitudes and Behavior](http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies?classification=ICPSR.IV.B.)
* [Continuing Series of Consumer Surveys](http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies?classification=ICPSR.IV.A.)
* [Historical and Contemporary Economic Processes and Indicators](http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies?classification=ICPSR.IV.C.)

### Truly Random Data

* [200,000+ Jeopardy! Questions in a JSON file](http://www.reddit.com/r/datasets/comments/1uyd0t/200000_jeopardy_questions_in_a_json_file/)
* [10,000 annotated images of cats](http://137.189.35.203/WebUI/CatDatabase/catData.html)
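
A single-file JSON data set such as the Jeopardy! one is quick to explore with the standard library. The sketch below assumes the file is a JSON array of objects and that each object has a `category` field; both are assumptions about the file's layout. The `top_values` helper is hypothetical, and the inline records are invented placeholders rather than real entries.

```python
import json
from collections import Counter


def top_values(records: list[dict], key: str, n: int = 5) -> list[tuple[str, int]]:
    """Count how often each value of `key` occurs and return the n most common."""
    counts = Counter(r[key] for r in records if key in r)
    return counts.most_common(n)


# Inline stand-in for reading the downloaded file with json.load();
# the records and their field names are placeholders.
records = json.loads("""
[
  {"category": "HISTORY", "question": "placeholder", "answer": "placeholder"},
  {"category": "SCIENCE", "question": "placeholder", "answer": "placeholder"},
  {"category": "HISTORY", "question": "placeholder", "answer": "placeholder"}
]
""")

print(top_values(records, "category"))
```

Records that lack the requested key are skipped rather than raising an error, which is convenient when a community-compiled file is not perfectly uniform.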

## Open Data Resources

* [r/datasets](http://www.reddit.com/r/datasets/) on Reddit
* [Open Data - Stack Exchange](http://opendata.stackexchange.com/) (discussion)

^ _The license is not truly open; it involves some limitations._