https://github.com/apiad/datasets-list
A list of datasets for machine learning related tasks
- Host: GitHub
- URL: https://github.com/apiad/datasets-list
- Owner: apiad
- Created: 2016-12-01T12:18:43.000Z (over 9 years ago)
- Default Branch: master
- Last Pushed: 2016-12-01T12:23:15.000Z (over 9 years ago)
- Last Synced: 2025-10-25T02:43:51.869Z (5 months ago)
- Size: 12.7 KB
- Stars: 19
- Watchers: 1
- Forks: 7
- Open Issues: 0
Metadata Files:
- Readme: Readme.md
# List of Machine Learning Datasets
The following is a list of publicly available datasets for various machine learning tasks. Reviews, fixes, dead-link reports, and updates are appreciated.
Please give due credit by adding links to the corresponding sources in the `Acknowledgments` section below.
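Since dead links are among the fixes requested above, here is a minimal sketch of how a contributor might scan this list before opening a pull request. It assumes every entry follows the `* [title](url)` bullet format used in this file. The function names (`extract_links`, `is_malformed`, `is_dead`) are hypothetical helpers written for this illustration and are not part of the repository.

```python
import re
import urllib.error
import urllib.request
from urllib.parse import urlparse

# Matches a Markdown bullet of the form "* [title](url)".
LINK_RE = re.compile(r"^\*\s+\[(?P<title>.*)\]\((?P<url>\S+)\)\s*$")


def extract_links(markdown_text):
    """Return (title, url) pairs for every bullet link in the text."""
    links = []
    for line in markdown_text.splitlines():
        match = LINK_RE.match(line.strip())
        if match:
            links.append((match.group("title"), match.group("url")))
    return links


def is_malformed(url):
    """Flag URLs that are not http(s) or that have an empty or doubled scheme.

    A doubled scheme such as "http://http://host/" parses with the
    network location "http:", which is never a real host name.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return True
    return parsed.netloc in ("", "http:", "https:")


def is_dead(url, timeout=10):
    """Best-effort liveness check; requires network access.

    Any HTTP error, connection failure, or timeout counts as dead.
    """
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "link-check-sketch"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status >= 400
    except (urllib.error.URLError, ValueError, OSError):
        return True
```

To use it, read the README into a string, pass it to `extract_links`, and report every URL for which `is_malformed` or `is_dead` returns `True`. Some servers reject `HEAD` requests or block scripted clients, so a link flagged by `is_dead` is worth confirming in a browser before it is removed.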
## Data Journals
* [Data-artikelen | Sargasso](http://sargasso.nl/soort/data/)
* [Data journalism and data visualization from the Datablog | News | The Guardian](http://www.guardian.co.uk/news/datablog)
## Data Marketplaces and Data Hubs
* [Knoema – Home](http://knoema.com/)
* [Public Data Sets : Amazon Web Services](http://aws.amazon.com/datasets)
* [Socrata](https://opendata.socrata.com/)
* [Data Publica | Les données pour votre business](http://www.data-publica.com/)
* [Archive-It – Web Archiving Services for Libraries and Archives](http://www.archive-it.org/)
* [Freebase](http://www.freebase.com/)
* [Google Public Data Explorer](http://www.google.com/publicdata/directory)
* [Welcome – the Data Hub](http://datahub.io/)
* [Data Sets | AggData](http://www.aggdata.com/data)
* [Find & Purchase Data Subscriptions | Windows Azure Marketplace](http://datamarket.azure.com/browse/data)
* [Factual | Home](http://factual.com/)
## Data Search Engines
* [Zanran Numerical Data Search](http://www.zanran.com/q/)
* [Quandl – Intelligent Search for Numerical Data](http://www.quandl.com/)
## International Bodies & Agencies
* [IMF Data and Statistics](http://www.imf.org/external/data.htm)
* [Data | The World Bank](http://data.worldbank.org/)
* [OECD.Stat](http://stats.oecd.org/)
* [UNdata](http://data.un.org/)
* [Data and maps — European Environment Agency (EEA)](http://www.eea.europa.eu/data-and-maps)
* [Eurostat Home](http://epp.eurostat.ec.europa.eu/portal/page/portal/eurostat/home)
## Local Governments
* [Inicio Misiones](http://www.datos.misiones.gov.ar/)
* [Open Government Data Wien (OGD)](http://www.data.wien.gv.at/)
* [Open data – City of Brussels](http://www.brussels.be/artdet.cfm?id=7191)
* [Open Data – Brisbane City Council](http://www.brisbane.qld.gov.au/about-council/governance-strategy/economic-development/open-data/index.htm)
* [Open data – Salford City Council](http://www.salford.gov.uk/opendata.htm)
* [Sunderland City Council : Local Public Data](http://www.sunderland.gov.uk/index.aspx?articleid=4112)
* [Welcome to the London Datastore | London DataStore](http://data.london.gov.uk/)
* [Leeds City Council – Open Data](http://opendata.leeds.gov.uk/)
* [Home – DataGM – Data Greater Manchester](http://datagm.org.uk/)
* [Open Data | Derby City Council](http://www.derby.gov.uk/council-and-democracy/open-data-and-freedom-of-information/open-data/)
* [Council data – Brighton & Hove City Council](http://www.brighton-hove.gov.uk/index.cfm?request=b1160744)
* [Open Data – Birmingham City Council](http://www.birmingham.gov.uk/open-data)
* [Aberdeen City Council Open Data](http://www.aberdeencity.gov.uk/open_data/open_data_home.asp)
* [Open Data – City of Waterloo](http://www.waterloo.ca/en/opendata/index.asp)
* [Open Data catalogue | City of Vancouver](http://vancouver.ca/your-government/open-data-catalogue.aspx)
* [Open Data Home – Open Data – Home | City of Toronto](http://www1.toronto.ca/wps/portal/open_data/open_data_home?vgnextoid=b3886aa8cc819210VgnVCM10000067d60f89RCRD)
* [City of Prince George – Open Data Catalogue](http://princegeorge.ca/cityservices/online/odc/Pages/default.aspx)
* [Open Data Ottawa | City of Ottawa](http://ottawa.ca/en/open-data-ottawa)
* [Open Data Catalogue – City of Red Deer](http://data.reddeer.ca/)
* [Open Data | City of Niagara Falls, Canada](http://www.niagarafalls.ca/services/open/default.aspx)
* [Open Data Catalogue | City of Nanaimo](http://data.nanaimo.ca/)
* [Mississauga.ca – Residents – Publications and Open Data Catalogue](http://www.mississauga.ca/portal/residents/publicationsopendatacatalogue)
* [City of Medicine Hat Open Data Catalogue](http://data.medicinehat.ca/)
* [Kamloops open data](http://www.kamloops.ca/downloads/maps/launch.htm)
* [Open Data Catalogue Kelowna](http://www.kelowna.ca/CM/Page3936.aspx)
* [City of Hamilton – Open Data](http://www.hamilton.ca/ProjectsInitiatives/OpenData/)
* [City of Fredericton – Open Data Home](http://www.fredericton.ca/en/citygovernment/TermsOfUse.asp)
* [City of Edmonton Open Data Catalogue](https://data.edmonton.ca/)
* [City of Somerville, MA](https://data.somervillema.gov/)
* [Data.Seattle.Gov | Seattle’s Data Site](https://data.seattle.gov/)
* [City of Scottsdale](http://data.scottsdaleaz.gov/)
* [Welcome – Santa Cruz Open Data](http://data.cityofsantacruz.com/)
* [Data | San Francisco](https://data.sfgov.org/)
* [Open Raleigh – The Official City of Raleigh Portal](http://www.raleighnc.gov/open)
* [Datasets | CivicApps.org Portland OR](http://civicapps.org/datasets)
* [OpenDataPhilly – Connecting People With Data](http://www.opendataphilly.org/)
* [NYC Open Data](https://nycopendata.socrata.com/)
* [Greater New Orleans Community Data Center](http://www.gnocdc.org/)
* [City of Madison | Open Data](https://data.cityofmadison.com/)
* [City and County of Honolulu](https://data.honolulu.gov/)
* [US/Data Catalog District of Columbia](http://data.dc.gov/)
* [Denver Open Data Catalog](http://data.denvergov.org/)
* [data.cookcountyil.gov | The Cook County Government Open Data Website](http://data.cookcountyil.gov/)
* [City of Chicago | Data Portal](https://data.cityofchicago.org/)
* [Open Government | City of Boston](http://www.cityofboston.gov/open/)
* [OpenBaltimore / City of Baltimore’s Open Data Catalog](https://data.baltimorecity.gov/)
* [Data.AustinTexas.gov | Open Austin](https://data.austintexas.gov/)
* [OpenDataAsheville – Connecting People With Data](http://opendatacatalog.ashevillenc.gov/)
* [US/Arvada](http://arvada.org/opendata)
* [GovHK: About Data.One](http://www.gov.hk/en/theme/psi/welcome/)
* [data.gov.sg Singapore](http://data.gov.sg/)
## Machine Learning Challenges
* [ACM KDD CUP](http://www.sigkdd.org/kddcup/index.php)
* [Competitions – Kaggle](http://www.kaggle.com/competitions)
* [Data – Repository – Causality Workbench](http://www.causality.inf.ethz.ch/repository.php)
* [TunedIT – Data mining & machine learning data sets, algorithms, challenges](http://tunedit.org/)
## Machine Learning Datasets
* [mldata :: Welcome](http://mldata.org/)
* [UCI Machine Learning Repository: Data Sets](http://archive.ics.uci.edu/ml/datasets.html)
## Miscellaneous Data Sources
* [IHME | Institute for Health Metrics and Evaluation](http://www.healthmetricsandevaluation.org/)
* [Gapminder: Unveiling the beauty of statistics for a fact based world view.](http://www.gapminder.org/)
* [Doing Research in New York City Public Schools and Requesting Data – NYC Data – New York City Department of Education](http://schools.nyc.gov/Accountability/data/default.htm)
* [RITA | BTS | Airline On-Time Statistics and Delay Causes](http://www.transtats.bts.gov/ot_delay/ot_delaycause1.asp)
* [Oregon Climate Data](http://www.ocs.orst.edu/oregon-climate-data)
* [Quantnet :: Start](http://sfb649.wiwi.hu-berlin.de/quantnet/index.php?p=start)
* [Data Tools – Locators](http://nces.ed.gov/datatools/index.asp)
* [My Data | Measured Me](http://measuredme.com/mydata-html/)
* [Webscope from Yahoo! Labs](http://webscope.sandbox.yahoo.com/catalog.php)
* [SourceForge.net Research Data](http://www3.nd.edu/~oss/Data/data.html)
* [Online Data – Robert Shiller](http://www.econ.yale.edu/~shiller/data.htm)
* [Obtaining Data From the NSSDC](http://nssdc.gsfc.nasa.gov/nssdc/obtaining_data.html)
* [Cancer Program Data Sets](http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi)
* [Million Song Dataset | scaling MIR research](http://labrosa.ee.columbia.edu/millionsong/)
* [Google Ngram Viewer](http://storage.googleapis.com/books/ngrams/books/datasetsv2.html)
* [Data | GeoDa Center](https://geodacenter.asu.edu/datalist/)
* [Home – GEO DataSets – NCBI](http://www.ncbi.nlm.nih.gov/gds/)
* [The Financial Data Finder A – G](http://fisher.osu.edu/fin/fdf/osudata.htm)
* [Frequent Itemset Mining Dataset Repository](http://fimi.ua.ac.be/data/)
* [Europeana Professional – Linked Open Data](http://pro.europeana.eu/linked-open-data)
* [Inforum – EconData](http://inforumweb.umd.edu/econdata/econdata.html)
* [Summary of Data Sets by Application Area](http://kdd.ics.uci.edu/summary.data.application.html)
* [Data Sets | Pew Research Center’s Internet & American Life Project](http://pewinternet.org/Data-Tools/Download-Data/Data-Sets.aspx)
* [Cosm – Explore](https://cosm.com/explore)
* [Advanced NFL Stats: Play-by-Play Data](http://www.advancednflstats.com/2010/04/play-by-play-data.html)
## National Governments and States
* [Portal de Obligaciones de Transparencia](http://portaltransparencia.gob.mx/pot/openData/openData.jsp)
* [Junta de Andalucía – Datos abiertos](http://www.juntadeandalucia.es/datosabiertos/portal.html)
* [Reutilización de la Información del Sector Público | Reutilización de la Información de los Servicios Públicos](http://www.datosabiertos.jcyl.es/)
* [Portal de Datos Abiertos de JCCM](http://opendata.jccm.es/)
* [Ayuntamiento de Zaragoza. Datos de Zaragoza Reutilización](http://www.zaragoza.es/ciudad/risp/)
* [Dades obertes Lleida – Ajuntament de Lleida](http://cartolleida.paeria.es/lleidaoberta/)
* [ISTAC | El ISTAC](http://www.gobiernodecanarias.org/istac/istac/)
* [Dades Obertes. Generalitat de Catalunya](http://www20.gencat.cat/portal/site/dadesobertes)
* [Dades Obertes CAIB](http://www.caib.es/caibdatafront/)
* [Reutilización de la Información del Sector Público en Gijón](http://datos.gijon.es/)
* [Open Data Euskadi ataria, Eusko Jaurlaritzaren datu publikoen irekitzea](http://opendata.euskadi.net/w79-home/eu/)
* [Data for Hawaii | data.hawaii.gov](https://data.hawaii.gov/)
* [Florida Has A Right To Know](http://www.floridahasarighttoknow.com/)
* [Open.Georgia.gov](http://www.open.georgia.gov/)
* [Commonwealth Data Point](http://datapoint.apa.virginia.gov/)
* [Open Data | data.maryland.gov](https://data.maryland.gov/)
* [Connecticut Transparency Website](http://transparency.ct.gov/html/main.asp)
* [RI.gov: Open Data](http://www.ri.gov/data/)
* [NYS Data Center](http://esd.ny.gov/NYSDataCenter.html)
* [Maine.gov DataShare](http://www.maine.gov/cgi-bin/data/index.pl)
* [State of Alabama – Open.alabama.gov](http://open.alabama.gov/)
* [Open Government for the State of Tennessee](http://www.tn.gov/opengov/)
* [Ohio.gov | Government | State Facts and History](http://ohio.gov/government/factsandhistory/)
* [OpenDoor – Kentucky](http://opendoor.ky.gov/Pages/default.aspx)
* [Data.Illinois.gov | Open Illinois](https://data.illinois.gov/)
* [SOM – Michigan Data Store](http://www.michigan.gov/som/0,1607,7-192-29938_54272---,00.html)
* [Louisiana Transparency and Accountability Portal](http://wwwprd.doa.louisiana.gov/LaTrac/portal.cfm)
* [data.mo.gov | State of Missouri Data Portal](https://data.mo.gov/)
* [DATAshare | data.iowa.gov](http://data.iowa.gov/)
* [Minnesota open data // your portal for Minnesota data transparency](http://www.state.mn.us/opendata/)
* [Open Data Texas](http://www.texas.gov/en/Connect/Pages/open-data.aspx)
* [Welcome to Oklahoma’s Official Web Site](http://www.ok.gov/about/data.html)
* [KanView: Kansas Transparency Taxpayer Act – Kansas Revenues and Expenditures Search](http://www.kansas.gov/KanView/)
* [OPEN SD :: South Dakota Government Information](http://open.sd.gov/)
* [North Dakota GIS (Geographic Information Systems)](http://www.nd.gov/gis/)
* [State Government Data New Mexico](http://www.sunshineportalnm.com/)
* [Colorado.gov: The Official State Web Portal](http://www.colorado.gov/data/)
* [Arizona OpenBooks | – Arizona Transparency Finances in Detail](http://openbooks.az.gov/app/transparency/index.html)
* [Utah Data – Utah.gov](http://www.utah.gov/data/)
* [Data.CA.gov | Data Transparency for the State of California](http://data.ca.gov/)
* [Oregon Data | Opening Oregon’s Data](https://data.oregon.gov/)
* [Data.Washington | Washington State’s Data Site](https://data.wa.gov/)
* [Home | Data.gov](http://www.data.gov/)
* [Portal de Datos Públicos – Inicio](http://datos.gob.cl/)
* [datos.gub.uy | Portal del Estado Uruguayo](http://datos.gub.uy/)
* [Bem vindo – Portal Brasileiro de Dados Abertos](http://dados.gov.br/)
* [Directorio de Empresas, Marcas registradas, Normas legales y Teléfonos en Perú](http://www.datosperu.org/)
* [StatCentral.ie – The Portal to Ireland’s Official Statistics](http://www.statcentral.ie/)
* [data.gov.be | The Belgian open data initiative](http://data.gov.be/)
* [Data.overheid.nl: het open dataportaal van de Nederlandse overheid](https://data.overheid.nl/)
* [PortalU – German Environmental Information Portal](http://www.portalu.de/portal/default-page.psml)
* [Statistical database](http://pub.stat.ee/px-web.2001/Dialog/statfile1.asp)
* [Date.gov.md | Portalul datelor guvernamentale deschise al Republicii Moldova](http://data.gov.md/)
* [Offene Daten Österreich | data.gv.at](http://www.data.gv.at/)
* [Vitajte – data.gov.sk](http://data.gov.sk/)
* [dati.gov.it | I dati aperti della PA](http://www.dati.gov.it/)
* [Δημοσια, Ανοικτά Δεδομένα](http://geodata.gov.gr/geodata/)
* [Open Kenya | Transparent Africa](https://opendata.go.ke/)
* [SAUDI | National e-Government Portal – Home](http://www.saudi.gov.sa/wps/portal/yesserRoot/home/!ut/p/b1/04_Sj9CPykssy0xPLMnMz0vMAfGjzOId3Z2dgj1NjAz8zUMMDTxNzZ2NHU0NDd29DfWDU_P0_Tzyc1P1C7IdFQFV9YhO/dl4/d5/L2dBISEvZ0FBIS9nQSEh/)
* [data.govt.nz – New Zealand government data online » Data.govt.nz](http://data.govt.nz/)
* [data.gov.au](http://data.gov.au/)
* [국가공유자원포털](https://www.data.go.kr/main.jsp)
* [中国政府公开信息整合服务平台](http://govinfo.nlc.gov.cn/)
* [Open Data Canada](http://www.data.gc.ca/)
* [OpenGovData.ru](http://opengovdata.ru/)
* [OpenAid – Start](http://openaid.se/)
* [data.norge.no | Åpne offentlige data i Norge – Difi](http://data.norge.no/)
* [Portada | datos.gob.es](http://datos.gob.es/datos/?language=in)
* [Open Data Colombia](http://www.datos.gov.co/)
* [home | data.gov.uk](http://data.gov.uk/)
## Open Companies Data Sources
* [Yelp’s Academic Dataset | Yelp](http://www.yelp.com/academic_dataset)
* [Data Export – Prosper](http://www.prosper.com/tools/DataExport.aspx)
* [Lending Club Statistics – Lending Club](https://www.lendingclub.com/info/download-data.action)
## U.S. Agencies Data Sources
* [Federal Agency Participation | Data.gov](http://www.data.gov/metric)
* [services.sunlightlabs.com](http://services.sunlightlabs.com/)
* [FRB: Data Download Program (DDP)](http://www.federalreserve.gov/feeds/datadownload.html)
## Various Lists of Data Sources
* [Programming Challenges: What are some good “toy problems” in data science? – Quora](http://www.quora.com/Programming-Challenges-1/What-are-some-good-toy-problems-in-data-science)
* [Data: Where can I find large datasets open to the public? – Quora](http://www.quora.com/Data/Where-can-I-find-large-datasets-open-to-the-public)
* [Data Analysis: What’s your favorite free data source? – Quora](http://www.quora.com/Data-Analysis/Whats-your-favorite-free-data-source)
* [What are some publicly available market data feeds? – Quora](http://www.quora.com/What-are-some-publicly-available-market-data-feeds)
* [Is there a reliable free source for per country LinkedIn statistics? – Quora](http://www.quora.com/Is-there-a-reliable-free-source-for-per-country-LinkedIn-statistics)
* [@pskomoroch #dataset – Delicious](https://delicious.com/pskomoroch/dataset)
* [Free, Public Data Sets | Hacker News](http://news.ycombinator.com/item?id=2165497)
* [List of European Open Data Catalogues at lod2.okfn.org](http://lod2.okfn.org/eu-data-catalogues/)
* [Open Data](http://www.reddit.com/r/opendata/)
* [Datasets Archive](http://www.reddit.com/r/datasets/)
* [Some Datasets Available on the Web » Data Wrangling Blog](http://www.datawrangling.com/some-datasets-available-on-the-web)
## Research Quality Datasets by Hilary Mason
* [Lending Club Loan Data](http://www.lendingclub.com/info/download-data.action)
* [SMS Spam Collection](http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/)
* [Flickr personal taxonomies](http://www.isi.edu/~lerman/downloads/flickr/flickr_taxonomies.html)
* [Yahoo Data for Researchers](http://webscope.sandbox.yahoo.com/index.php)
* [ICWSM Spinn3r Challenge 2011 dataset](http://www.icwsm.org/2011/data.php)
* [Quantum Chaotic Thoughts: Facebook100 Data Set](http://masonporter.blogspot.com/2011/02/facebook100-data-set.html)
* [Public Data Sets on Amazon Web Services (AWS)](http://aws.amazon.com/publicdatasets/)
* [The ClueWeb09 Dataset](http://lemurproject.org/clueweb09/)
* [Census Bureau Home Page](http://www.census.gov/)
* [Data | The World Bank](http://data.worldbank.org/)
* [ImageNet](http://www.image-net.org/)
* [What is Twitter, a Social Network or a News Media? – WWW’10](http://an.kaist.ac.kr/traces/WWW2010.html)
* [dotbot | DotNetDotCom.org](https://moz.com/researchtools/ose/dotbot#inde)
* [arXiv.org help – arXiv Bulk Data Access – Amazon S3](http://arxiv.org/help/bulk_data_s3)
* [YouTube Dataset](http://netsg.cs.sfu.ca/youtubedata/)
* [Face Recognition Homepage – Databases](http://www.face-rec.org/databases/)
* [Pajek datasets](http://vlado.fmf.uni-lj.si/pub/networks/data/default.htm)
* [UCI Network Data Repository](http://networkdata.ics.uci.edu/)
* [Datasets for “The Elements of Statistical Learning”](http://www-stat.stanford.edu/~tibs/ElemStatLearn/data.html)
* [Enron Email Dataset](http://www.cs.cmu.edu/~enron/)
* [MovieLens Data Sets | GroupLens Research](http://www.grouplens.org/node/73)
* [Translation Task – EMNLP 2011 Sixth Workshop on Statistical Machine Translation](http://statmt.org/wmt11/translation-task.html#download)
* [Project Gutenberg](http://www.gutenberg.org/wiki/Gutenberg:Offline_Catalogs)
* [About WordNet – WordNet – About WordNet](http://wordnet.princeton.edu/)
* [Aligned Hansards of the 36th Parliament of Canada](http://www.isi.edu/natural-language/download/hansard/)
* [CRCNS – Collaborative Research in Computational Neuroscience – Data sharing](http://crcns.org/data-sets)
* [USENET corpus](http://www.psych.ualberta.ca/~westburylab/downloads/usenetcorpus.download.html)
* [UniGene](http://www.ncbi.nlm.nih.gov/unigene)
* [ChEMBLdb](http://www.ebi.ac.uk/chembl/)
* [UCI Machine Learning Repository](http://archive.ics.uci.edu/ml/)
* [Gene Expression Omnibus (GEO) Main page](http://www.ncbi.nlm.nih.gov/geo/)
* [Social Science Data](http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies)
* [IMDB dataset](http://www.imdb.com/interfaces)
* [Stanford Large Network Dataset Collection](http://snap.stanford.edu/data/index.html)
* [Google Books n-gram dataset](http://aws.amazon.com/datasets/8172056142375670)
* [Million Song Dataset | scaling MIR research](http://labrosa.ee.columbia.edu/millionsong/)
* [Belly Button Biodiversity 2.0](http://bbdata.yourwildlife.org/)
* [Sharing PyPi/Maven dependency data « RTFB](http://ogirardot.wordpress.com/2013/01/31/sharing-pypimaven-dependency-data/)
* [Click Dataset | Center for Complex Networks and Systems Research](http://cnets.indiana.edu/groups/nan/webtraffic/click-dataset)
* [The Electric Rice Cooker — One year of deleted weibos archive](http://electricricecooker.tumblr.com/post/42103142042/one-year-of-deleted-weibos-archive)
* [Registered meteorites that has impacted on Earth visualized – AnalyticBridge](http://www.analyticbridge.com/profiles/blogs/registered-meteorites-that-has-impacted-on-earth-visualized)
* [GeoJSON files for real-time Virginia transportation data.](http://gist.github.com/waldoj/5053946)
* [NYPD Crash Data Band-Aid](http://nypd.openscrape.com/#/)
* [11 Billion Clues in 800 Million Documents: A Web Research Corpus Annotated with Freebase Concepts | Research Blog](http://googleresearch.blogspot.com/2013/07/11-billion-clues-in-800-million.html)
* [Big data set – 3.5 billion web pages – made available for all of us – Big Data News](http://www.bigdatanews.com/profiles/blogs/big-data-set-3-5-billion-web-pages-made-available-for-all-of-us)
* [Data.Seattle.Gov | Seattle’s Data Site](http://data.seattle.gov/)
* [New Crawl Data Available! | CommonCrawl](http://commoncrawl.org/new-crawl-data-available/)
* [Detailed data on pass rates, race, and gender for 2013](http://home.cc.gatech.edu/ice-gt/556)
* [Data Download](http://voteview.com/dwnl.html)
# Acknowledgments (online sources)
* [BigML - List of Public Data Sources Fit for Machine Learning](https://blog.bigml.com/list-of-public-data-sources-fit-for-machine-learning/)