https://github.com/secdr/research-database

public database for research

# research-database
This repository collects public datasets for research. If you know of other useful links, please contact me or open a pull request.

### Phishing
+ [PhishTank](https://www.phishtank.com/developer_info.php);
+ [OpenPhish](https://www.openphish.com/);
+ [315online](http://www.315online.com.cn/list.php?catid=33);
+ [China Mobile spam SMS messages](http://www.wid.org.cn/project/2015ccf/comp_detail.php?cid=227);
+ [360 list of recent malicious websites](http://webscan.360.cn/url);
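Some of these phishing sources can be consumed as plain URL lists. The following is a minimal sketch, assuming a feed with one URL per line (OpenPhish's free feed used this layout; check the current format before relying on it). It tallies reported URLs by hostname. `count_hosts` and the sample URLs are hypothetical and use the reserved `.test` domain.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_hosts(feed_text: str) -> Counter:
    """Count reported URLs per hostname in a one-URL-per-line feed.

    Assumption: blank lines and lines starting with '#' are not URLs
    and are skipped.
    """
    hosts = Counter()
    for line in feed_text.splitlines():
        url = line.strip()
        if not url or url.startswith("#"):
            continue
        # .hostname is lower-cased with any port removed; None if absent.
        host = urlsplit(url).hostname
        if host:
            hosts[host] += 1
    return hosts


# Hypothetical sample lines, not real feed entries.
sample = """\
http://login.example-bank.test/verify
https://login.example-bank.test/update?id=1
http://paypa1.example.test/signin
"""

print(count_hosts(sample).most_common(1))  # [('login.example-bank.test', 2)]
```

Grouping by hostname rather than full URL is one simple way to see which domains dominate a feed; it ignores path-level differences between reports on the same domain.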

### Social data
+ [Reddit Comments Corpus](https://archive.org/details/2015_reddit_comments_corpus);
+ [Full Reddit Submission Corpus](https://www.reddit.com/r/datasets/comments/3mg812/full_reddit_submission_corpus_now_available_2006/);
+ [City Record Online](https://nycopendata.socrata.com/);
+ [TLC Trip Record Data](http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml);
+ [Frequency Word Lists](https://invokeit.wordpress.com/frequency-word-lists/);
+ [Amazon product data](http://jmcauley.ucsd.edu/data/amazon/);
+ [Wikimedia database](https://dumps.wikimedia.org/);
+ [Airbnb database](http://insideairbnb.com/get-the-data.html);

### Network data
+ [KDD Cup 1999 Data](http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html);
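The KDD Cup 1999 files (e.g. `kddcup.data`) are, as commonly distributed, comma-separated records of 41 connection features followed by a label with a trailing period such as `normal.` or `smurf.`. The sketch below assumes that layout and tallies the label distribution. `label_counts` and `fake_record` are hypothetical helpers, and the sample records are placeholders rather than real data.

```python
import csv
import io
from collections import Counter


def label_counts(text: str) -> Counter:
    """Tally labels in KDD Cup 1999 style records.

    Assumption: each record has 41 comma-separated features plus a final
    label ending in '.', e.g. "normal." or "smurf.".
    """
    counts = Counter()
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 42:  # skip blank or malformed lines
            continue
        counts[row[-1].rstrip(".")] += 1
    return counts


def fake_record(label: str) -> str:
    """Build a hypothetical record: 41 placeholder features plus a label."""
    return ",".join(["0", "tcp", "http", "SF"] + ["0"] * 37 + [label])


sample = "\n".join([fake_record("normal."), fake_record("smurf."),
                    fake_record("normal."), ""])
print(label_counts(sample))  # Counter({'normal': 2, 'smurf': 1})
```

Checking the field count before reading the label is a cheap guard against truncated lines; mapping individual attack labels to broader categories is left out here.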

### Security Data
+ [Driving in the Cloud Dataset](http://malicia-project.com/dataset.html);
+ [Nothink Malware samples](http://www.nothink.org/honeypots/malware-archives/);
+ [SecRepo.com - Samples of Security Related Data](http://www.secrepo.com/);
+ [lanl.gov Open Data Sets](http://csr.lanl.gov/data/);
+ [Crime data from the St. Louis Metropolitan Police Department](https://github.com/kylesykes/stl-crime-data);
+ [Chronology of Data Breaches: Security Breaches 2005 - Present](https://www.privacyrights.org/data-breach);
+ [Malware Sample Sources for Researchers](https://zeltser.com/malware-sample-sources/);
+ [Microsoft Malware Classification Challenge (BIG 2015)](https://www.kaggle.com/c/malware-classification/forums);
+ [Android Malware: The Drebin Dataset](http://user.informatik.uni-goettingen.de/~darp/drebin/);

### Others
+ [Beijing City Lab data](http://www.beijingcitylab.com/data-released-1/);

### [Stanford Large Network Dataset Collection](http://snap.stanford.edu/data)
+ Social networks: online social networks, edges represent interactions between people
+ Networks with ground-truth communities: ground-truth network communities in social and information networks
+ Communication networks: email communication networks with edges representing communication
+ Citation networks: nodes represent papers, edges represent citations
+ Collaboration networks: nodes represent scientists, edges represent collaborations (co-authoring a paper)
+ Web graphs: nodes represent webpages and edges are hyperlinks
+ Amazon networks: nodes represent products and edges link commonly co-purchased products
+ Internet networks: nodes represent computers and edges represent communication
+ Road networks: nodes represent intersections and edges represent roads connecting the intersections
+ Autonomous systems: graphs of the Internet
+ Signed networks: networks with positive and negative edges (friend/foe, trust/distrust)
+ Location-based online social networks: social networks with geographic check-ins
+ Wikipedia networks and metadata: talk, editing and voting data from Wikipedia
+ Twitter and Memetracker: Memetracker phrases, links and 467 million tweets
+ Online communities: data from online communities such as Reddit and Flickr
+ Online reviews: data from online review systems such as BeerAdvocate and Amazon
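Many of the SNAP graphs above are distributed as plain-text edge lists. The sketch below assumes that layout: header lines starting with `#`, then one edge per line as two whitespace-separated integer node IDs (`FromNodeId`, `ToNodeId`). It builds an adjacency map, treating edges as undirected unless `directed=True`. `load_edge_list` and the sample graph are hypothetical; check each dataset's header for its actual format.

```python
from collections import defaultdict


def load_edge_list(text: str, directed: bool = False) -> dict[int, set[int]]:
    """Build an adjacency map from a SNAP-style edge list.

    Assumption: '#' lines are comments; other non-empty lines start with
    two whitespace-separated integer node IDs.
    """
    adj: dict[int, set[int]] = defaultdict(set)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        src, dst = map(int, line.split()[:2])
        adj[src].add(dst)
        if directed:
            adj.setdefault(dst, set())  # keep sink nodes in the node set
        else:
            adj[dst].add(src)  # store the reverse direction as well
    return dict(adj)


# Hypothetical sample in the assumed format (tab-separated IDs).
sample = """\
# Directed graph (hypothetical sample)
# FromNodeId\tToNodeId
0\t1
0\t2
1\t2
2\t0
"""

g = load_edge_list(sample)
# Nodes and undirected edges; 0->2 and 2->0 collapse into one edge.
print(len(g), sum(len(n) for n in g.values()) // 2)  # 3 3
```

Using sets for neighbours removes duplicate edges automatically; for the largest SNAP graphs a dedicated graph library or a compact array representation would use far less memory than Python sets.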