https://github.com/melifluos/LSH-community-detection
community detection for the whole Twitter graph on a single laptop
- Host: GitHub
- URL: https://github.com/melifluos/LSH-community-detection
- Owner: melifluos
- Created: 2017-04-19T08:22:22.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2017-11-21T15:31:17.000Z (almost 7 years ago)
- Last Synced: 2024-08-01T13:37:37.599Z (3 months ago)
- Language: Python
- Size: 351 KB
- Stars: 21
- Watchers: 6
- Forks: 6
- Open Issues: 1
Metadata Files:
- Readme: README.md
# LSH-community-detection
This is the code for our paper 'Real-Time Community Detection in Large Social Networks on a Laptop' (https://arxiv.org/abs/1601.03958), which performs community detection for large networks on a single laptop.

We use minhash signatures to encode the Jaccard similarity between neighbourhood graphs of vertices in social networks. A Locality Sensitive Hash (LSH) table is built on top of the minhashes to perform fast nearest neighbour search. The results of the nearest neighbour search are ranked and structured using the WALKTRAP community detection algorithm.
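As a rough illustration of the minhash idea described above, the following sketch estimates the Jaccard similarity between two neighbour sets from their signatures. It is not the repository's implementation: the function names, the neighbour sets and the number of hashes are hypothetical, and a standard-library hash (`hashlib.blake2b`) stands in for mmh3 so that the example runs without extra packages.

```python
import hashlib


def _hash(seed: int, item: int) -> int:
    """Deterministic 64-bit hash of (seed, item); a stand-in for mmh3 (assumption)."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(neighbours, num_hashes=128):
    """Return one minimum hash value per seed over a vertex's neighbour set."""
    return [min(_hash(seed, v) for v in neighbours) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which the two signatures agree."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity |a & b| / |a | b| for comparison."""
    return len(a & b) / len(a | b)


# Hypothetical neighbour sets for two vertices (40 shared out of 120 distinct).
a = set(range(0, 80))
b = set(range(40, 120))

print("exact:    ", exact_jaccard(a, b))
print("estimated:", estimate_jaccard(minhash_signature(a), minhash_signature(b)))
```

The estimate converges on the exact value as `num_hashes` grows, at the cost of longer signatures.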
### Prerequisites
The code uses the numpy, pandas and scikit-learn Python packages. We recommend installing these through Anaconda. Generating minhashes requires the mmh3 package:

    pip install mmh3

We provide binaries of the Cython code. If you wish to alter the Cython code, you will need to install Cython:

    pip install cython
## Replicating the experiments with Twitter data
Download the minhash data available from DANS EASY:
https://doi.org/10.17026/dans-x6a-mgvm
The commands below assume that you have cloned this repository and are in the source code directory.

To build the LSH table:

    python LSH.py minhash_data_path LSH_output_path

To generate metrics for the ground truth communities:

    python assess_community_quality.py minhash_data_path outpath

To run the experiments:

    python run_experiments.py minhash_data_path LSH_output_path outpath
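For readers unfamiliar with what an LSH table over minhashes looks like, here is a minimal sketch of the standard banding technique. It assumes each signature is split into `bands` groups of `rows` values and that vertices sharing any complete band become candidate neighbours. The function names, parameters and toy signatures are hypothetical, and `LSH.py` in this repository may be organised differently.

```python
from collections import defaultdict


def build_lsh_table(signatures, bands, rows):
    """Map (band index, band contents) to the set of vertices sharing that band."""
    table = defaultdict(set)
    for vertex, sig in signatures.items():
        if len(sig) < bands * rows:
            raise ValueError("signature shorter than bands * rows")
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            table[key].add(vertex)
    return table


def query(table, sig, bands, rows):
    """Return every vertex that collides with `sig` in at least one band."""
    candidates = set()
    for b in range(bands):
        key = (b, tuple(sig[b * rows:(b + 1) * rows]))
        candidates |= table.get(key, set())
    return candidates


# Toy signatures (hypothetical): u and v share their first band, w shares none.
signatures = {
    "u": [1, 2, 3, 4],
    "v": [1, 2, 9, 9],
    "w": [7, 7, 7, 7],
}
table = build_lsh_table(signatures, bands=2, rows=2)
print(query(table, signatures["u"], bands=2, rows=2))
```

More bands with fewer rows each make collisions more likely and so favour recall. Fewer bands with more rows favour precision.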
## Replicating the end-to-end process with the public email data set from SNAP https://snap.stanford.edu/data/email-EuAll.html
    python run_email_data.py

This generates minhashes from the raw data and uses them to build an LSH table. All of the results shown in the paper are then generated from the LSH table.

The LSH table and the minhashes are written to the resources folder. The plots are written to the results folder.
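The first step of such a pipeline is turning the raw edge list into one neighbour set per vertex, which is the input that minhashing needs. The sketch below assumes the usual SNAP text layout (comment lines starting with `#`, then one whitespace-separated pair of integer node ids per line) and treats edges as undirected. Both are assumptions made for illustration, and `read_neighbourhoods` is a hypothetical helper rather than a function from this repository.

```python
import io
from collections import defaultdict


def read_neighbourhoods(lines):
    """Build {vertex: set of neighbours} from an iterable of edge-list lines."""
    neighbours = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        src, dst = (int(x) for x in line.split()[:2])
        # Assumption: treat each edge as undirected.
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    return dict(neighbours)


# A tiny in-memory stand-in for the downloaded file (hypothetical data).
sample = io.StringIO("# FromNodeId\tToNodeId\n0\t1\n0\t2\n1\t2\n")
print(read_neighbourhoods(sample))
```

To use a real file, pass an open file handle in place of `sample`.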
## Authors
**Ben Chamberlain**
### Citation
If you make use of this code please cite:
Chamberlain BP, Levy-Kramer J, Humby C, Deisenroth MP. Real-Time Community Detection in Large Social Networks on a Laptop. arXiv preprint arXiv:1601.03958. 2016 Jan 15.