https://github.com/adriacabeza/graphclustering

:milky_way: Method to partition large networks into communities

clustering-methods graphs large-network python3 spectral-clustering


Host: GitHub
URL: https://github.com/adriacabeza/graphclustering
Owner: adriacabeza
Created: 2019-11-08T17:04:47.000Z (over 5 years ago)
Default Branch: master
Last Pushed: 2023-10-03T21:39:03.000Z (over 1 year ago)
Last Synced: 2025-01-11T03:41:52.295Z (5 months ago)
Topics: clustering-methods, graphs, large-network, python3, spectral-clustering
Language: TeX
Homepage:
Size: 121 MB
Stars: 2
Watchers: 3
Forks: 0
Open Issues: 4
Metadata Files:
- Readme: README.md

# :milky_way: Graph Clustering into communities


[![HitCount](http://hits.dwyl.io/adriacabeza/object-cut.svg)](http://hits.dwyl.io/AlbertSuarez/GraphClustering)

[![contributions welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)](https://github.com/adriacabeza/GraphClustering)

[![made-with-python](https://img.shields.io/badge/Made%20with-Python-1f425f.svg)](https://www.python.org/)

[![GitHub stars](https://img.shields.io/github/stars/adriacabeza/GraphClustering.svg)](https://GitHub.com/adriacabeza/GraphClustering/stargazers/)

This project uses the following graphs from the [Stanford Network Analysis Project (SNAP)](http://snap.stanford.edu/data/index.html): ca-GrQc, Oregon-1, roadNet-CA, soc-Epinions1, and web-NotreDame. The project description is in [*project.pdf*](./project.pdf) and the final report is in [*report.pdf*](./report/report.pdf).

## Initial example visualization and clustering of the graph ca-GrQc

*Figure: Kamada-Kawai visualization of the ca-GrQc graph, and its clustering using the spectral embedding.*


## Statistics of graph datasets

| Graph         | #vertices | #edges  | #clusters |
|---------------|-----------|---------|-----------|
| ca-GrQc       | 4158      | 13428   | 2         |
| Oregon-1      | 10670     | 22002   | 5         |
| soc-Epinions1 | 75877     | 405739  | 10        |
| web-NotreDame | 325729    | 1117563 | 20        |
| roadNet-CA    | 1957027   | 2760388 | 50        |
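
Vertex and edge counts like those in the table can be checked against an edge-list file. The sketch below is only an illustration: the helper name `graph_statistics` is hypothetical, and it assumes a whitespace-separated edge list with `#` comment lines, treated as an undirected graph with duplicate edges and self-loops ignored. The input files used by this project may be preprocessed differently, so the counts need not match the table exactly.

```python
def graph_statistics(lines):
    """Count distinct vertices and undirected edges in an edge list.

    Assumption: each data line holds two whitespace-separated vertex ids,
    and lines starting with '#' are comments.
    """
    vertices = set()
    edges = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        u, v = line.split()[:2]
        vertices.update((u, v))
        if u != v:
            # frozenset makes (u, v) and (v, u) count as the same edge
            edges.add(frozenset((u, v)))
    return len(vertices), len(edges)


# Tiny in-memory example; for a real file, pass an open file object instead.
sample = ["# toy graph", "0 1", "1 0", "1 2", "2 2"]
num_vertices, num_edges = graph_statistics(sample)
```

For a file on disk, `with open(path) as handle: graph_statistics(handle)` works the same way, since the function only iterates over lines.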

 

## Run it

### Requirements

Python 3 is required. Install the dependencies with:

```bash
pip install -r requirements.txt
```

### Recommendations

Using a [virtualenv](https://realpython.com/blog/python/python-virtual-environments-a-primer/) is recommended to isolate the installed packages and the runtime.

### Usage

Run the clustering algorithm from the main Python file *graph_clustering.py*. Argument help and example commands can be found in *EXPERIMENTS.sh*. The arguments are:

- *seed*: Random seed.
- *iterations*: Number of iterations, each run with a different seed.
- *file*: Path to the input graph file.
- *outputs_path*: Path where the outputs are saved.
- *clustering*: Clustering algorithm: "kmeans", "custom_kmeans", "kmeans_sklearn", "xmeans" or "agglomerative".
- *random_centroids*: Initialize the centroids randomly for "custom_kmeans".
- *distance_metric*: Distance metric for "custom_kmeans": "MINKOWSKI", "CHEBYSHEV" or "EUCLIDEAN".
- *compute_eig*: Whether to compute the eigenvectors or load them.
- *k*: Number of desired clusters.
- *networkx*: Use the networkx library to compute the Laplacian.
- *eig_kept*: Number of eigenvectors kept.
- *normalize_laplacian*: Normalize the Laplacian.
- *invert_laplacian*: Invert the Laplacian.
- *second*: Use only the second-smallest eigenvector.
- *eig_normalization*: Normalize the eigenvectors by "vertex", "eig" or "None".
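
To make the roles of *k*, *eig_kept*, *normalize_laplacian* and *second* more concrete, here is a minimal sketch of a generic spectral clustering pipeline with NumPy. It is not the code in *graph_clustering.py*: the function names (`laplacian`, `spectral_embedding`, `kmeans`) are hypothetical, the k-means step is a plain Lloyd's algorithm, and dense matrices are used for brevity, which is only practical for small graphs. The larger graphs listed above would need sparse matrices and a sparse eigensolver.

```python
import numpy as np


def laplacian(adjacency, normalize=False):
    """Return L = D - A, or the symmetric normalized form D^-1/2 L D^-1/2."""
    degrees = adjacency.sum(axis=1)
    lap = np.diag(degrees) - adjacency
    if normalize:
        # Assumption: no isolated vertices, so every degree is positive.
        inv_sqrt = 1.0 / np.sqrt(degrees)
        lap = lap * inv_sqrt[:, None] * inv_sqrt[None, :]
    return lap


def spectral_embedding(adjacency, eig_kept, normalize=False):
    """Embed each vertex with the eigenvectors of the smallest non-trivial eigenvalues."""
    _, vectors = np.linalg.eigh(laplacian(adjacency, normalize))
    # eigh sorts eigenvalues in ascending order; column 0 belongs to
    # eigenvalue 0 of a connected graph and carries no cluster information.
    return vectors[:, 1:1 + eig_kept]


def kmeans(points, k, seed=0, iterations=100):
    """Plain Lloyd's algorithm with randomly chosen initial centroids."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iterations):
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        new_centroids = np.array([
            # Keep the old centroid if a cluster ends up empty.
            points[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels


# Toy graph: two 4-vertex cliques joined by one edge between vertices 3 and 4.
adjacency = np.zeros((8, 8))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                adjacency[i, j] = 1.0
adjacency[3, 4] = adjacency[4, 3] = 1.0

embedding = spectral_embedding(adjacency, eig_kept=1)  # second-smallest eigenvector only
labels = kmeans(embedding, k=2, seed=0)
```

With `eig_kept=1` the embedding is just the second-smallest eigenvector, which corresponds to the idea behind the *second* option. Larger values of `eig_kept` give each vertex a higher-dimensional embedding before clustering into `k` groups.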

## Authors

👤 Álvaro Orgaz Expósito ([alvarorgaz](https://github.com/alvarorgaz))

👤 Adrià Cabeza ([adriacabeza](https://github.com/adriacabeza))
